3  Debugging

Pipelines are computationally demanding, complicated, and time-consuming to run. The scale can make it hard to troubleshoot when things go wrong. In addition, a pipeline runs in a separate non-interactive R process, so special steps are needed for interactive debugging with debug() and browser(). This chapter describes solutions to these challenges in terms of both best practices and features in targets.

3.1 Error messages

The metadata in _targets/meta/meta contains error messages and warning messages from when each target last ran. tar_meta() can retrieve these clues. Sometimes they are enough to fix the problems.

tar_meta(fields = error, complete_only = TRUE)
tar_meta(fields = warnings, complete_only = TRUE)

3.2 Isolate the problem

When you identify an error, e.g. from an error message, the solution is not always obvious, especially in a large, complicated pipeline. Before you begin to troubleshoot, it is best to reproduce the error in a smaller, more manageable version of the project that runs quickly and has few targets. Better still, reproduce it with fast-running downsized code that does not use a targets pipeline at all. That way, you can run it in an interactive R session and use traditional tools like debug() to go to the error and figure out why it is happening. If it is not possible to reproduce the error outside the pipeline, you could create a smaller version of the pipeline that is faster to run and easier to test. Just for testing purposes, you could use fewer targets, fewer data records, or fewer iterations of a data analysis algorithm. By temporarily downsizing the pipeline while still reproducing the error, you make it easier to take advantage of the built-in debugging support described in the rest of this chapter.
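As a minimal sketch of this downsizing idea, the base R code below reproduces a failure on a ten-row subset outside any pipeline. The bodies of get_data() and your_analysis_function() are hypothetical stand-ins invented for illustration, not the real functions of any project.

```r
# Hypothetical stand-ins for the pipeline's real functions.
get_data <- function(n = 1e5) {
  data.frame(x = seq_len(n), y = rev(seq_len(n)))
}
your_analysis_function <- function(data) {
  stopifnot(all(data$y > 1)) # Fails if any row has y equal to 1.
  colMeans(data)
}

# Downsize: keep only a handful of rows that still trigger the error.
small_data <- tail(get_data(), 10)

# Confirm that the small input reproduces the failure.
result <- try(your_analysis_function(small_data), silent = TRUE)
print(inherits(result, "try-error"))

# In an interactive session, step through the next call line by line:
# debugonce(your_analysis_function)
# your_analysis_function(small_data)
```

Once the small input reliably triggers the error, debugonce() or debug() can be used in the ordinary interactive way, with no pipeline involved.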

3.3 The dependency graph

tar_visnetwork() displays the dependency graph of the pipeline so you can check it. When you call tar_visnetwork(), you should see a natural left-to-right flow of work and dependency relationships.

tar_visnetwork()

If there are disconnected or missing nodes or edges, you may need to correct a spelling mistake in a target or function.
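The same check can be approximated in code. The sketch below assumes that tar_network() returns a list with a vertices data frame (with a name column) and an edges data frame (with from and to columns), and it uses a hypothetical three-target pipeline with a deliberate typo, written to a temporary directory.

```r
library(targets)

# Work in a temporary directory so no project files are touched.
dir <- tempfile("pipeline_")
dir.create(dir)
old_wd <- setwd(dir)

# Hypothetical pipeline written to _targets.R.
tar_script({
  list(
    tar_target(data, data.frame(x = 1:3)),
    tar_target(n_rows, nrow(data)),
    # Deliberate typo: "dat" instead of "data", so no edge is drawn.
    tar_target(analysis, summary(dat))
  )
})

# Retrieve the graph as data frames instead of an interactive widget.
network <- tar_network(targets_only = TRUE)
setwd(old_wd)

# Targets that appear in no edge are disconnected from the rest of the graph.
connected <- union(network$edges$from, network$edges$to)
isolated <- setdiff(network$vertices$name, connected)
print(isolated)
```

A target that shows up in isolated but is supposed to depend on upstream data is a good place to look for a misspelled name.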

3.4 The manifest

tar_manifest() shows the names, commands, and other settings of the targets in the pipeline.

tar_manifest()
#> # A tibble: 2 × 2
#>   name     command                     
#>   <chr>    <chr>                       
#> 1 data     get_data()                  
#> 2 analysis your_analysis_function(data)

With the manifest, you can check that the targets in the pipeline have the correct R commands, and you can try those commands yourself in an interactive session.

# Interactive R console:
> data <- get_data()
> your_analysis_function(data)

This is especially helpful for static branching with tar_map(), which tries to generate the commands automatically and may not succeed on your first attempt.
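As a rough sketch of that check, the code below writes a throwaway pipeline that uses tar_map() and prints its manifest so the generated commands can be inspected directly. It assumes the tarchetypes package is installed. The target name sim and the values of n are hypothetical.

```r
library(targets)

# Work in a temporary directory so no project files are touched.
dir <- tempfile("pipeline_")
dir.create(dir)
old_wd <- setwd(dir)

# Hypothetical pipeline written to _targets.R.
tar_script({
  library(tarchetypes)
  list(
    # Static branching: one generated target per value of n.
    tar_map(
      values = list(n = c(10, 20)),
      tar_target(sim, rnorm(n))
    )
  )
})

# One row per generated target, with the command tar_map() produced.
manifest <- tar_manifest()
setwd(old_wd)

print(manifest)
```

If a generated command does not look like what you intended, adjust the values or the target definition and print the manifest again before running the pipeline.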

3.5 Environment browser

The environment browser lets you pause R and troubleshoot interactively in the middle of a running pipeline. To prepare, insert a debug() or browser() statement in the code where you want to start the interactive session. You could either put debug() in _targets.R:

# _targets.R file:
library(targets)

# Defines your_analysis_function():
source("R/script_with_your_analysis_function.R")

# Tells R to pause next time you run your_analysis_function():
debug(your_analysis_function)

# Then write your target list as usual:
list(
  tar_target(data, get_data()),
  # your_analysis_function() is called in a target:
  tar_target(analysis, your_analysis_function(data))
)

Alternatively, insert a browser() call inside the function itself:

your_analysis_function <- function(data) {
  browser() # Pause R here.
  # The rest of the function goes here:
  # ...
}

Then, restart your R session and call tar_make() with callr_function = NULL:

rstudioapi::restartSession()

#> Restarting R session...

tar_make(callr_function = NULL)
#> ✔ skip target data
#> • start target analysis
#> Called from: your_analysis_function(data)
#> Browse[1]> 

At Browse[1]>, you have an interactive R console inside a running instance of your_analysis_function(). Use R to check the function arguments or step through the code.

#> Browse[1]> str(data)
#> 'data.frame':    100 obs. of  2 variables:
#>  $ x: int  73 24 15 4 86 64 39 35 83 58 ...
#>  $ y: int  78 93 31 67 40 34 59 82 52 38 ...

3.6 The debug option

targets has a more convenient way to launch the environment browser from inside a target:

  1. In the target script file (default: _targets.R) write a call to tar_option_set() with debug equal to the target name.
  2. Launch a fresh interactive R session with the target script file (default: _targets.R) in your working directory.
  3. Run targets::tar_make() (or targets::tar_make_clustermq(), or targets::tar_make_future()) with callr_function = NULL. If you are using targets version 0.5.0.9000 or above, consider also setting shortcut to TRUE and supplying the target name to names (see footnote 1). This allows tar_make() to reach the desired target more quickly.
  4. When targets reaches the target you selected to debug, your R session will start an interactive debugger, and you should see Browse[1]> in your console. Run targets::tar_name() to verify that you are debugging the correct target.
  5. Interactively run any R code that helps you troubleshoot the problem. For example, if the target invokes a function f(), enter debug(f) and then c to immediately enter the function’s execution environment, where all its arguments are defined.

To try it out yourself, write the following target script file.

# _targets.R file
library(targets)
tar_option_set(debug = "b")
f <- function(x, another_arg = 123) x + another_arg
list(
  tar_target(a, 1),
  tar_target(b, f(a))
)

Then, call tar_make(callr_function = NULL) to drop into a debugger at the command of b.

# R console
tar_make(callr_function = NULL, names = any_of("b"), shortcut = TRUE)
#> ● run target b
#> Called from: eval(expr, envir)
Browse[1]>

When the debugger launches, run targets::tar_name() to confirm you are running the correct target.

Browse[1]> targets::tar_name()
#> [1] "b"

In the debugger, the dependency targets of b are available in the current environment, and the global objects and functions are available in the parent environment.

Browse[1]> ls()
#> [1] "a"
Browse[1]> a
#> [1] 1
Browse[1]> ls(parent.env(environment()))
#> [1] "f"
Browse[1]> f(1)
#> [1] 124

Enter debug(f) to debug the function f(), and enter c to continue into the function’s execution environment, where another_arg is defined.

Browse[1]> debug(f)
Browse[1]> c
#> debugging in: f(a)
#> debug at _targets.R#3: x + another_arg
Browse[2]> ls()
#> [1] "another_arg" "x"   
Browse[2]> another_arg
#> [1] 123

3.7 Workspaces

Workspaces are a persistent alternative to the environment browser. A workspace is a special lightweight reference file that lists the elements of a target’s runtime environment. Using tar_workspace(), you can recover a target’s workspace and locally debug it even if the pipeline is not running. If you tell targets to record workspaces in advance, you can preempt errors and debug later at your convenience. To enable workspaces, use the workspace_on_error and workspaces arguments of tar_option_set(). These arguments set the conditions under which workspace files are saved. For example, tar_option_set(workspace_on_error = TRUE, workspaces = c("x", "y")) tells tar_make() and friends to save a workspace for a target named x, a target named y, and every target that throws an error. Here is an example in a pipeline:

# _targets.R file:
options(tidyverse.quiet = TRUE)
library(targets)
options(crayon.enabled = FALSE)
tar_option_set(workspace_on_error = TRUE, packages = "tidyverse")
f <- function(arg, value, ...) {
  stopifnot(arg < 4)
  value # Return value so that successful calls produce output.
}
list(
  tar_target(x, seq_len(4)),
  tar_target(
    y,
    f(arg = x, value = "succeeded", a = 1, b = 2, key = "my_api_key"),
    pattern = map(x) # The branching chapter describes patterns.
  )
)

# R console:
tar_make()
#> ● run target x
#> ● run branch y_29239c8a
#> ● run branch y_7cc32924
#> ● run branch y_bd602d50
#> ● run branch y_05f206d7
#> x error branch y_05f206d7
#> ● save workspace y_05f206d7
#> Error : x < 4 is not TRUE .
#> Error: callr subprocess failed: x < 4 is not TRUE .

One of the y_******* targets errored out.

library(dplyr) # Provides %>% and pull() in the interactive session.
failed <- tar_meta(fields = error) %>%
  na.omit() %>%
  pull(name)

print(failed)
#> [1] "y_05f206d7"

tar_workspace() reads the special metadata in the workspace file and then loads the target’s dependencies from various locations in _targets/objects and/or the cloud. It also sets the random number generator seed to the seed of the target, loads the required packages, and runs the target script file (default: _targets.R) to load other global object dependencies such as functions.

tar_workspace(y_05f206d7)

You now have the dependencies of y_05f206d7 in memory, which allows you to try out any failed function calls in your local R session (see footnotes 2 and 3).

print(x)
#> [1] 4
f(arg = 0, value = "my_value", a = 1, b = 2, key = "my_api_key")
#> [1] "my_value"
f(arg = x, value = "my_value", a = 1, b = 2, key = "my_api_key")
#> Error in f(x) : x < 4 is not TRUE

Keep in mind that although the dependencies of y_05f206d7 are in memory, the arguments of f() are not.

arg
#> Error: object 'arg' not found
value
#> Error: object 'value' not found

The workspace also has a useful traceback, and you can retrieve it with tar_traceback(). The last couple of lines of the traceback are unavoidably cryptic, but they do sometimes contain useful information.

tar_traceback(y_05f206d7, characters = 77)
#> [1] "f(arg = x, value = \"succeeded\", a = 1, b = 2, key = \"my_api_key\")"           
#> [2] "stopifnot(arg < 4)"             
#> [3] "stop(simpleError(msg, call = if (p <- sys.parent(1)) sys.call(p)))"              
#> [4] "(function (condition) \n{\n    state$error <- build_message(condition)\n    stat"

3.8 Tradeoffs

For small to medium-sized workloads, the environment browser and the debug option are usually the best choices. These techniques immediately direct control to prewritten function calls and get you as close to the error as possible. However, this may not always be feasible in large distributed workloads, e.g. tar_make_clustermq(), where most of your targets are not even running on the same computer as your main R process. For those complicated situations where it is not possible to access the R interpreter, workspaces are ideal because they store a persistent reproducible runtime state that you can recover locally.


  1. In the case of dynamic branching, names does not accept individual branch names, but you can still supply the name of the overarching dynamic target.

  2. In addition, the current random number generator seed (.Random.seed) is also the value y_05f206d7 started with.

  3. When you are finished debugging, you can remove all workspace files with tar_destroy(destroy = "workspaces").