Chapter 18 Debugging and testing drake projects
This chapter aims to help users detect and diagnose problems with large, complex workflows.
18.1 Debugging failed targets
18.1.1 Diagnosing errors
When a target fails, drake tries to tell you.
large_dataset <- function() {
  data.frame(x = rnorm(1e6), y = rnorm(1e6))
}

expensive_analysis <- function(data) {
  # More operations go here.
  tricky_operation(data)
}

tricky_operation <- function(data) {
  # Expensive code here.
  stop("there is a bug somewhere.")
}

plan <- drake_plan(
  data = large_dataset(),
  analysis = expensive_analysis(data)
)
make(plan)
diagnose() recovers the metadata on targets. For failed targets, this includes an error object.
error <- diagnose(analysis)$error
error
names(error)
Using the call stack, you can trace back the location of the error. Once you know roughly where to find the bug, you can troubleshoot interactively.
invisible(lapply(tail(error$calls, 3), print)) # Print the last 3 calls in the stack.
18.1.2 Interactive debugging
The clues from diagnose() help us go back and inspect the failing code. debug() is an interactive debugging tool which helps you verify exactly what is going wrong. Below, make(plan) pauses execution and turns interactive control over to you inside tricky_operation().
debug(tricky_operation)
make(plan) # Pauses at tricky_operation(data).
undebug(tricky_operation) # Undoes debug().
drake’s own drake_debug() function is nearly equivalent.
drake_debug(analysis, plan) # Pauses at the command expensive_analysis(data).
browser() is similar, but it affords you finer control over where to pause execution.
tricky_operation <- function(data) {
  # Expensive code here.
  browser() # Pauses right here to give you control.
  stop("there is a bug somewhere.")
}
make(plan)
18.1.3 Efficient trial and error
If you are using drake, then chances are your targets are computationally expensive and the long runtimes make debugging difficult. To speed up trial and error, run the plan on a small dataset while you debug and repair things.
plan <- drake_plan(
  data = head(large_dataset()), # Just work with the first few rows.
  analysis = expensive_analysis(data) # Runs faster now.
)
tricky_operation <- ... # Try to fix the function.
debug(tricky_operation) # Set up to debug interactively.
make(plan) # Try to run the workflow.
After a lot of quick trial and error, we finally fix the function and run it on the small data.
tricky_operation <- function(data) {
  # Good code goes here.
}
make(plan)
Now that the code works, it is time to scale back up to the large data. Use make(plan, recover = TRUE) to salvage old targets from before the debugging process.
plan <- drake_plan(
  data = large_dataset(), # Use the large data again.
  analysis = expensive_analysis(data) # Should be repaired now.
)
make(plan, recover = TRUE)
18.2 Why do my targets keep rerunning?
Consider the following completed workflow.
load_mtcars_example()
make(my_plan)
At this point, if you change the reg1() function, then make() will automatically detect and rerun downstream targets such as regression1_large.
reg1 <- function(d) {
  lm(y ~ 1 + x, data = d)
}
make(my_plan)
In general, targets are “outdated” or “invalidated” when they are out of sync with their dependencies. If a target is outdated, the next make() automatically detects the discrepancies and rebuilds the affected targets. Usually, this automation adds convenience, saves time, and ensures reproducibility in the face of long runtimes.
However, it can be frustrating when drake detects outdated targets even though you think everything is up to date. If this happens, it is important to understand:
- How your workflow fits together.
- Which targets are outdated.
- Why your targets are outdated.
- Strategies to prevent unexpected changes in the future.
drake’s utility functions offer clues to guide you.
18.2.1 How your workflow fits together
drake automatically analyzes your plan and functions to understand how your targets depend on each other. It assembles this information into a directed acyclic graph (DAG), which you can visualize and explore.
vis_drake_graph(my_plan)
To get a more localized version of the graph, use deps_target(). Unlike vis_drake_graph(), deps_target() gives you a more granular view of the dependencies of an individual target.
deps_target(regression1_large, my_plan)
deps_target(report, my_plan)
To understand how drake detects dependencies in the first place, use deps_code(). It shows what drake sees when it first reads your plan and functions.
deps_code(quote(
  suppressWarnings(summary(regression1_large$residuals))
))

deps_code(quote(
  knit(knitr_in("report.Rmd"), file_out("report.md"), quiet = TRUE)
))
If drake detects new dependencies you were unaware of, that could be a reason why your targets are out of date.
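A common source of such surprises is a function that silently reads a global object. The sketch below is hypothetical (summarize_data() and threshold are illustrative names, not part of the mtcars example): because threshold appears in the function body, drake counts it as a dependency, so editing it invalidates every target that calls summarize_data().

# Hypothetical sketch: `threshold` is a global object referenced
# inside the function body, so drake treats it as a dependency.
threshold <- 0.5

summarize_data <- function(data) {
  mean(data$x > threshold) # Changing `threshold` invalidates callers.
}

deps_code(summarize_data) # Lists threshold among the dependencies.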
18.2.2 Which targets are outdated
Graphing utilities like vis_drake_graph() label the outdated targets, but sometimes it is helpful to get a more programmatic view.
outdated(my_plan)
18.2.3 Why your targets are outdated
The deps_profile() function offers clues.
deps_profile(regression1_small, my_plan)
From the data frame above, regression1_small is outdated because an R object dependency changed since the last make(). drake does not hold on to enough information to tell you precisely which object is the culprit, but functions like vis_drake_graph(), deps_target(), and deps_code() can help narrow down the possibilities.
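For example, one way to narrow the search (a sketch, not the only approach) is to list the dependencies of the outdated target with deps_target() and then inspect each candidate with deps_code():

# List the dependencies of the outdated target.
deps_target(regression1_small, my_plan)

# reg1() is among those dependencies, so inspect what drake
# sees in its body.
deps_code(reg1)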
18.2.4 Strategies to prevent unexpected changes in the future
drake is sensitive to changing functions in your global environment, and this sensitivity can invalidate targets unexpectedly. Whenever you plan to run make(), it is always best to restart your R session and load your packages and functions into a fresh, clean workspace. r_make() does all this cleaning and prep work for you automatically, and it is more robust and dependable (and childproofed) than ordinary make(). To read more, visit https://books.ropensci.org/drake/projects#safer-interactivity.
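As a sketch of that workflow: r_make() runs a configuration script named _drake.R in a fresh background R session, and the script must end with a call to drake_config(). The functions.R file below is an assumed project script, not part of the mtcars example.

# _drake.R: the script r_make() executes in a clean background process.
library(drake)
source("functions.R") # Assumed script defining your custom functions.

plan <- drake_plan(
  data = large_dataset(),
  analysis = expensive_analysis(data)
)

drake_config(plan) # r_make() requires this to be the last call.

With this file in place, r_make() builds your targets without touching your interactive workspace, so stray objects in your session cannot invalidate targets unexpectedly.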
18.3 More help
The GitHub issue tracker (https://github.com/ropensci/drake/issues) is the best place to request help with your specific use case.