Chapter 19 Triggers: decision rules for building targets
When you call make(), drake tries to skip as many targets as possible. If it thinks a command will return the same value as last time, it does not bother running it. In other words, drake is lazy, and laziness saves you time.
19.1 What are triggers?
To figure out whether it can skip a target, drake goes through an intricate checklist of triggers:
- The missing trigger: do we lack a return value from a previous make()? Maybe you are building the target for the first time or you removed it from the cache with clean().
- The command trigger: did the command in the drake plan change nontrivially since the last make()? Changes to spacing, formatting, and comments are ignored.
- The depend trigger: did any non-file dependencies change since the last make()? These could be:
  - Other targets.
  - Imported objects.
  - Imported functions. To track changes to a function, drake removes any code enclosed in ignore(), deparses the literal code so that whitespace is standardized and comments are removed, and then hashes the resulting string. In some cases, drake makes special adjustments for strange edge cases like Rcpp functions with pointers and functions defined with Vectorize(). However, edge cases like these are inevitable because of the flexibility of R.
  - Any dependencies of imported functions.
  - Any dependencies of dependencies of imported functions, and so on.
- The file trigger: did any file inputs or file outputs change since the last make()? These files are the ones explicitly declared in the command with file_in(), knitr_in(), and file_out().
- The seed trigger: for statistical reproducibility, drake assigns a unique seed to each target based on the target’s name and the global seed argument to make(). If you change the target’s pseudo-random number generator seed either with the seed argument or the custom seed column in the plan, this change will cause a rebuild if the seed trigger is turned on.
- The format trigger: did you add or change the target’s storage format since the last build? Details: https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.
- The condition trigger: an optional user-defined piece of code that evaluates to a TRUE/FALSE value. The target builds if the value is TRUE.
- The change trigger: an optional user-defined piece of code that evaluates to any value (preferably small and quick to compute). The target builds if the value changed since the last make().
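The claim that spacing, formatting, and comments are ignored can be illustrated with base R alone. The sketch below is only an approximation of the idea, not drake's actual internals: normalize_code() is a hypothetical helper that parses code text without source references and deparses it again, so that only the language structure survives. drake additionally hashes the result and handles ignore() and other special cases, which this sketch omits.

```r
# Hypothetical helper (not part of drake): parse code text without source
# references, then deparse it so that whitespace is standardized and
# comments are dropped.
normalize_code <- function(code) {
  expr <- parse(text = code, keep.source = FALSE)[[1]]
  paste(deparse(expr), collapse = "\n")
}

original    <- "f <- function(x) {\n  x + 1\n}"
reformatted <- "f <- function(x){x+1 # add one\n}"
edited      <- "f <- function(x) {\n  x + 2\n}"

# Reformatting and commenting leave the normalized code unchanged.
same_after_reformat <- identical(
  normalize_code(original),
  normalize_code(reformatted)
)

# A real change to the code does alter the normalized text.
same_after_edit <- identical(
  normalize_code(original),
  normalize_code(edited)
)
```

Comparing (or hashing) the normalized strings rather than the raw text is what lets a purely cosmetic edit leave a target up to date.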
If any trigger detects something wrong or different with the target or its dependencies, the next make() will run the command and (re)build the target.
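As a rough mental model of the checklist, the base R sketch below compares the metadata recorded at the last build with the current metadata and reports which triggers fire. Everything here is an illustrative assumption: fired_triggers() and the metadata fields are hypothetical, only three of the triggers are modeled, and drake's real bookkeeping is considerably more involved.

```r
# Toy model of a trigger checklist; NOT drake's implementation.
# `old` is the metadata stored at the last build (NULL if never built),
# `new` is the metadata computed for the current make().
fired_triggers <- function(old, new) {
  if (is.null(old)) {
    return("missing")
  }
  fields <- c("command", "seed", "change")
  changed <- vapply(
    fields,
    function(field) !identical(old[[field]], new[[field]]),
    logical(1)
  )
  fields[changed]
}

meta_1 <- list(command = "1 + 1", seed = 42L, change = sqrt(1))
meta_2 <- list(command = "1 + 1", seed = 42L, change = sqrt(2))

first_build  <- fired_triggers(NULL, meta_1)   # never built before
second_build <- fired_triggers(meta_1, meta_1) # nothing changed
third_build  <- fired_triggers(meta_1, meta_2) # change trigger value differs
```

In this model a target is rebuilt whenever the returned vector is non-empty, and skipped otherwise.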
19.2 Customization
With the trigger() function, you can create your own customized checklist of triggers. Let’s run a simple workflow with just the missing trigger. We deactivate the command, depend, and file triggers by setting the respective command, depend, and file arguments to FALSE.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2,
  psi_2 = (sqrt(5) - 1) / 2
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
Now, even if you wreck all the commands, nothing rebuilds.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = (sqrt(5) - 1) / 2 - 9999999999999
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
You can also give different triggers to different targets. Triggers in the drake plan override the trigger argument to make(). Below, psi_2 always builds, but psi_1 only builds if it has never been built before.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = target(
    command = (sqrt(5) - 1) / 2 - 9999999999999,
    trigger = trigger(condition = psi_1 > 0)
  )
)
plan
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
Interestingly, psi_2 now depends on psi_1. Because the condition trigger of psi_2 mentions psi_1, psi_1 needs to be up to date before drake attempts psi_2. However, since psi_1 is not part of the command of psi_2, changing it will not trip the other triggers such as depend.
vis_drake_graph(plan)
In the toy example below, drake reads from a file to decide whether to build x. Try it out.
plan <- drake_plan(
  x = target(
    1 + 1,
    # file_in() wraps the file path so drake tracks the file; readRDS() reads it.
    trigger = trigger(condition = readRDS(file_in("file.rds")))
  )
)
saveRDS(TRUE, "file.rds")
make(plan)
make(plan)
make(plan)
saveRDS(FALSE, "file.rds")
make(plan)
make(plan)
make(plan)
In a real project with remote data sources, you may want to use the condition trigger to limit your builds to times when enough bandwidth is available for a large download. For example,
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(condition = is_enough_bandwidth())
  )
)
Since the change trigger can return any value, it is often easier to use than the condition trigger.
clean(destroy = TRUE)
plan <- drake_plan(
  x = target(
    command = 1 + 1,
    trigger = trigger(change = sqrt(y))
  )
)
y <- 1
make(plan)
make(plan)
y <- 2
make(plan)
In practice, you may want to use the change trigger to check a large remote dataset before downloading it.
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      condition = is_enough_bandwidth(),
      change = date_last_modified()
    )
  )
)
A word of caution: every non-NULL change trigger is always evaluated, and its value is carried around in memory throughout make(). So if you are not careful, heavy use of the change trigger could slow down your workflow and consume extra resources. The change trigger should return small values (and should ideally be quick to evaluate). To reduce memory consumption, you may want to return a fingerprint of your trigger value rather than the value itself. See the digest package for more information on computing hashes/fingerprints.
library(digest)
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      change = digest(download_medium_dataset())
    )
  )
)
19.3 Alternative trigger modes
Sometimes, you may want to suppress a target without having to worry about turning off every single trigger. That is why the trigger() function has a mode argument, which controls the role of the condition trigger in the decision to build or skip a target. The available trigger modes are "whitelist" (default), "blacklist", and "condition".
- trigger(mode = "whitelist"): we rebuild the target whenever condition evaluates to TRUE. Otherwise, we defer to the other triggers. This is the default behavior described above in this chapter.
- trigger(mode = "blacklist"): we skip the target whenever condition evaluates to FALSE. Otherwise, we defer to the other triggers.
- trigger(mode = "condition"): here, the condition trigger is the only decider, and we ignore all the other triggers. We rebuild the target whenever condition evaluates to TRUE and skip it whenever condition evaluates to FALSE.
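The three modes can be summarized as a small truth function. The base R sketch below only restates the rules above under simplifying assumptions: should_build() is a hypothetical helper, and other_triggers stands for the combined verdict of all non-condition triggers. It is not how drake implements the decision internally.

```r
# Hypothetical model of the trigger modes; not drake's internal code.
# condition:      value of the condition trigger (TRUE or FALSE)
# other_triggers: TRUE if any other trigger says the target must rebuild
should_build <- function(condition, other_triggers,
                         mode = c("whitelist", "blacklist", "condition")) {
  mode <- match.arg(mode)
  switch(
    mode,
    # Build if the condition is TRUE; otherwise defer to the other triggers.
    whitelist = condition || other_triggers,
    # Skip if the condition is FALSE; otherwise defer to the other triggers.
    blacklist = condition && other_triggers,
    # The condition alone decides.
    condition = condition
  )
}

should_build(TRUE, FALSE, mode = "whitelist")  # condition forces a build
should_build(FALSE, TRUE, mode = "blacklist")  # condition suppresses the build
should_build(FALSE, TRUE, mode = "condition")  # other triggers are ignored
```

Seen this way, "whitelist" behaves like a logical OR with the other triggers, "blacklist" like a logical AND, and "condition" discards the other triggers entirely.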
19.4 A more practical example
See the “packages” example for a more practical demonstration of triggers and their usefulness.