Chapter 19 Triggers: decision rules for building targets
When you call make(), drake tries to skip as many targets as possible. If it thinks a command will return the same value as last time, it does not bother running it. In other words, drake is lazy, and laziness saves you time.
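A minimal sketch of this laziness, assuming a recent drake 7.x release in which outdated() accepts the plan directly. The plan, the target names a and b, and the temporary cache are made up for illustration; the temporary cache only keeps the sketch from touching an existing project.

```r
library(drake)

# Hypothetical two-target plan; `b` depends on `a`.
plan <- drake_plan(
  a = 1 + 1,
  b = a * 2
)

# Throwaway cache so the sketch does not touch an existing .drake/ folder.
cache <- new_cache(tempfile())

make(plan, cache = cache)      # First call: both targets are built.
outdated(plan, cache = cache)  # Lists the targets drake would rebuild next time.
make(plan, cache = cache)      # Second call: nothing changed, so the work is skipped.
```

Calling outdated() before make() is a cheap way to preview what drake considers stale.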
19.1 What are triggers?
To figure out whether it can skip a target,
drake goes through an intricate checklist of triggers:
- The missing trigger: do we lack a return value from a previous make()? Maybe you are building the target for the first time, or you removed it from the cache with clean().
- The command trigger: did the command in the drake plan change nontrivially since the last make()? Changes to spacing, formatting, and comments are ignored.
- The depend trigger: did any non-file dependencies change since the last make()? These could be:
  - Other targets.
  - Imported objects.
  - Imported functions. To track changes to a function, drake removes any code enclosed in ignore(), deparses the literal code so that whitespace is standardized and comments are removed, and then hashes the resulting string. In some cases, drake makes special adjustments for strange edge cases such as Rcpp functions with pointers and functions defined with Vectorize(). However, edge cases like these are inevitable because of the flexibility of R.
  - Any dependencies of imported functions.
  - Any dependencies of dependencies of imported functions, and so on.
- The file trigger: did any file inputs or file outputs change since the last make()? These files are the ones explicitly declared in the command with file_in() and file_out().
- The seed trigger: for statistical reproducibility, drake assigns a unique seed to each target based on the target’s name and the global seed argument to make(). If you change the target’s pseudo-random number generator seed, either with the seed argument or with the custom seed column in the plan, this change will cause a rebuild if the seed trigger is turned on.
- The format trigger: did you add or change the target’s storage format since last build? Details: https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.
- The condition trigger: an optional user-defined piece of code that evaluates to a TRUE/FALSE value. The target builds if the value is TRUE.
- The change trigger: an optional user-defined piece of code that evaluates to any value (preferably small and quick to compute). The target builds if the value changed since the last make().
If any trigger detects something wrong or different with the target or its dependencies, the next
make() will run the command and (re)build the target.
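The function-tracking step of the depend trigger can be illustrated with a rough base R sketch. This is not drake's implementation: function_fingerprint() is a hypothetical helper, it does not handle ignore(), and it compares the deparsed strings directly instead of hashing them. It only shows why cosmetic edits to a function need not invalidate a target.

```r
# Hypothetical helper (not part of drake): reduce a function to a string
# built from its parsed code, so comments and spacing do not matter.
function_fingerprint <- function(f) {
  paste(deparse(f), collapse = "\n")
}

f1 <- function(x) {
  x + 1
}

# Same code as f1, but with a comment and different spacing.
f2 <- function(x) {
  # add one
  x   +   1
}

# Different code: a nontrivial change.
f3 <- function(x) {
  x + 2
}

# Compare a cosmetic edit against the original.
identical(function_fingerprint(f1), function_fingerprint(f2))

# Compare a real change against the original.
identical(function_fingerprint(f1), function_fingerprint(f3))
```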
With the trigger() function, you can create your own customized checklist of triggers. Let’s run a simple workflow with just the missing trigger. We deactivate the command, depend, and file triggers by setting the respective command, depend, and file arguments to FALSE.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2,
  psi_2 = (sqrt(5) - 1) / 2
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
Now, even if you wreck all the commands, nothing rebuilds.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = (sqrt(5) - 1) / 2 - 9999999999999
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
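One way to confirm that nothing was rebuilt is to read the cached value back and compare it with the original command. The sketch below assumes a throwaway cache created with new_cache(tempfile()) and a single-target plan; the variable name only_missing is made up for illustration.

```r
library(drake)

cache <- new_cache(tempfile())  # throwaway cache for this sketch
only_missing <- trigger(command = FALSE, depend = FALSE, file = FALSE)

# Build the target once with the original command.
plan <- drake_plan(psi_1 = (sqrt(5) + 1) / 2)
make(plan, trigger = only_missing, cache = cache)

# Change the command, then run make() again with the same triggers.
plan <- drake_plan(psi_1 = (sqrt(5) + 1) / 2 + 9999999999999)
make(plan, trigger = only_missing, cache = cache)

# The cached value should still come from the original command.
readd(psi_1, cache = cache)
```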
You can also give different triggers to different targets. Triggers in the drake plan override the trigger argument to make(). Below, psi_2 always builds, but psi_1 only builds if it has never been built before.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = target(
    command = (sqrt(5) - 1) / 2 - 9999999999999,
    trigger = trigger(condition = psi_1 > 0)
  )
)
plan
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
psi_2 now depends on psi_1. Because psi_1 appears in the condition trigger of psi_2, it needs to be up to date before we attempt psi_2. However, since psi_1 is not part of the command, changing it will not trip the other triggers such as depend.
In the toy example below, drake reads from a file to decide whether to build x. Try it out.
plan <- drake_plan(
  x = target(
    1 + 1,
    trigger = trigger(condition = file_in(readRDS("file.rds")))
  )
)
saveRDS(TRUE, "file.rds")
make(plan)
make(plan)
make(plan)
saveRDS(FALSE, "file.rds")
make(plan)
make(plan)
make(plan)
In a real project with remote data sources, you may want to use the condition trigger to limit your builds to times when enough bandwidth is available for a large download. For example,
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(condition = is_enough_bandwidth())
  )
)
Since the change trigger can return any value, it is often easier to use than the condition trigger.
clean(destroy = TRUE)
plan <- drake_plan(
  x = target(
    command = 1 + 1,
    trigger = trigger(change = sqrt(y))
  )
)
y <- 1
make(plan)
make(plan)
y <- 2
make(plan)
In practice, you may want to use the change trigger to check a large remote dataset before downloading it.
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      condition = is_enough_bandwidth(),
      change = date_last_modified()
    )
  )
)
A word of caution: every non-NULL change trigger is always evaluated, and its value is carried around in memory throughout
make(). So if you are not careful, heavy use of the change trigger could slow down your workflow and consume extra resources. The change trigger should return small values (and should ideally be quick to evaluate). To reduce memory consumption, you may want to return a fingerprint of your trigger value rather than the value itself. See the
digest package for more information on computing hashes/fingerprints.
library(digest)
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      change = digest(download_medium_dataset())
    )
  )
)
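To see why a fingerprint helps, the sketch below compares the in-memory size of a value with the size of its hash. The vector big_value is a made-up stand-in for whatever a change trigger might return; digest() is called with its defaults.

```r
library(digest)

# Stand-in for a medium-sized value returned by a change trigger.
big_value <- rep(c(1.5, 2.5), 5e4)

# A short, fixed-length string that changes when the value changes.
fingerprint <- digest(big_value)

object.size(big_value)    # memory used by the raw value
object.size(fingerprint)  # memory used by the fingerprint
nchar(fingerprint)        # length of the hash string
```

The fingerprint still has to be computed from the full value, so this saves memory during make() but does not make the trigger itself faster to evaluate.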
19.3 Alternative trigger modes
Sometimes, you may want to suppress a target without having to worry about turning off every single trigger. That is why the
trigger() function has a
mode argument, which controls the role of the condition trigger in the decision to build or skip a target. The available trigger modes are
- trigger(mode = "whitelist"): we rebuild the target whenever condition evaluates to TRUE. Otherwise, we defer to the other triggers. This is the default behavior described above in this chapter.
- trigger(mode = "blacklist"): we skip the target whenever condition evaluates to FALSE. Otherwise, we defer to the other triggers.
- trigger(mode = "condition"): here, the condition trigger is the only decider, and we ignore all the other triggers. We rebuild the target whenever condition evaluates to TRUE and skip it whenever condition evaluates to FALSE.
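The three modes can be summarized as a small truth table. The base R sketch below only restates the semantics described above; should_build() is a hypothetical helper, not a drake function, and other_triggers stands for "at least one of the other triggers fired".

```r
# Hypothetical helper (not part of drake): decide whether a target builds,
# given the condition trigger, the other triggers, and the trigger mode.
should_build <- function(condition, other_triggers,
                         mode = c("whitelist", "blacklist", "condition")) {
  mode <- match.arg(mode)
  switch(
    mode,
    whitelist = condition || other_triggers,  # TRUE condition forces a build
    blacklist = condition && other_triggers,  # FALSE condition forces a skip
    condition = condition                     # condition alone decides
  )
}

# Tabulate every combination of inputs for each mode.
cases <- expand.grid(
  condition = c(TRUE, FALSE),
  other_triggers = c(TRUE, FALSE),
  mode = c("whitelist", "blacklist", "condition"),
  stringsAsFactors = FALSE
)
cases$build <- mapply(
  should_build,
  cases$condition,
  cases$other_triggers,
  cases$mode
)
cases
```

In a plan, this means a target carrying trigger(condition = FALSE, mode = "blacklist") is suppressed without switching off each of the other triggers individually.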
19.4 A more practical example
See the “packages” example for a more practical demonstration of triggers and their usefulness.