Chapter 19 Triggers: decision rules for building targets
When you call make(), drake tries to skip as many targets as possible. If it thinks a command will return the same value as last time, it does not bother running it. In other words, drake is lazy, and laziness saves you time.
19.1 What are triggers?
To figure out whether it can skip a target, drake goes through an intricate checklist of triggers:
- The missing trigger: do we lack a return value from a previous make()? Maybe you are building the target for the first time, or you removed it from the cache with clean().
- The command trigger: did the command in the drake plan change nontrivially since the last make()? Changes to spacing, formatting, and comments are ignored.
- The depend trigger: did any non-file dependencies change since the last make()? These could be:
  - Other targets.
  - Imported objects.
  - Imported functions. To track changes to a function, drake removes any code enclosed in ignore(), deparses the literal code so that whitespace is standardized and comments are removed, and then hashes the resulting string. drake also makes special adjustments for some known edge cases, such as Rcpp functions with pointers and functions defined with Vectorize(). However, R is flexible enough that other edge cases are inevitable.
  - Any dependencies of imported functions.
  - Any dependencies of dependencies of imported functions, and so on.
- The file trigger: did any file inputs or file outputs change since the last make()? These files are the ones explicitly declared in the command with file_in(), knitr_in(), and file_out().
- The seed trigger: for statistical reproducibility, drake assigns a unique seed to each target based on the target’s name and the global seed argument to make(). If you change the target’s pseudo-random number generator seed, either with the seed argument or with the custom seed column in the plan, the target rebuilds as long as the seed trigger is turned on.
- The format trigger: did you add or change the target’s storage format since the last build? Details: https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.
- The condition trigger: an optional user-defined piece of code that evaluates to a TRUE/FALSE value. The target builds if the value is TRUE.
- The change trigger: an optional user-defined piece of code that evaluates to any value (preferably small and quick to compute). The target builds if the value changed since the last make().
If any trigger detects something wrong or different with the target or its dependencies, the next make() will run the command and (re)build the target.
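The function-tracking step of the depend trigger can be approximated in a few lines of base R. The sketch below is not drake's internal code: the helper name standardize_code() is hypothetical, and it compares the deparsed strings directly where drake would hash them. It is only meant to show why reformatting or re-commenting a function looks like "no change", while an edit to the logic does not.

```r
# Hypothetical helper, not part of drake: reduce a function to a standardized
# string. deparse() regenerates the code from the parsed function, so the
# original spacing and comments are not carried over.
standardize_code <- function(f) {
  paste(deparse(f), collapse = "\n")
}

f1 <- function(x) {
  # add one
  x + 1
}

# Same logic as f1, but with different spacing and a different comment.
f2 <- function(x) {
  x   +   1   # increment
}

# Different logic.
f3 <- function(x) {
  x + 2
}

same_after_reformat <- identical(standardize_code(f1), standardize_code(f2))
same_after_edit <- identical(standardize_code(f1), standardize_code(f3))
```

Here same_after_reformat is TRUE and same_after_edit is FALSE. Hashing the standardized string, for example with the digest package, would give a compact fingerprint that leads to the same comparison.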
19.2 Customization
With the trigger() function, you can create your own customized checklist of triggers. Let’s run a simple workflow with just the missing trigger. We deactivate the command, depend, and file triggers by setting the respective command, depend, and file arguments to FALSE.
library(drake)
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2,
  psi_2 = (sqrt(5) - 1) / 2
)
# Only the missing trigger stays active.
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
Now, even if you wreck all the commands, nothing rebuilds.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = (sqrt(5) - 1) / 2 - 9999999999999
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
You can also give different triggers to different targets. Triggers in the drake plan override the trigger argument to make(). Below, psi_2 always builds, but psi_1 only builds if it has never been built before.
plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = target(
    command = (sqrt(5) - 1) / 2 - 9999999999999,
    trigger = trigger(condition = psi_1 > 0)
  )
)
plan
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
Interestingly, psi_2 now depends on psi_1. Because the condition trigger of psi_2 refers to psi_1, psi_1 needs to be up to date before we attempt psi_2. However, since psi_1 is not part of the command, changing it will not trip the other triggers, such as depend.
vis_drake_graph(plan)
In the next toy example, drake reads from a file to decide whether to build x. Try it out.
plan <- drake_plan(
  x = target(
    1 + 1,
    # file_in() wraps the file path so drake can detect the file dependency.
    trigger = trigger(condition = readRDS(file_in("file.rds")))
  )
)
saveRDS(TRUE, "file.rds")
make(plan)
make(plan)
make(plan)
saveRDS(FALSE, "file.rds")
make(plan)
make(plan)
make(plan)
In a real project with remote data sources, you may want to use the condition trigger to limit your builds to times when enough bandwidth is available for a large download. For example,
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(condition = is_enough_bandwidth())
  )
)
Since the change trigger can return any value, it is often easier to use than the condition trigger.
clean(destroy = TRUE)
plan <- drake_plan(
  x = target(
    command = 1 + 1,
    trigger = trigger(change = sqrt(y))
  )
)
y <- 1
make(plan)
make(plan)
y <- 2
make(plan)
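To see why the three make() calls above behave differently, it may help to imitate the bookkeeping by hand. The following base R sketch is a simplification, not drake's implementation: last_value and change_trigger_fires() are hypothetical names standing in for the trigger value that drake records with each build. In drake itself, the first build would happen anyway because of the missing trigger.

```r
# Hypothetical stand-in for the change-trigger value recorded after a build.
last_value <- NULL

# Returns TRUE when `value` differs from the value recorded last time,
# then records the new value for the next comparison.
change_trigger_fires <- function(value) {
  fired <- !identical(value, last_value)
  last_value <<- value
  fired
}

y <- 1
first <- change_trigger_fires(sqrt(y))   # nothing recorded yet
second <- change_trigger_fires(sqrt(y))  # same value as last time
y <- 2
third <- change_trigger_fires(sqrt(y))   # sqrt(2) differs from sqrt(1)
```

In this sketch, first and third are TRUE and second is FALSE, which mirrors the build, skip, build pattern of the example.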
In practice, you may want to use the change trigger to check whether a large remote dataset has changed before downloading it.
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      condition = is_enough_bandwidth(),
      change = date_last_modified()
    )
  )
)
A word of caution: every non-NULL change trigger is always evaluated, and its value is carried around in memory throughout make(). So if you are not careful, heavy use of the change trigger could slow down your workflow and consume extra resources. The change trigger should return small values (and should ideally be quick to evaluate). To reduce memory consumption, you may want to return a fingerprint of your trigger value rather than the value itself. See the digest package for more information on computing hashes/fingerprints.
library(digest)
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      change = digest(download_medium_dataset())
    )
  )
)
19.3 Alternative trigger modes
Sometimes, you may want to suppress a target without having to worry about turning off every single trigger. That is why the trigger() function has a mode argument, which controls the role of the condition trigger in the decision to build or skip a target. The available trigger modes are "whitelist" (default), "blacklist", and "condition".
- trigger(mode = "whitelist"): we rebuild the target whenever condition evaluates to TRUE. Otherwise, we defer to the other triggers. This is the default behavior described above in this chapter.
- trigger(mode = "blacklist"): we skip the target whenever condition evaluates to FALSE. Otherwise, we defer to the other triggers.
- trigger(mode = "condition"): here, the condition trigger is the only decider, and we ignore all the other triggers. We rebuild the target whenever condition evaluates to TRUE and skip it whenever condition evaluates to FALSE.
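The three modes amount to three ways of combining the condition trigger with the verdict of the remaining triggers. The sketch below restates the rules above as a small base R function. It is an illustration only: should_build() and its arguments are hypothetical and not part of drake. Here condition is the value of the condition trigger, and others is TRUE if at least one of the other triggers calls for a rebuild.

```r
# Hypothetical helper, not a drake function.
should_build <- function(mode, condition, others) {
  switch(
    mode,
    # Build if condition is TRUE; otherwise defer to the other triggers.
    whitelist = condition || others,
    # Skip if condition is FALSE; otherwise defer to the other triggers.
    blacklist = condition && others,
    # The condition alone decides.
    condition = condition,
    stop("unknown mode: ", mode)
  )
}

# Tabulate the decision for every combination of inputs.
grid <- expand.grid(condition = c(TRUE, FALSE), others = c(TRUE, FALSE))
for (mode in c("whitelist", "blacklist", "condition")) {
  grid[[mode]] <- mapply(
    should_build, mode, grid$condition, grid$others,
    USE.NAMES = FALSE
  )
}
print(grid)
```

Reading the table row by row shows that "blacklist" is the mode to reach for when you want to suppress a target: a FALSE condition wins regardless of what the other triggers report.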
19.4 A more practical example
See the “packages” example for a more practical demonstration of triggers and their usefulness.