Chapter 19 Triggers: decision rules for building targets

When you call make(), drake tries to skip as many targets as possible. If it thinks a command will return the same value as last time, it does not bother running it. In other words, drake is lazy, and laziness saves you time.

19.1 What are triggers?

To figure out whether it can skip a target, drake goes through an intricate checklist of triggers:

  1. The missing trigger: do we lack a return value from a previous make()? Maybe you are building the target for the first time or you removed it from the cache with clean().
  2. The command trigger: did the command in the drake plan change nontrivially since the last make()? Changes to spacing, formatting, and comments are ignored.
  3. The depend trigger: did any non-file dependencies change since the last make()? These could be:
    • Other targets.
    • Imported objects.
    • Imported functions. To track changes to a function, drake removes any code enclosed in ignore(), deparses the literal code so that whitespace is standardized and comments are removed, and then hashes the resulting string. drake also makes special adjustments for some edge cases, such as Rcpp functions with pointers and functions defined with Vectorize(). Still, because R is so flexible, edge cases like these are inevitable.
    • Any dependencies of imported functions.
    • Any dependencies of dependencies of imported functions, and so on.
  4. The file trigger: did any file inputs or file outputs change since the last make()? These files are the ones explicitly declared in the command with file_in(), knitr_in(), and file_out().
  5. The seed trigger: for statistical reproducibility, drake assigns a unique seed to each target based on the target’s name and the global seed argument to make(). If you change the target’s pseudo-random number generator seed either with the seed argument or the custom seed column in the plan, this change will cause a rebuild if the seed trigger is turned on.
  6. The format trigger: did you add or change the target’s storage format since the last build? Details: https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.
  7. The condition trigger: an optional user-defined piece of code that evaluates to a TRUE/FALSE value. The target builds if the value is TRUE.
  8. The change trigger: an optional user-defined piece of code that evaluates to any value (preferably small and quick to compute). The target builds if the value changed since the last make().

If any trigger detects something wrong or different with the target or its dependencies, the next make() will run the command and (re)build the target.
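
As a minimal sketch of the command trigger in action, the example below first changes only the formatting of a command and then changes its substance. It assumes the drake package is installed and uses outdated() to list the targets that the next make() would build. The scratch directory and the target x are illustrative choices, not part of the workflow in this chapter.

```r
library(drake)

# Work in a throwaway directory so the example does not touch an existing cache.
scratch <- tempfile("triggers-")
dir.create(scratch)
old_dir <- setwd(scratch)

plan <- drake_plan(x = sqrt(4) + 1)
make(plan)

# Same command with different spacing and a comment.
# The command trigger should ignore these changes.
plan <- drake_plan(
  x = sqrt( 4 )+1 # reformatted, but equivalent
)
after_reformat <- outdated(plan)

# A nontrivial change to the command.
# The command trigger should now mark x for a rebuild.
plan <- drake_plan(x = sqrt(4) + 2)
after_edit <- outdated(plan)

setwd(old_dir)
```

If the triggers behave as described above, after_reformat is empty and after_edit contains "x".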

19.2 Customization

With the trigger() function, you can create your own customized checklist of triggers. Let’s run a simple workflow with just the missing trigger. We deactivate the command, depend, and file triggers by setting the respective command, depend, and file arguments to FALSE.

plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2,
  psi_2 = (sqrt(5) - 1) / 2
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))

Now, even if you wreck all the commands, nothing rebuilds.

plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = (sqrt(5) - 1) / 2 - 9999999999999
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))

You can also assign different triggers to different targets. Triggers in the drake plan override the trigger argument to make(). Below, psi_2 always builds because its condition, psi_1 > 0, is always TRUE, but psi_1 only builds if it has never been built before.

plan <- drake_plan(
  psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
  psi_2 = target(
    command = (sqrt(5) - 1) / 2 - 9999999999999,
    trigger = trigger(condition = psi_1 > 0)
  )
)
plan
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))

Interestingly, psi_2 now depends on psi_1. Because the condition trigger of psi_2 mentions psi_1, drake must bring psi_1 up to date before it decides whether to build psi_2. However, since psi_1 does not appear in the command of psi_2, changes to psi_1 will not trip the other triggers, such as depend.

vis_drake_graph(plan)

In the toy example below, drake reads from a file to decide whether to build x. Try it out.

plan <- drake_plan(
  x = target(
    1 + 1,
    trigger = trigger(condition = readRDS(file_in("file.rds")))
  )
)
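saveRDS(TRUE, "file.rds")
make(plan) # The condition is TRUE, so x builds on every make().
make(plan)
make(plan)
saveRDS(FALSE, "file.rds")
make(plan) # The condition is FALSE, so drake defers to the other triggers.
make(plan)
make(plan)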

In a real project with remote data sources, you may want to use the condition trigger to limit your builds to times when enough bandwidth is available for a large download. For example,

drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(condition = is_enough_bandwidth())
  )
)

Since the change trigger can return any value, it is often easier to use than the condition trigger.

clean(destroy = TRUE)
plan <- drake_plan(
  x = target(
    command = 1 + 1,
    trigger = trigger(change = sqrt(y))
  )
)
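y <- 1
make(plan) # x is missing, so it builds.
make(plan) # sqrt(y) has not changed, so x is skipped.
y <- 2
make(plan) # sqrt(y) changed, so x rebuilds.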

In practice, you may want to use the change trigger to check whether a large remote dataset has changed before downloading it.

drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      condition = is_enough_bandwidth(),
      change = date_last_modified()
    )
  )
)
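
The plan above leaves date_last_modified() undefined. One way such a helper could look is sketched below, assuming the dataset is served over HTTP and the server reports a Last-Modified header. Both functions and the URL are hypothetical placeholders, not part of drake. curlGetHeaders() is a base R function that retrieves only the response headers, so the check itself stays cheap.

```r
# Hypothetical helper (not part of drake): extract the Last-Modified value
# from a character vector of raw HTTP header lines.
extract_last_modified <- function(headers) {
  line <- grep("^last-modified:", headers, ignore.case = TRUE, value = TRUE)
  if (length(line) == 0) {
    return(NA_character_)
  }
  # Use the last match in case redirects produced several header blocks.
  trimws(sub("^[^:]*:", "", line[length(line)]))
}

# Hypothetical helper: fetch the headers only, without downloading the data.
# The default URL is a placeholder.
date_last_modified <- function(url = "https://example.com/large_dataset.csv") {
  extract_last_modified(curlGetHeaders(url))
}

# Offline check of the parsing step with made-up headers.
headers <- c(
  "HTTP/1.1 200 OK\r\n",
  "Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT\r\n"
)
stamp <- extract_last_modified(headers)
stamp # "Wed, 21 Oct 2015 07:28:00 GMT"
```

The returned string is short, which fits the advice to keep change trigger values small.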

A word of caution: every non-NULL change trigger is always evaluated, and its value is carried around in memory throughout make(). So if you are not careful, heavy use of the change trigger could slow down your workflow and consume extra resources. The change trigger should return small values (and should ideally be quick to evaluate). To reduce memory consumption, you may want to return a fingerprint of your trigger value rather than the value itself. See the digest package for more information on computing hashes/fingerprints.

library(digest)
drake_plan(
  x = target(
    command = download_large_dataset(),
    trigger = trigger(
      change = digest(download_medium_dataset())
    )
  )
)

19.3 Alternative trigger modes

Sometimes, you may want to suppress a target without having to worry about turning off every single trigger. That is why the trigger() function has a mode argument, which controls the role of the condition trigger in the decision to build or skip a target. The available trigger modes are "whitelist" (default), "blacklist", and "condition".

  • trigger(mode = "whitelist"): we rebuild the target whenever condition evaluates to TRUE. Otherwise, we defer to the other triggers. This is the default behavior described above in this chapter.
  • trigger(mode = "blacklist"): we skip the target whenever condition evaluates to FALSE. Otherwise, we defer to the other triggers.
  • trigger(mode = "condition"): here, the condition trigger is the only decider, and we ignore all the other triggers. We rebuild the target whenever condition evaluates to TRUE and skip it whenever condition evaluates to FALSE.
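
The sketch below illustrates two of these modes, assuming the drake package is installed. The target x, its commands, and the scratch directory are illustrative. Because each command returns a different value, reading x back with readd() shows whether the target was rebuilt or skipped.

```r
library(drake)

# Work in a throwaway directory so the example does not touch an existing cache.
scratch <- tempfile("trigger-modes-")
dir.create(scratch)
old_dir <- setwd(scratch)

# Build x once so that it is no longer missing.
plan <- drake_plan(x = 1 + 1)
make(plan)

# Blacklist mode: a FALSE condition skips x even though the command changed.
plan <- drake_plan(
  x = target(
    command = 1 + 2,
    trigger = trigger(condition = FALSE, mode = "blacklist")
  )
)
make(plan)
after_blacklist <- readd(x) # still the old value if x was skipped

# Condition mode: a TRUE condition alone decides that x builds.
plan <- drake_plan(
  x = target(
    command = 1 + 2,
    trigger = trigger(condition = TRUE, mode = "condition")
  )
)
make(plan)
after_condition <- readd(x) # the new value if x was rebuilt

setwd(old_dir)
```

If the modes behave as described above, after_blacklist is 2 and after_condition is 3.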

19.4 A more practical example

See the “packages” example for a more practical demonstration of triggers and their usefulness.

Copyright Eli Lilly and Company