Chapter 8 Packages

This chapter describes the recommended roles of R packages in targets pipelines and how to manage them in different situations.

8.1 Loading and configuring R packages

For most pipelines, it is straightforward to load the R packages that your targets need in order to run. You can either:

  1. Call library() at the top of the target script file (default: _targets.R) to load each package the conventional way, or
  2. Name the required packages using the packages argument of tar_option_set().

2. is often faster, especially for utilities like tar_visnetwork(), because it avoids loading packages unless absolutely necessary.

Some package management workflows are more complicated. If your use special configuration with conflicted, box, import, or similar utility, please do your configuration inside a project-level .Rprofile file instead of the target script file (default: _targets.R). In addition, if you use distributed workers inside external containers (Docker, Singularity, AWS AMI, etc.) make sure each container has a copy of this same .Rprofile file where the R worker process spawns. This approach is ensures that all remote workers are configured the same way as the local main process.

8.2 R packages as projects

It is good practice to organize the files of a targets project similar to a research compendium or R package. However, unless have a specific reason to do so, it is usually not necessary to literally implement your targets pipeline as an installable R package with its own DESCRIPTION file. A research compendium backed by a renv library and Git-backed version control is enough reproducibility for most projects.

8.3 Target Factories

To make specific targets pipelines reusable, it is usually better to create a package with specialized target factories tailored to your use case. Packages stantargets and jagstargets are examples, and you can find more information on the broader R Targetopia at https://wlandau.github.io/targetopia/.

8.4 Package-based invalidation

Still, it is sometimes desirable to treat functions and objects from a package as dependencies when it comes to deciding which targets to rerun and which targets to skip. targets does not track package functions by default because this is not a common need. Usually, local package libraries do not need to change very often, and it is best to maintain a reproducible project library using renv.

However, if you are developing a package alongside a targets pipeline that uses it, you may wish to invalidate certain targets as you make changes to your package. For example, if you are working on a novel statistical method, it is good practice to implement the method itself as an R package and perform the computation for the research paper in one or more targets pipelines.

To track the contents of packages package1 and package2, you must

  1. Fully install these packages with install.packages() or equivalent. devtools::load_all() is insufficient because it does not make the packages available to parallel workers.
  2. Write the following in your target script file (default: _targets.R):
# _targets.R
library(targets)
tar_option_set(
  packages = c("package1", "package2", ...), # `...` is for other packages.
  imports = c("package1", "package2")
)
# Write the rest of _targets.R below.
# ...

packages = c("package1", "package2", ...) tells targets to call library(package1), library(package2), etc. before running each target. imports = c("package1", "package2") tells targets to dive into the environments of package1 and package2 and reproducibly track all the objects it finds. For example, if you define a function f() in package1, then you should see a function node for f() in the graph produced by tar_visnetwork(targets_only = FALSE), and targets downstream of f() will invalidate if you install an update to package1 with a new version of f(). The next time you call tar_make(), those invalidated targets will automatically rerun.

Copyright Eli Lilly and Company