7  Packages

This chapter describes the recommended roles of R packages in targets pipelines and how to manage them in different situations.

7.1 Loading and configuring R packages

For most pipelines, it is straightforward to load the R packages that your targets need in order to run. You can either:

  1. Call library() at the top of the target script file (default: _targets.R) to load each package the conventional way, or
  2. Name the required packages using the packages argument of tar_option_set().

2. is often faster, especially for utilities like tar_visnetwork(), because it avoids loading packages unless absolutely necessary.

Some package management workflows are more complicated. If your use special configuration with conflicted, box, import, or similar utility, please do your configuration inside a project-level .Rprofile file instead of the target script file (default: _targets.R). In addition, if you use distributed workers inside external containers (Docker, Singularity, AWS AMI, etc.) make sure each container has a copy of this same .Rprofile file where the R worker process spawns. This approach is ensures that all remote workers are configured the same way as the local main process.

7.2 Package management with renv

targets and renv work extremely well together as an overall reproducibility solution for data analysis pipelines. targets makes sure your results are up to date, and renv keeps track of the packages you use.

If you use renv, then overhead from project initialization could slow down pipelines and workers. If you experience slowness, please make sure your renv library is on a fast file system. (For example, slow network drives can severely reduce performance.) In addition, you can disable the slowest initialization checks. After confirming at https://rstudio.github.io/renv/reference/config.html that you can safely disable these checks, you can write the following lines in your user-level .Renviron file:

RENV_CONFIG_SANDBOX_ENABLED=false
RENV_CONFIG_SYNCHRONIZED_CHECK=false

If you disable the synchronization check, remember to call renv::status() periodically to check the health of your renv project library.

7.3 R packages as projects

It is good practice to organize the files of a targets project similar to a research compendium or R package. However, unless have a specific reason to do so, it is usually not necessary to literally implement your targets pipeline as an installable R package with its own DESCRIPTION file. A research compendium backed by a renv library and Git-backed version control is enough reproducibility for most projects.

7.4 Target Factories

To make specific targets pipelines reusable, it is usually better to create a package with specialized target factories tailored to your use case. Packages stantargets and jagstargets are examples, and you can find more information on the broader R Targetopia at https://wlandau.github.io/targetopia/.

7.5 Package-based invalidation

Still, it is sometimes desirable to treat functions and objects from a package as dependencies when it comes to deciding which targets to rerun and which targets to skip. targets does not track package functions by default because this is not a common need. Usually, local package libraries do not need to change very often, and it is best to maintain a reproducible project library using renv.

However, if you are developing a package alongside a targets pipeline that uses it, you may wish to invalidate certain targets as you make changes to your package. For example, if you are working on a novel statistical method, it is good practice to implement the method itself as an R package and perform the computation for the research paper in one or more targets pipelines.

To track the contents of packages package1 and package2, you must

  1. Fully install these packages with install.packages() or equivalent. devtools::load_all() is insufficient because it does not make the packages available to parallel workers.
  2. Write the following in your target script file (default: _targets.R):
# _targets.R
library(targets)
tar_option_set(
  packages = c("package1", "package2", ...), # `...` is for other packages.
  imports = c("package1", "package2")
)
list(
  tar_target(name = output, command = function_from_package1())
)

packages = c("package1", "package2", ...) tells targets to call library(package1), library(package2), etc. before running each target. imports = c("package1", "package2") tells targets to dive into the environments of package1 and package2 and reproducibly track all the objects and datasets as if they were part of the global environment. For example, if you define a function function_from_package1() in package1, then you should see a function node for function_from_package1() in the graph produced by tar_visnetwork(targets_only = FALSE), and targets downstream of function_from_package1() will invalidate if you install an update to package1 with a new version of function_from_package1(). The next time you call tar_make(), those invalidated targets will automatically rerun.

One limitation is that entire namespaced calls like package1::function_from_package1() are not registered in the dependency graph because of the limitations of the static code analysis capabilities of targets (powered by codetools::findGlobals()). tar_target(name = output, command = function_from_package1()) has a dependency function_from_package1(), but tar_target(name = output, command = package1::function_from_package1()) does not have a dependency package1::function_from_package1(). This is because :: is treated as a function with arguments package and function_from_package1.