Chapter 8 Packages
This chapter describes the recommended roles of R packages in
targets pipelines and how to manage them in different situations.
8.1 Loading and configuring R packages
For most pipelines, it is straightforward to load the R packages that your targets need in order to run. You can either:
1. Call library() at the top of the target script file (default: _targets.R) to load each package the conventional way, or
2. Name the required packages using the packages argument of tar_option_set().
Option 2 is often faster, especially for utilities like tar_visnetwork(), because it avoids loading packages unless absolutely necessary.
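As a minimal sketch of the second option, the target script below declares its packages through tar_option_set() instead of calling library() on them. The package (dplyr) and the target names (raw_data, summary_tbl) are placeholders chosen for illustration, not part of any real project.

```r
# _targets.R
library(targets)

# Option 1 (conventional) would be to attach packages here, e.g.:
# library(dplyr)

# Option 2: declare the packages instead. They are loaded when a target
# actually runs, not every time the target script is read.
tar_option_set(packages = c("dplyr"))

list(
  # Hypothetical input data, defined inline to keep the sketch self-contained.
  tar_target(
    raw_data,
    data.frame(group = c("a", "a", "b"), value = c(1, 2, 3))
  ),
  # summarize() and group_by() come from dplyr, which is loaded
  # just before this target runs.
  tar_target(
    summary_tbl,
    summarize(group_by(raw_data, group), mean_value = mean(value))
  )
)
```

With this layout, inspecting the pipeline with a utility such as tar_visnetwork() reads the target script without attaching dplyr.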
Some package management workflows are more complicated. If you use special configuration with conflicted, import, or a similar utility, please do your configuration inside a project-level .Rprofile file instead of the target script file (default: _targets.R). In addition, if you use distributed workers inside external containers (Docker, Singularity, AWS AMI, etc.), make sure each container has a copy of this same .Rprofile file where the R worker process spawns. This approach ensures that all remote workers are configured the same way as the local main process.
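A project-level .Rprofile for this purpose might look like the sketch below. It assumes the conflicted package is the utility in question and that the project prefers dplyr's filter(); both choices are illustrative, so adapt the calls to whatever configuration your project actually needs.

```r
# .Rprofile (in the project root, and copied into each worker container)

# Only configure conflicted if it is installed, so that a missing package
# does not break R startup.
if (requireNamespace("conflicted", quietly = TRUE)) {
  library(conflicted)
  # Assumed preference for illustration: resolve filter() in favor of dplyr.
  conflict_prefer("filter", "dplyr", quiet = TRUE)
}
```

Because R reads this file at startup, the local main process and every worker that starts from a directory containing the same file pick up the same configuration.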
8.2 R packages as projects
It is good practice to organize the files of a targets project like a research compendium or R package. However, unless you have a specific reason to do so, it is usually not necessary to literally implement your targets pipeline as an installable R package with its own DESCRIPTION file. A research compendium with an renv library and Git version control provides enough reproducibility for most projects.
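For readers who have not used renv, the typical sequence of calls is sketched below. These are run interactively from the project root rather than placed in _targets.R, and the exact workflow may differ for your project.

```r
# Run once to create a project-specific package library and renv.lock.
renv::init()

# Run after installing or updating packages to record the versions in use.
renv::snapshot()

# Run on another machine (or in a container) to reinstall the recorded versions.
renv::restore()
```

Committing renv.lock to Git alongside the pipeline code is what ties the package versions to the project history.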
8.3 Target factories
To make specific targets pipelines reusable, it is usually better to create a package with specialized target factories (functions that create one or more target objects) tailored to your use case. Packages such as stantargets and jagstargets are examples, and you can find more information on the broader R Targetopia at https://wlandau.github.io/targetopia/.
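To make the idea concrete, here is a minimal sketch of a target factory that could live in such a package. The function name tar_csv_summary() and its arguments are hypothetical. It uses tar_target_raw(), which accepts the target name as a character string and the command as a language object, to build three linked targets.

```r
library(targets)

# Hypothetical target factory: given a base name and a CSV path, return
# targets that track the file, read it, and summarize it.
tar_csv_summary <- function(name, path) {
  name_file <- paste0(name, "_file")
  name_data <- paste0(name, "_data")
  sym_file <- as.symbol(name_file)
  sym_data <- as.symbol(name_data)
  list(
    # Track the CSV file itself so edits to it invalidate downstream targets.
    tar_target_raw(name_file, path, format = "file"),
    # Build the command read.csv(<name>_file) with the upstream target name.
    tar_target_raw(name_data, substitute(read.csv(f), list(f = sym_file))),
    # Build the command summary(<name>_data).
    tar_target_raw(name, substitute(summary(d), list(d = sym_data)))
  )
}
```

A user of the package would then write something like list(tar_csv_summary("sales", "data/sales.csv")) in _targets.R, where the name and path are again placeholders.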
8.4 Package-based invalidation
Still, it is sometimes desirable to treat functions and objects from a package as dependencies when deciding which targets to rerun and which targets to skip. targets does not track package functions by default because this is not a common need. Usually, local package libraries do not need to change very often, and it is best to maintain a reproducible project library using renv.
However, if you are developing a package alongside a targets pipeline that uses it, you may wish to invalidate certain targets as you make changes to your package. For example, if you are working on a novel statistical method, it is good practice to implement the method itself as an R package and perform the computation for the research paper in one or more targets pipelines.
To track the contents of packages package1 and package2, you must
- Fully install these packages with install.packages() or equivalent. devtools::load_all() is insufficient because it does not make the packages available to parallel workers.
- Write the following in your target script file (default: _targets.R):
# _targets.R
library(targets)
tar_option_set(
  packages = c("package1", "package2", ...), # `...` is for other packages.
  imports = c("package1", "package2")
)
# Write the rest of _targets.R below.
# ...
packages = c("package1", "package2", ...) tells targets to call library(package1), library(package2), etc. before running each target. imports = c("package1", "package2") tells targets to dive into the environments of package1 and package2 and reproducibly track all the objects it finds. For example, if you define a function f() in package1, then you should see a function node for f() in the graph produced by tar_visnetwork(targets_only = FALSE), and targets downstream of f() will invalidate if you install an update to package1 with a new version of f(). The next time you call tar_make(), those invalidated targets will automatically rerun.
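The resulting edit-install-rerun cycle can be sketched as the interactive session below. The source path is a placeholder for wherever your copy of package1 lives, and tar_visnetwork() assumes the visNetwork package is installed.

```r
library(targets)

# 1. After editing f() in the package source, fully reinstall the package.
#    "path/to/package1" is a placeholder for the package source directory.
install.packages("path/to/package1", repos = NULL, type = "source")

# 2. Inspect the pipeline. With targets_only = FALSE the graph includes
#    function nodes such as f(), and tar_outdated() lists the targets
#    that the pipeline now considers out of date.
tar_visnetwork(targets_only = FALSE)
tar_outdated()

# 3. Rerun only the invalidated targets.
tar_make()
```

Step 1 is what makes the new version of f() visible to both the main process and any parallel workers.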