Chapter 7 Packages
This chapter describes the recommended roles of R packages in targets
pipelines and how to manage them in different situations.
7.1 Loading and configuring R packages
For most pipelines, it is straightforward to load the R packages that your targets need in order to run. You can either:
- Call
library()
at the top of the target script file (default:_targets.R
) to load each package the conventional way, or - Name the required packages using the
packages
argument oftar_option_set()
.
2. is often faster, especially for utilities like tar_visnetwork()
, because it avoids loading packages unless absolutely necessary.
Some package management workflows are more complicated. If your use special configuration with conflicted, box
, import
, or similar utility, please do your configuration inside a project-level .Rprofile
file instead of the target script file (default: _targets.R
). In addition, if you use distributed workers inside external containers (Docker, Singularity, AWS AMI, etc.) make sure each container has a copy of this same .Rprofile
file where the R worker process spawns. This approach is ensures that all remote workers are configured the same way as the local main process.
7.2 R packages as projects
It is good practice to organize the files of a targets
project similar to a research compendium or R package. However, unless have a specific reason to do so, it is usually not necessary to literally implement your targets
pipeline as an installable R package with its own DESCRIPTION
file. A research compendium backed by a renv
library and Git-backed version control is enough reproducibility for most projects.
7.3 Target Factories
To make specific targets
pipelines reusable, it is usually better to create a package with specialized target factories tailored to your use case. Packages stantargets
and jagstargets
are examples, and you can find more information on the broader R Targetopia at https://wlandau.github.io/targetopia/.
7.4 Package-based invalidation
Still, it is sometimes desirable to treat functions and objects from a package as dependencies when it comes to deciding which targets to rerun and which targets to skip. targets
does not track package functions by default because this is not a common need. Usually, local package libraries do not need to change very often, and it is best to maintain a reproducible project library using renv
.
However, if you are developing a package alongside a targets
pipeline that uses it, you may wish to invalidate certain targets as you make changes to your package. For example, if you are working on a novel statistical method, it is good practice to implement the method itself as an R package and perform the computation for the research paper in one or more targets
pipelines.
To track the contents of packages package1
and package2
, you must
- Fully install these packages with
install.packages()
or equivalent.devtools::load_all()
is insufficient because it does not make the packages available to parallel workers. - Write the following in your target script file (default:
_targets.R
):
# _targets.R
library(targets)
tar_option_set(
packages = c("package1", "package2", ...), # `...` is for other packages.
imports = c("package1", "package2")
)# Write the rest of _targets.R below.
# ...
packages = c("package1", "package2", ...)
tells targets
to call library(package1)
, library(package2)
, etc. before running each target. imports = c("package1", "package2")
tells targets
to dive into the environments of package1
and package2
and reproducibly track all the objects it finds. For example, if you define a function f()
in package1
, then you should see a function node for f()
in the graph produced by tar_visnetwork(targets_only = FALSE)
, and targets downstream of f()
will invalidate if you install an update to package1
with a new version of f()
. The next time you call tar_make()
, those invalidated targets will automatically rerun.