# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("package1", "package2", ...), # `...` is for other packages.
  imports = c("package1", "package2")
)
list(
  tar_target(name = output, command = function_from_package1())
)
7 Packages
This chapter describes the recommended roles of R packages in targets
pipelines and how to manage them in different situations.
7.1 Loading and configuring R packages
For most pipelines, it is straightforward to load the R packages that your targets need in order to run. You can either:
1. Call library() at the top of the target script file (default: _targets.R) to load each package the conventional way, or
2. Name the required packages using the packages argument of tar_option_set().
Option 2 is often faster, especially for utilities like tar_visnetwork(), because it avoids loading packages unless absolutely necessary.
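As a minimal sketch of the second option, the target script below declares its packages through tar_option_set() instead of calling library() for each one. The package dplyr and the target names raw_data and group_counts are illustrative assumptions, not names taken from this chapter.

```r
# _targets.R (sketch)
library(targets)

# Declare the packages the targets need. "dplyr" is an illustrative choice.
# targets loads these packages when a target runs, not when the script is read.
tar_option_set(packages = c("dplyr"))

list(
  # A small in-memory data frame stands in for real input data.
  tar_target(name = raw_data, command = data.frame(group = c("a", "a", "b"))),
  # count() comes from dplyr, which is loaded just before this target runs.
  tar_target(name = group_counts, command = count(raw_data, group))
)
```

With the first option, the tar_option_set() call would be replaced by library(dplyr) at the top of the script, and the package would then be loaded every time the script is sourced, including by inspection utilities.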
Some package management workflows are more complicated. If you use special configuration with conflicted, box, import, or a similar utility, please do your configuration inside a project-level .Rprofile file instead of the target script file (default: _targets.R). In addition, if you use distributed workers inside external containers (Docker, Singularity, AWS AMI, etc.), make sure each container has a copy of this same .Rprofile file where the R worker process spawns. This approach ensures that all remote workers are configured the same way as the local main process.
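A project-level .Rprofile along these lines might look like the sketch below. It assumes the conflicted package is the utility in use and that dplyr::filter() should win over other functions named filter(); both choices are illustrative.

```r
# .Rprofile at the project root (sketch)
# Assumption: conflicted is the configuration utility, and dplyr::filter()
# is the preferred function when several packages export filter().
if (requireNamespace("conflicted", quietly = TRUE)) {
  conflicted::conflict_prefer("filter", winner = "dplyr", quiet = TRUE)
}
```

Because every R process started in the project directory reads this file, the same preference applies to the main process and to any worker that starts in a copy of the project.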
7.2 Package management with renv
targets and renv work extremely well together as an overall reproducibility solution for data analysis pipelines. targets makes sure your results are up to date, and renv keeps track of the packages you use.
If you use renv, then overhead from project initialization could slow down pipelines and workers. If you experience slowness, please make sure your renv library is on a fast file system. (For example, slow network drives can severely reduce performance.) In addition, you can disable the slowest initialization checks. After confirming at https://rstudio.github.io/renv/reference/config.html that you can safely disable these checks, you can write the following lines in your user-level .Renviron file:
RENV_CONFIG_SANDBOX_ENABLED=false
RENV_CONFIG_SYNCHRONIZED_CHECK=false
If you disable the synchronization check, remember to call renv::status() periodically to check the health of your renv project library.
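To confirm that a session has actually picked up the two .Renviron settings, a small check such as the following may help. The helper name renv_checks_disabled() is hypothetical, and the comparison against the string "false" is an assumption based on the values shown above.

```r
# Hypothetical helper: report whether each renv check is switched off
# in the environment of the current R session.
renv_checks_disabled <- function() {
  vars <- c("RENV_CONFIG_SANDBOX_ENABLED", "RENV_CONFIG_SYNCHRONIZED_CHECK")
  # Unset variables come back as NA so they are not mistaken for "false".
  values <- Sys.getenv(vars, unset = NA_character_, names = FALSE)
  disabled <- !is.na(values) & tolower(values) == "false"
  names(disabled) <- vars
  disabled
}

print(renv_checks_disabled())

# With the synchronization check disabled, run renv::status() by hand
# from time to time inside the renv project.
```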
7.3 R packages as projects
It is good practice to organize the files of a targets project similar to a research compendium or R package. However, unless you have a specific reason to do so, it is usually not necessary to literally implement your targets pipeline as an installable R package with its own DESCRIPTION file. A research compendium backed by an renv library and Git-backed version control is enough reproducibility for most projects.
7.4 Target factories
To make specific targets pipelines reusable, it is usually better to create a package with specialized target factories tailored to your use case. Packages stantargets and jagstargets are examples, and you can find more information on the broader R Targetopia at https://wlandau.github.io/targetopia/.
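A target factory is an ordinary function that returns one or more target objects. The sketch below shows the general shape using tar_target_raw(); the factory name tar_csv_summary() and the target names csv_file, csv_data, and csv_summary are hypothetical and do not come from stantargets or jagstargets.

```r
library(targets)

# Hypothetical factory: all names here are illustrative.
tar_csv_summary <- function(file) {
  list(
    # Track the input file itself so edits to it invalidate downstream targets.
    tar_target_raw("csv_file", file, format = "file"),
    # Commands are passed as quoted expressions rather than evaluated code.
    tar_target_raw("csv_data", quote(read.csv(csv_file))),
    tar_target_raw("csv_summary", quote(summary(csv_data)))
  )
}

# In _targets.R, a user of the package would call the factory like this:
tar_csv_summary("data.csv")
```

Packaging a function like this lets several projects share the same pipeline structure while each project supplies its own inputs.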
7.5 Package-based invalidation
Still, it is sometimes desirable to treat functions and objects from a package as dependencies when it comes to deciding which targets to rerun and which targets to skip. targets does not track package functions by default because this is not a common need. Usually, local package libraries do not need to change very often, and it is best to maintain a reproducible project library using renv.
However, if you are developing a package alongside a targets pipeline that uses it, you may wish to invalidate certain targets as you make changes to your package. For example, if you are working on a novel statistical method, it is good practice to implement the method itself as an R package and perform the computation for the research paper in one or more targets pipelines.
To track the contents of packages package1 and package2, you must:
- Fully install these packages with install.packages() or equivalent. devtools::load_all() is insufficient because it does not make the packages available to parallel workers.
- Write the tar_option_set() call from the _targets.R listing at the top of this chapter in your target script file (default: _targets.R).
packages = c("package1", "package2", ...) tells targets to call library(package1), library(package2), etc. before running each target. imports = c("package1", "package2") tells targets to dive into the environments of package1 and package2 and reproducibly track all the objects and datasets as if they were part of the global environment. For example, if you define a function function_from_package1() in package1, then you should see a function node for function_from_package1() in the graph produced by tar_visnetwork(targets_only = FALSE), and targets downstream of function_from_package1() will invalidate if you install an update to package1 with a new version of function_from_package1(). The next time you call tar_make(), those invalidated targets will automatically rerun.
One limitation is that entire namespaced calls like package1::function_from_package1() are not registered in the dependency graph because of the limitations of the static code analysis capabilities of targets (powered by codetools::findGlobals()). tar_target(name = output, command = function_from_package1()) has a dependency function_from_package1(), but tar_target(name = output, command = package1::function_from_package1()) does not have a dependency package1::function_from_package1(). This is because :: is treated as a function with arguments package1 and function_from_package1.
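The difference can be seen by calling codetools::findGlobals() directly on the two forms of the command. This is only an approximation of what targets does internally, since targets adds its own processing around codetools, but it illustrates why the namespaced form hides the function name.

```r
library(codetools)

# Wrap each command in a function so findGlobals() can analyze it.
# Neither function is called, so package1 does not need to be installed.
direct <- findGlobals(function() function_from_package1())
namespaced <- findGlobals(function() package1::function_from_package1())

print(direct)      # globals detected for the plain call
print(namespaced)  # globals detected for the namespaced call

# Is the function name itself among the detected globals?
"function_from_package1" %in% direct
"function_from_package1" %in% namespaced
```

In the namespaced case, the analysis records the :: operator rather than the function behind it, so there is no node that could change when package1 is updated.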