8  Projects

A project is a targets pipeline together with its supporting source code, data, and configuration settings. This chapter explains best practices when it comes to organizing and configuring targets projects.

8.1 Extra reproducibility

For extra reproducibility, it is good practice to use the renv R package for package management and Git/GitHub for code version control. The entire _targets/ data store should generally not be committed to Git because of its large size.1 The broader R community has excellent resources and tutorials on getting started with these third-party tools.

If you use renv, then overhead from project initialization could slow down pipelines and workers. If you experience slowness, please make sure your renv library is on a fast file system. (For example, slow network drives can severely reduce performance.) In addition, you can disable the slowest initialization checks. After confirming at https://rstudio.github.io/renv/reference/config.html that you can safely disable these checks, you can write the following lines in your user-level .Renviron file:

RENV_CONFIG_SANDBOX_ENABLED=false
RENV_CONFIG_SYNCHRONIZED_CHECK=false

If you disable the synchronization check, remember to call renv::status() periodically to check the health of your renv project library.

8.2 Project files

targets is mostly indifferent to how you organize the files in your project. However, it is good practice to follow the overall structure of a research compendium or R package (not necessarily with a DESCRIPTION file). It also is good practice to give each project its own unique folder with one targets pipeline, one renv library for package management, and one Git/GitHub repository for code version control. As described later, it is possible to create multiple overlapping projects within a single folder, but this is not recommended for most situations.

The walkthrough chapter shows the file structure for a minimal targets project. For more serious projects, the file system may expand to look something like this:

├── .git/
├── .Rprofile
├── .Renviron
├── renv/
├── index.Rmd
├── _targets/
├── _targets.R
├── _targets.yaml
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
├── data/
└──── input_data.csv

Some of these files are optional, and they have the following roles.

  • .git/: a folder automatically created by Git for version control purposes.
  • .Rprofile: a text file automatically created by renv to automatically load the project library when you start R at the project root folder. You may wish to add other global configuration here, e.g. declare package precedence using the conflicted package.
  • .Renviron: a text file of key-value pairs defining project-level environment variables, e.g. API keys and package settings. See Sys.getenv() for more information on environment variables and how to work with them in R.
  • index.Rmd: Target Markdown report source file to define the pipeline.
  • _targets/: the data store where tar_make() and similar functions write target storage and metadata when they run the pipeline.
  • _targets.R: the target script file. All targets pipelines must have a target script file that returns a target list at the end. If you use Target Markdown (e.g. index.Rmd above) then the target script will be written automatically. Otherwise, you may write it by hand. Unless you apply the custom configuration described later in this chapter, the target script file will always be called _targets.R and live at the project root folder.
  • _targets.yaml: a YAML file to set default arguments to critical functions like tar_make(). As described below, you can access and modify this file with functions tar_config_get(), tar_config_set(), and tar_config_unset(). targets will attempt to look for _targets.yaml unless you set a different path in the TAR_CONFIG environment variable.
  • R/: directory of scripts containing custom user-defined R code. Most of the code will likely contain custom functions you write to support your targets. You can load these functions with source("R/function_script.R") or eval(parse(text = "R/function_script.R"), either in a tar_globals = TRUE code chunk in Target Markdown or directly in _targets.R if you are not using Target Markdown.
  • data/: directory of local input data files. As described in the files chapter, it is good practice to track input files using format = "file" in tar_target() and then reference those file targets in downstream targets that directly depend on those files.

8.3 Multiple projects

It is generally good practice to give each project its own unique folder with one targets pipeline, one renv library for package management, and one Git/GitHub repository for code version control. However, sometimes it is reasonable to maintain multiple pipelines within a project: for example, if different pipelines have similar research goals and share the same code base of custom user-defined functions. This section explains how to maintain and navigate such a collection of overlapping projects.

The functionality below assumes you have targets version 0.7.0.9001 or higher, which you may need to install from GitHub.

remotes::install_github("ropensci/targets")

8.3.1 Create each project.

To begin, write the shared code base of custom user-defined functions in R/, and write one targets pipeline per project. For convenience, we will directly write to the targets script files, but the principles generalize to Target Markdown. The file structure looks something like this:

├── _targets.yaml
├── script_a.R
├── script_b.R
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
...

All projects share the same functions defined in the scripts in R/, and each project uses a different target script and data store. script_a.R defines the targets for project A.

# script_a.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_abc, f(..)),
  tar_target(tarbet_xyz, g(...))
)

Likewise, script_b.R defines the targets for project B.

# script_b.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_123, f(...)),
  tar_target(target_456, h(...))
)

8.3.2 Configure each project.

To establish a different store and script per project, write a top-level _targets.yaml configuration to specify these paths explicitly. You can do this from R with tar_config_set().

tar_config_set(script = "script_a.R", store = "store_a", project = "project_a")
tar_config_set(script = "script_b.R", store = "store_b", project = "project_b")

The R code above writes the following _targets.yaml configuration file.

project_a:
  store: store_a
  script: script_a.R
project_b:
  store: store_b
  script: script_b.R

8.3.3 Run each project

To run each project, run tar_make() with the correct target script and data store. To select the correct script and store, set the TAR_PROJECT environment variable to the correct project name. that way, tar_config_get() automatically supplies the correct script and store arguments to tar_make().

Sys.setenv(TAR_PROJECT = "project_a")
tar_make()
tar_read(target_abc)
Sys.setenv(TAR_PROJECT = "project_b")
tar_make()
tar_read(target_123)

Alternatively, you can manually select the appropriate script and store for each project. This is a less convenient approach, but if you do it, you do not need to set the TAR_PROJECT environment variable or rely on _targets.yaml.

tar_make(script = "script_a.R", store = "store_a")
tar_read(target_abc, store = "store_a")
tar_make(script = "script_b.R", store = "store_b")
tar_read(target_abc, store = "store_b")

8.4 Interdependent projects

8.4.1 Config inheritance

_targets.yaml can control more than just the script and store, and different projects can inherit settings from one another. In the following example, project B inherits from project A, so projects A and B both set reporter = "summary" and shorcut = TRUE by default in tar_make().

tar_config_set(
  script = "script_a.R",
  store = "store_a",
  reporter_make = "summary",
  shortcut = TRUE,
  project = "project_a"
)
tar_config_set(
  script = "script_b.R",
  store = "store_b",
  inherits = "project_a",
  project = "project_b",
)
writeLines(readLines("_targets.yaml"))
#> project_a:
#>   reporter_make: summary
#>   script: script_a.R
#>   shortcut: yes
#>   store: store_a
#> project_b:
#>   inherits: project_a
#>   script: script_b.R
#>   store: store_b
Sys.setenv(TAR_PROJECT = "project_b")
tar_config_get("script")
#> [1] "script_b.R"
tar_config_get("reporter_make")
#> [1] "summary"
tar_config_get("shortcut")
#> [1] TRUE

8.4.2 Sharing targets

For some workflows, the output of one project serves as the input to another project. The easiest way to set this up is through global objects. The first project remains unchanged, and the second project reads from the first before the pipeline begins.

# script_b.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")

object_from_project_a <- tar_read(target_from_project_a, store = "store_a")

list(
  tar_target(new_target, some_function(object_from_project_a)),
  ...
)

This approach is the most convenient and versatile, but it can be inefficient if target_from_project_a is large. A higher-performant solution for large data is to treat the file in project A’s data store as an input file target in project B. This second approach requires an understanding of the data store and an awareness of which targets are stored locally and which are stored on the cloud. For a target with repository = "local", you can begin from the file store_a/objects/target_from_project_a. Otherwise, the target’s file exists on the cloud (AWS or GCP) and you may need to access the target as a URL in a target with format = "url".

# script_b.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(file_from_project_a, "store_a/objects/target_name", format = "file"),
  tar_target(data_from_project_a, readRDS(file_from_project_a)), # Assumes format = "rds" in project A.
  tar_target(new_target, analyze_data(data_from_project_a)),
  ...
)

8.5 The config package

The _targets.yaml config interface borrows heavily from the ideas in the config R package. However, it does not actually use the config package, nor does it copy or use the config source code in any way. And there are major differences in user-side behavior:

  1. There is no requirement to have a configuration (i.e. project) named “default”.
  2. The default project is called “main”, and other projects do not inherit from it automatically.
  3. Not all fields need to be populated in _targets.yaml because the targets package already has system defaults.

  1. However, you may wish to commit _targets/meta/meta, which is critical to checking the status of each target and reading targets into memory.↩︎