8 Projects
A project is a targets pipeline together with its supporting source code, data, and configuration settings. This chapter explains best practices when it comes to organizing and configuring targets projects.
8.1 Extra reproducibility
For extra reproducibility, it is good practice to use the renv R package for package management and Git/GitHub for code version control. The entire _targets/ data store should generally not be committed to Git because of its large size.1 The broader R community has excellent resources and tutorials on getting started with these third-party tools.
If you use renv, then overhead from project initialization could slow down pipelines and workers. If you experience slowness, please make sure your renv library is on a fast file system. (For example, slow network drives can severely reduce performance.) In addition, you can disable the slowest initialization checks. After confirming at https://rstudio.github.io/renv/reference/config.html that you can safely disable these checks, you can write the following lines in your user-level .Renviron file:
RENV_CONFIG_SANDBOX_ENABLED=false
RENV_CONFIG_SYNCHRONIZED_CHECK=false
If you disable the synchronization check, remember to call renv::status() periodically to check the health of your renv project library.
8.2 Project files
targets is mostly indifferent to how you organize the files in your project. However, it is good practice to follow the overall structure of a research compendium or R package (not necessarily with a DESCRIPTION file). It also is good practice to give each project its own unique folder with one targets pipeline, one renv library for package management, and one Git/GitHub repository for code version control. As described later, it is possible to create multiple overlapping projects within a single folder, but this is not recommended for most situations.
The walkthrough chapter shows the file structure for a minimal targets project. For more serious projects, the file system may expand to look something like this:
├── .git/
├── .Rprofile
├── .Renviron
├── renv/
├── index.Rmd
├── _targets/
├── _targets.R
├── _targets.yaml
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
├── data/
└──── input_data.csv
Some of these files are optional, and they have the following roles.
- .git/: a folder automatically created by Git for version control purposes.
- .Rprofile: a text file automatically created by renv to load the project library when you start R at the project root folder. You may wish to add other global configuration here, e.g. declare package precedence using the conflicted package.
- .Renviron: a text file of key-value pairs defining project-level environment variables, e.g. API keys and package settings. See Sys.getenv() for more information on environment variables and how to work with them in R.
- index.Rmd: Target Markdown report source file to define the pipeline.
- _targets/: the data store where tar_make() and similar functions write target storage and metadata when they run the pipeline.
- _targets.R: the target script file. All targets pipelines must have a target script file that returns a target list at the end. If you use Target Markdown (e.g. index.Rmd above) then the target script will be written automatically. Otherwise, you may write it by hand. Unless you apply the custom configuration described later in this chapter, the target script file will always be called _targets.R and live at the project root folder.
- _targets.yaml: a YAML file to set default arguments to critical functions like tar_make(). As described below, you can access and modify this file with functions tar_config_get(), tar_config_set(), and tar_config_unset(). targets will look for _targets.yaml unless you set a different path in the TAR_CONFIG environment variable.
- R/: directory of scripts containing custom user-defined R code. Most of the code will likely contain custom functions you write to support your targets. You can load these functions with source("R/function_script.R") or eval(parse("R/function_script.R")), either in a tar_globals = TRUE code chunk in Target Markdown or directly in _targets.R if you are not using Target Markdown.
- data/: directory of local input data files. As described in the files chapter, it is good practice to track input files using format = "file" in tar_target() and then reference those file targets in downstream targets that directly depend on those files.
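To make the roles of R/ and data/ concrete, here is a minimal sketch of a hand-written _targets.R for the layout above. The target names input_file and input_data are illustrative assumptions, and read.csv() merely stands in for whatever custom function from R/ would actually process the data.

```r
# _targets.R: a minimal sketch, not a complete pipeline.
library(targets)

# Load every script in R/ so custom functions are available to the targets.
tar_source()

list(
  # Track the input file itself: if the CSV changes, downstream targets rerun.
  tar_target(input_file, "data/input_data.csv", format = "file"),
  # Hypothetical downstream target that depends directly on the tracked file.
  tar_target(input_data, read.csv(input_file))
)
```

Because input_data refers to input_file rather than to the literal path, the dependency on the file is visible to the pipeline.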
8.3 Multiple projects
It is generally good practice to give each project its own unique folder with one targets pipeline, one renv library for package management, and one Git/GitHub repository for code version control. However, sometimes it is reasonable to maintain multiple pipelines within a project: for example, if different pipelines have similar research goals and share the same code base of custom user-defined functions. This section explains how to maintain and navigate such a collection of overlapping projects.
The functionality below assumes you have targets version 0.7.0.9001 or higher, which you may need to install from GitHub.
remotes::install_github("ropensci/targets")
8.3.1 Create each project.
To begin, write the shared code base of custom user-defined functions in R/, and write one targets pipeline per project. For convenience, we will write directly to the target script files, but the principles generalize to Target Markdown. The file structure looks something like this:
├── _targets.yaml
├── script_a.R
├── script_b.R
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
...
All projects share the same functions defined in the scripts in R/, and each project uses a different target script and data store. script_a.R defines the targets for project A.
# script_a.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_abc, f(...)),
  tar_target(target_xyz, g(...))
)
Likewise, script_b.R defines the targets for project B.
# script_b.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_123, f(...)),
  tar_target(target_456, h(...))
)
8.3.2 Configure each project.
To establish a different store and script per project, write a top-level _targets.yaml configuration to specify these paths explicitly. You can do this from R with tar_config_set().
tar_config_set(script = "script_a.R", store = "store_a", project = "project_a")
tar_config_set(script = "script_b.R", store = "store_b", project = "project_b")
The R code above writes the following _targets.yaml configuration file.
project_a:
  store: store_a
  script: script_a.R
project_b:
  store: store_b
  script: script_b.R
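As a quick check, the settings can be read back with tar_config_get(). The sketch below writes the same two projects to a temporary YAML file (an assumption made here so the example does not touch a real _targets.yaml) and queries each project through the config and project arguments.

```r
library(targets)

# Assumption: a throwaway config file, so a real _targets.yaml is left alone.
config <- tempfile(fileext = ".yaml")

tar_config_set(
  script = "script_a.R",
  store = "store_a",
  project = "project_a",
  config = config
)
tar_config_set(
  script = "script_b.R",
  store = "store_b",
  project = "project_b",
  config = config
)

# Read individual settings back for a specific project.
script_a <- tar_config_get("script", config = config, project = "project_a")
store_b <- tar_config_get("store", config = config, project = "project_b")
print(c(script_a = script_a, store_b = store_b))
```

In a real project you would normally omit the config argument and let targets use _targets.yaml at the project root.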
8.3.3 Run each project
To run each project, run tar_make() with the correct target script and data store. To select the correct script and store, set the TAR_PROJECT environment variable to the correct project name. That way, tar_config_get() automatically supplies the correct script and store arguments to tar_make().
Sys.setenv(TAR_PROJECT = "project_a")
tar_make()
tar_read(target_abc)
Sys.setenv(TAR_PROJECT = "project_b")
tar_make()
tar_read(target_123)
Alternatively, you can manually select the appropriate script and store for each project. This is a less convenient approach, but if you do it, you do not need to set the TAR_PROJECT environment variable or rely on _targets.yaml.
tar_make(script = "script_a.R", store = "store_a")
tar_read(target_abc, store = "store_a")
tar_make(script = "script_b.R", store = "store_b")
tar_read(target_123, store = "store_b")
8.4 Interdependent projects
8.4.1 Config inheritance
_targets.yaml can control more than just the script and store, and different projects can inherit settings from one another. In the following example, project B inherits from project A, so projects A and B both set reporter = "balanced" and shortcut = TRUE by default in tar_make().
tar_config_set(
  script = "script_a.R",
  store = "store_a",
  reporter_make = "balanced",
  shortcut = TRUE,
  project = "project_a"
)
tar_config_set(
  script = "script_b.R",
  store = "store_b",
  inherits = "project_a",
  project = "project_b"
)
writeLines(readLines("_targets.yaml"))
#> project_a:
#>   reporter_make: balanced
#>   script: script_a.R
#>   shortcut: yes
#>   store: store_a
#> project_b:
#>   inherits: project_a
#>   script: script_b.R
#>   store: store_b
Sys.setenv(TAR_PROJECT = "project_b")
tar_config_get("script")
#> [1] "script_b.R"
tar_config_get("reporter_make")
#> [1] "balanced"
tar_config_get("shortcut")
#> [1] TRUE
8.5 The config package
The _targets.yaml config interface borrows heavily from the ideas in the config R package. However, it does not actually use the config package, nor does it copy or use the config source code in any way. And there are major differences in user-side behavior:
- There is no requirement to have a configuration (i.e. project) named “default”.
- The default project is called “main”, and other projects do not inherit from it automatically.
- Not all fields need to be populated in _targets.yaml because the targets package already has system defaults.
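The last two points can be illustrated with a small sketch. It assumes a temporary config file and a hypothetical project_a that sets only its script. The store for project_a is never written, and a custom store set for the "main" project is not picked up.

```r
library(targets)

# Assumption: a throwaway config file used only for this illustration.
config <- tempfile(fileext = ".yaml")

# The "main" project gets a custom store; project_a sets only its script.
tar_config_set(store = "store_main", project = "main", config = config)
tar_config_set(script = "script_a.R", project = "project_a", config = config)

# project_a left the store unset and does not inherit from "main",
# so the lookup falls back to the package's system default.
store_a <- tar_config_get("store", config = config, project = "project_a")
store_main <- tar_config_get("store", config = config, project = "main")
print(c(project_a = store_a, main = store_main))
```

To share settings between projects, the inherits field has to be set explicitly, as in the config inheritance example earlier in this chapter.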
1. However, you may wish to commit _targets/meta/meta, which is critical to checking the status of each target and reading targets into memory.