A project is a
targets pipeline together with its supporting source code, data, and configuration settings. This chapter explains best practices for organizing and configuring targets projects.
For extra reproducibility, it is good practice to use the
renv R package for package management and Git/GitHub for code version control. The entire
_targets/ data store should generally not be committed to Git because of its large size.[13] The broader R community has excellent resources and tutorials on getting started with these third-party tools.
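As a rough illustration of the advice above, the following sketch writes Git ignore rules that exclude the data store while still allowing the _targets/meta/meta file (mentioned in the footnote at the end of this chapter) to be committed. The folder name gitignore_demo is hypothetical and the file is written to a temporary directory, so nothing in a real project is touched. This is one possible set of rules, not an official recommendation.

```r
# Sketch: write a .gitignore into a throwaway folder (not a real project).
project <- file.path(tempdir(), "gitignore_demo")
dir.create(project, showWarnings = FALSE)

ignore_rules <- c(
  "_targets/*",          # ignore everything directly inside the data store...
  "!_targets/meta/",     # ...but let Git descend into the meta/ folder
  "_targets/meta/*",     # ignore the contents of meta/...
  "!_targets/meta/meta"  # ...except the metadata file itself
)

writeLines(ignore_rules, file.path(project, ".gitignore"))
readLines(file.path(project, ".gitignore"))
```

The directory itself is not ignored outright because Git cannot re-include a file whose parent directory is excluded, hence the pairs of ignore and re-include patterns.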
targets is mostly indifferent to how you organize the files in your project. However, it is good practice to follow the overall structure of a research compendium or R package (not necessarily with a
DESCRIPTION file). It also is good practice to give each project its own unique folder with one
targets pipeline, one
renv library for package management, and one Git/GitHub repository for code version control. As described later, it is possible to create multiple overlapping projects within a single folder, but this is not recommended for most situations.
The walkthrough chapter shows the file structure for a minimal
targets project. For more serious projects, the file system may expand to look something like this:
├── .git/
├── .Rprofile
├── .Renviron
├── renv/
├── index.Rmd
├── _targets/
├── _targets.R
├── _targets.yaml
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
├── data/
└──── input_data.csv
Some of these files are optional, and they have the following roles.
.git/: a folder automatically created by Git for version control purposes.
.Rprofile: a text file automatically created by
renv to automatically load the project library when you start R at the project root folder. You may wish to add other global configuration here, e.g. declare package precedence using the
.Renviron: a text file of key-value pairs defining project-level environment variables, e.g. API keys and package settings. See
Sys.getenv() for more information on environment variables and how to work with them in R.
index.Rmd: Target Markdown report source file to define the pipeline.
_targets/: the data store where
tar_make() and similar functions write target storage and metadata when they run the pipeline.
_targets.R: the target script file. All
targets pipelines must have a target script file that returns a target list at the end. If you use Target Markdown (e.g.
index.Rmd above) then the target script will be written automatically. Otherwise, you may write it by hand. Unless you apply the custom configuration described later in this chapter, the target script file will always be called
_targets.R and live at the project root folder.
_targets.yaml: a YAML file to set default arguments to critical functions like
tar_make(). As described below, you can access and modify this file with functions
targets will attempt to look for
_targets.yaml unless you set a different path in the
R/: directory of scripts containing custom user-defined R code. Most of the code will likely contain custom functions you write to support your targets. You can load these functions with
source("R/function_script.R"), either in a
tar_globals = TRUE code chunk in Target Markdown or directly in
_targets.R if you are not using Target Markdown.
data/: directory of local input data files. As described in the files chapter, it is good practice to track input files using
format = "file" in
tar_target() and then reference those file targets in downstream targets that directly depend on those files.
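The R/ convention described above can be sketched in base R, without targets itself. The example below creates a throwaway project folder, writes two small function scripts into its R/ directory, and loads every script with source(). The folder name demo_project and the functions clean_data() and summarize_data() are hypothetical and exist only for illustration.

```r
# Sketch: load all custom functions from a project's R/ directory.
project <- file.path(tempdir(), "demo_project")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)

# Two hypothetical function scripts, standing in for functions_data.R etc.
writeLines(
  "clean_data <- function(x) x[!is.na(x)]",
  file.path(project, "R", "functions_data.R")
)
writeLines(
  "summarize_data <- function(x) mean(x)",
  file.path(project, "R", "functions_analysis.R")
)

# Find every .R file in R/ and evaluate it so the functions become available.
scripts <- list.files(file.path(project, "R"), pattern = "\\.R$", full.names = TRUE)
for (script in scripts) {
  source(script)
}

result <- summarize_data(clean_data(c(1, NA, 3)))
result
```

In a real project, the same loop (or individual source() calls) would sit near the top of _targets.R, before the list of targets.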
It is generally good practice to give each project its own unique folder with one
targets pipeline, one
renv library for package management, and one Git/GitHub repository for code version control. However, sometimes it is reasonable to maintain multiple pipelines within a project: for example, if different pipelines have similar research goals and share the same code base of custom user-defined functions. This section explains how to maintain and navigate such a collection of overlapping projects.
The functionality below assumes you have
targets version 0.7.0.9001 or higher, which you may need to install from GitHub.
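One way to check this requirement before going further is sketched below. It compares the installed version, if any, against the minimum stated above. The variable names are illustrative, and the installation command in the comment assumes the remotes package and the ropensci/targets repository on GitHub.

```r
# Sketch: check whether the installed targets version is recent enough.
required <- package_version("0.7.0.9001")

has_targets <- requireNamespace("targets", quietly = TRUE)
version_ok <- has_targets && utils::packageVersion("targets") >= required

if (!version_ok) {
  # One possible way to upgrade (assumes the remotes package is installed):
  # remotes::install_github("ropensci/targets")
  message("targets >= ", as.character(required), " is not available in this library.")
}
```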
To begin, write the shared code base of custom user-defined functions in
R/, and write one
targets pipeline per project. For convenience, we will directly write to the targets script files, but the principles generalize to Target Markdown. The file structure looks something like this:
├── _targets.yaml
├── script_a.R
├── script_b.R
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
...
script_a.R defines the targets for project A.
# script_a.R
library(targets)
source("R/functions_data.R")
source("R/functions_analysis.R")
source("R/functions_visualization.R")
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_abc, f(...)),
  tar_target(target_xyz, g(...))
)
script_b.R defines the targets for project B.
# script_b.R
library(targets)
source("R/functions_data.R")
source("R/functions_analysis.R")
source("R/functions_visualization.R")
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_123, f(...)),
  tar_target(target_456, h(...))
)
To establish a different store and script per project, write a top-level
_targets.yaml configuration to specify these paths explicitly. You can do this from R with
tar_config_set(script = "script_a.R", store = "store_a", project = "project_a")
tar_config_set(script = "script_b.R", store = "store_b", project = "project_b")
The R code above writes the following
_targets.yaml configuration file.
project_a:
  store: store_a
  script: script_a.R
project_b:
  store: store_b
  script: script_b.R
To run each project, run
tar_make() with the correct target script and data store. To select the correct script and store, set the
TAR_PROJECT environment variable to the correct project name. That way,
tar_config_get() automatically supplies the correct
store arguments to
Sys.setenv(TAR_PROJECT = "project_a")
tar_make()
tar_read(target_abc)
Sys.setenv(TAR_PROJECT = "project_b")
tar_make()
tar_read(target_123)
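Because Sys.setenv() changes TAR_PROJECT for the rest of the R session, it is easy to forget which project is active. The sketch below shows one way to scope the variable to a single expression and restore the previous value afterwards. The helper with_tar_project() is hypothetical and not part of targets. It relies only on base R.

```r
# Sketch: temporarily select a project, then restore the previous setting.
# with_tar_project() is a hypothetical helper, not a targets function.
with_tar_project <- function(project, code) {
  old <- Sys.getenv("TAR_PROJECT", unset = NA)
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("TAR_PROJECT")      # variable was not set before
    } else {
      Sys.setenv(TAR_PROJECT = old)    # put the previous project back
    }
  }, add = TRUE)
  Sys.setenv(TAR_PROJECT = project)
  force(code)  # `code` is evaluated lazily, so it runs with the new project
}

# The expression sees the temporary project name.
seen <- with_tar_project("project_a", Sys.getenv("TAR_PROJECT"))
seen

# In a real multi-project folder, usage might look like:
# with_tar_project("project_a", targets::tar_make())
```

The withr package offers a general-purpose with_envvar() function that serves the same purpose if you prefer not to write a helper.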
Alternatively, you can manually select the appropriate script and store for each project. This is a less convenient approach, but if you do it, you do not need to set the
TAR_PROJECT environment variable or rely on
tar_make(script = "script_a.R", store = "store_a")
tar_read(target_abc, store = "store_a")
tar_make(script = "script_b.R", store = "store_b")
tar_read(target_123, store = "store_b")
_targets.yaml can control more than just the script and store, and different projects can inherit settings from one another. In the following example, project B inherits from project A, so projects A and B both set
reporter = "summary" and
shortcut = TRUE by default in
tar_config_set(
  script = "script_a.R",
  store = "store_a",
  reporter_make = "summary",
  shortcut = TRUE,
  project = "project_a"
)
tar_config_set(
  script = "script_b.R",
  store = "store_b",
  inherits = "project_a",
  project = "project_b"
)
writeLines(readLines("_targets.yaml"))
#> project_a:
#>   reporter_make: summary
#>   script: script_a.R
#>   shortcut: yes
#>   store: store_a
#> project_b:
#>   inherits: project_a
#>   script: script_b.R
#>   store: store_b
Sys.setenv(TAR_PROJECT = "project_b")
tar_config_get("script")
#> [1] "script_b.R"
tar_config_get("reporter_make")
#> [1] "summary"
tar_config_get("shortcut")
#> [1] TRUE
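To make the inheritance behavior above more concrete, here is a conceptual sketch in base R. It mirrors the example configuration as a nested list and looks up a setting in the requested project first, then in the project named by inherits. The function resolve_setting() is hypothetical and is not how targets is implemented. It only illustrates the lookup order that the output above implies.

```r
# Sketch: conceptual model of settings inheritance between projects.
# The nested list mirrors the _targets.yaml shown above.
config <- list(
  project_a = list(
    reporter_make = "summary",
    script = "script_a.R",
    shortcut = TRUE,
    store = "store_a"
  ),
  project_b = list(
    inherits = "project_a",
    script = "script_b.R",
    store = "store_b"
  )
)

# resolve_setting() is a hypothetical helper, not part of targets.
# It does not guard against circular `inherits` chains.
resolve_setting <- function(config, project, name) {
  settings <- config[[project]]
  if (!is.null(settings[[name]])) {
    return(settings[[name]])          # the project defines the setting itself
  }
  parent <- settings[["inherits"]]
  if (is.null(parent)) {
    return(NULL)                      # nothing to inherit from
  }
  resolve_setting(config, parent, name)  # fall back to the parent project
}

resolve_setting(config, "project_b", "script")
resolve_setting(config, "project_b", "reporter_make")
resolve_setting(config, "project_b", "shortcut")
```

Project B's own script and store take precedence, while reporter_make and shortcut fall through to project A.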
The _targets.yaml config interface borrows heavily from the ideas in the
config R package. However, it does not actually use the
config package, nor does it copy or use the
config source code in any way. And there are major differences in user-side behavior:
- There is no requirement to have a configuration (i.e. project) named “default”.
- The default project is called “main”, and other projects do not inherit from it automatically.
- Not all fields need to be populated in _targets.yaml because the targets package already has system defaults.
[13] However, you may wish to commit
_targets/meta/meta, which is critical to checking the status of each target and reading targets into memory.