├── .git/
├── .Rprofile
├── .Renviron
├── renv/
├── index.Rmd
├── _targets/
├── _targets.R
├── _targets.yaml
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
├── data/
└──── input_data.csv
8 Projects
A project is a targets pipeline together with its supporting source code, data, and configuration settings. This chapter explains best practices for organizing and configuring targets projects.
8.1 Extra reproducibility
For extra reproducibility, it is good practice to use the renv R package for package management and Git/GitHub for code version control. The entire _targets/ data store should generally not be committed to Git because of its large size.[1] The broader R community has excellent resources and tutorials on getting started with these third-party tools.
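One way to keep the data store out of Git while still tracking the metadata file mentioned in the footnote is a set of ignore rules in the project's .gitignore. The sketch below is only an illustration: ignore_targets_store() is a hypothetical helper, not part of targets, and it assumes the data store lives at the default _targets/ path.

```r
# Hypothetical helper (not part of targets): add ignore rules for the data
# store to a project's .gitignore, keeping only _targets/meta/meta tracked.
ignore_targets_store <- function(project_root = ".") {
  rules <- c(
    "_targets/*",          # ignore everything in the data store...
    "!_targets/meta/",     # ...except the meta/ folder,
    "_targets/meta/*",     # whose contents are ignored as well...
    "!_targets/meta/meta"  # ...except the metadata file itself
  )
  path <- file.path(project_root, ".gitignore")
  existing <- if (file.exists(path)) readLines(path) else character(0)
  # Append only the rules that are not already present.
  writeLines(c(existing, setdiff(rules, existing)), path)
  invisible(path)
}

# Example: write the rules into a scratch folder instead of a real project.
scratch <- tempfile("project_")
dir.create(scratch)
ignore_targets_store(scratch)
readLines(file.path(scratch, ".gitignore"))
```

Calling the helper twice leaves the file unchanged the second time, so it should be safe to rerun.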
If you use renv, then overhead from project initialization could slow down pipelines and workers. If you experience slowness, please make sure your renv library is on a fast file system. (For example, slow network drives can severely reduce performance.) In addition, you can disable the slowest initialization checks. After confirming at https://rstudio.github.io/renv/reference/config.html that you can safely disable these checks, you can write the following lines in your user-level .Renviron file:
RENV_CONFIG_SANDBOX_ENABLED=false
RENV_CONFIG_SYNCHRONIZED_CHECK=false
If you disable the synchronization check, remember to call renv::status() periodically to check the health of your renv project library.
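To confirm that the two settings are actually visible to R, one option is to inspect the environment variables from within a session. This is a minimal sketch using base R only; it assumes the variables were written to the user-level .Renviron file as shown above, and that the session was restarted (or the file re-read with readRenviron()) after editing it.

```r
# Look up both renv settings in the current session.
# A value of NA means the variable is not set in this session.
renv_settings <- Sys.getenv(
  c("RENV_CONFIG_SANDBOX_ENABLED", "RENV_CONFIG_SYNCHRONIZED_CHECK"),
  unset = NA
)
print(renv_settings)

# To pick up edits without restarting R, re-read the user-level file:
# readRenviron("~/.Renviron")
```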
8.2 Project files
targets is mostly indifferent to how you organize the files in your project. However, it is good practice to follow the overall structure of a research compendium or R package (not necessarily with a DESCRIPTION file). It is also good practice to give each project its own unique folder with one targets pipeline, one renv library for package management, and one Git/GitHub repository for code version control. As described later, it is possible to create multiple overlapping projects within a single folder, but this is not recommended for most situations.
The walkthrough chapter shows the file structure for a minimal targets project. For more serious projects, the file system may expand to look something like the file tree shown at the top of this chapter.
Some of these files are optional, and they have the following roles.
- .git/: a folder automatically created by Git for version control purposes.
- .Rprofile: a text file automatically created by renv to load the project library when you start R at the project root folder. You may wish to add other global configuration here, e.g. declare package precedence using the conflicted package.
- .Renviron: a text file of key-value pairs defining project-level environment variables, e.g. API keys and package settings. See Sys.getenv() for more information on environment variables and how to work with them in R.
- index.Rmd: Target Markdown report source file to define the pipeline.
- _targets/: the data store where tar_make() and similar functions write target storage and metadata when they run the pipeline.
- _targets.R: the target script file. All targets pipelines must have a target script file that returns a target list at the end. If you use Target Markdown (e.g. index.Rmd above) then the target script will be written automatically. Otherwise, you may write it by hand. Unless you apply the custom configuration described later in this chapter, the target script file will always be called _targets.R and live at the project root folder.
- _targets.yaml: a YAML file to set default arguments to critical functions like tar_make(). As described below, you can access and modify this file with functions tar_config_get(), tar_config_set(), and tar_config_unset(). targets will attempt to look for _targets.yaml unless you set a different path in the TAR_CONFIG environment variable.
- R/: directory of scripts containing custom user-defined R code. Most of the code will likely contain custom functions you write to support your targets. You can load these functions with source("R/function_script.R") or eval(parse(file = "R/function_script.R")), either in a tar_globals = TRUE code chunk in Target Markdown or directly in _targets.R if you are not using Target Markdown.
- data/: directory of local input data files. As described in the files chapter, it is good practice to track input files using format = "file" in tar_target() and then reference those file targets in downstream targets that directly depend on those files.
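The file-tracking advice in the last item can be sketched as a small hand-written target script. This is an illustration rather than a prescribed layout: the target names input_file and input_data are hypothetical, and the file path is the one from the example tree.

```r
# _targets.R (sketch)
library(targets)

list(
  # Track the input file itself: the value of this target is the file path,
  # and the target is invalidated when the file's contents change.
  tar_target(input_file, "data/input_data.csv", format = "file"),
  # Reference the file target so this target directly depends on the file.
  tar_target(input_data, read.csv(input_file))
)
```

In a real project, read.csv() would typically be replaced by a custom function defined in one of the scripts in R/.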
8.3 Multiple projects
It is generally good practice to give each project its own unique folder with one targets pipeline, one renv library for package management, and one Git/GitHub repository for code version control. However, sometimes it is reasonable to maintain multiple pipelines within a project: for example, if different pipelines have similar research goals and share the same code base of custom user-defined functions. This section explains how to maintain and navigate such a collection of overlapping projects.
The functionality below assumes you have targets version 0.7.0.9001 or higher, which you may need to install from GitHub.
remotes::install_github("ropensci/targets")
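Before reinstalling, it may help to check whether the installed version already meets the requirement. The sketch below uses base R only; has_min_version() is a hypothetical helper name, not a function from targets.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

if (!has_min_version("targets", "0.7.0.9001")) {
  message("targets is missing or too old for the multi-project features.")
}
```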
8.3.1 Create each project
To begin, write the shared code base of custom user-defined functions in R/, and write one targets pipeline per project. For convenience, we will directly write to the target script files, but the principles generalize to Target Markdown. The file structure looks something like this:
├── _targets.yaml
├── script_a.R
├── script_b.R
├── R/
├──── functions_data.R
├──── functions_analysis.R
├──── functions_visualization.R
...
All projects share the same functions defined in the scripts in R/, and each project uses a different target script and data store. script_a.R defines the targets for project A.
# script_a.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_abc, f(...)),
  tar_target(target_xyz, g(...))
)
Likewise, script_b.R defines the targets for project B.
# script_b.R
library(targets)
library(tarchetypes)
tar_source()
tar_option_set(packages = "tidyverse")
list(
  tar_target(target_123, f(...)),
  tar_target(target_456, h(...))
)
8.3.2 Configure each project
To establish a different store and script per project, write a top-level _targets.yaml configuration to specify these paths explicitly. You can do this from R with tar_config_set().
tar_config_set(script = "script_a.R", store = "store_a", project = "project_a")
tar_config_set(script = "script_b.R", store = "store_b", project = "project_b")
The R code above writes the following _targets.yaml configuration file.
project_a:
  store: store_a
  script: script_a.R
project_b:
  store: store_b
  script: script_b.R
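To check what was written for a given project without switching the active project, the settings can be read back with tar_config_get() and an explicit project argument. The sketch below writes to a temporary file through the config argument so that it does not touch a real _targets.yaml; the variable names store_a and script_b are only for illustration.

```r
library(targets)

# Use a scratch config file instead of the project's _targets.yaml.
config <- tempfile(fileext = ".yaml")

tar_config_set(
  script = "script_a.R", store = "store_a",
  config = config, project = "project_a"
)
tar_config_set(
  script = "script_b.R", store = "store_b",
  config = config, project = "project_b"
)

# Read individual settings back, one project at a time.
store_a <- tar_config_get("store", config = config, project = "project_a")
script_b <- tar_config_get("script", config = config, project = "project_b")
```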
8.3.3 Run each project
To run each project, run tar_make() with the correct target script and data store. To select the correct script and store, set the TAR_PROJECT environment variable to the correct project name. That way, tar_config_get() automatically supplies the correct script and store arguments to tar_make().
Sys.setenv(TAR_PROJECT = "project_a")
tar_make()
tar_read(target_abc)
Sys.setenv(TAR_PROJECT = "project_b")
tar_make()
tar_read(target_123)
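Because Sys.setenv() changes the environment variable for the rest of the session, it is easy to forget which project is active. One possible pattern, sketched below with base R only, is a small wrapper that sets TAR_PROJECT for a single expression and then restores the previous value. with_tar_project() is a hypothetical helper, not part of targets.

```r
# Hypothetical helper: evaluate `code` with TAR_PROJECT set to `project`,
# then restore whatever value (or absence of a value) was there before.
with_tar_project <- function(project, code) {
  old <- Sys.getenv("TAR_PROJECT", unset = NA)
  on.exit(
    if (is.na(old)) {
      Sys.unsetenv("TAR_PROJECT")
    } else {
      Sys.setenv(TAR_PROJECT = old)
    },
    add = TRUE
  )
  Sys.setenv(TAR_PROJECT = project)
  # `code` is a lazily evaluated argument, so it runs only now,
  # after the environment variable has been set.
  force(code)
}

# Demonstration without running a pipeline:
active <- with_tar_project("project_a", Sys.getenv("TAR_PROJECT"))

# Intended usage in a real project (requires targets and the scripts above):
# with_tar_project("project_a", targets::tar_make())
# with_tar_project("project_b", targets::tar_make())
```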
Alternatively, you can manually select the appropriate script and store for each project. This is a less convenient approach, but if you do it, you do not need to set the TAR_PROJECT environment variable or rely on _targets.yaml.
tar_make(script = "script_a.R", store = "store_a")
tar_read(target_abc, store = "store_a")
tar_make(script = "script_b.R", store = "store_b")
tar_read(target_123, store = "store_b")
8.4 Interdependent projects
8.4.1 Config inheritance
_targets.yaml can control more than just the script and store, and different projects can inherit settings from one another. In the following example, project B inherits from project A, so projects A and B both set reporter = "summary" and shortcut = TRUE by default in tar_make().
tar_config_set(
  script = "script_a.R",
  store = "store_a",
  reporter_make = "summary",
  shortcut = TRUE,
  project = "project_a"
)
tar_config_set(
  script = "script_b.R",
  store = "store_b",
  inherits = "project_a",
  project = "project_b"
)
writeLines(readLines("_targets.yaml"))
#> project_a:
#>   reporter_make: summary
#>   script: script_a.R
#>   shortcut: yes
#>   store: store_a
#> project_b:
#>   inherits: project_a
#>   script: script_b.R
#>   store: store_b
Sys.setenv(TAR_PROJECT = "project_b")
tar_config_get("script")
#> [1] "script_b.R"
tar_config_get("reporter_make")
#> [1] "summary"
tar_config_get("shortcut")
#> [1] TRUE
8.5 The config package
The _targets.yaml config interface borrows heavily from the ideas in the config R package. However, it does not actually use the config package, nor does it copy or use the config source code in any way. There are also major differences in user-side behavior:
- There is no requirement to have a configuration (i.e. project) named “default”.
- The default project is called “main”, and other projects do not inherit from it automatically.
- Not all fields need to be populated in _targets.yaml because the targets package already has system defaults.
[1] However, you may wish to commit _targets/meta/meta, which is critical to checking the status of each target and reading targets into memory.