Chapter 2 Walkthrough

This chapter walks through a short example of a targets-powered data analysis project. The source code is available at https://github.com/wlandau/targets-four-minutes, and you can visit https://rstudio.cloud/project/3946303 to try out the code in a web browser (no download or installation required). The documentation website links to other examples. The contents of the chapter are also explained in a four-minute video tutorial.

2.1 About this example

The goal of this short analysis is to assess the relationship between ozone and temperature in base R’s airquality dataset. We track a data file, prepare a dataset, fit a model, and plot the model against the data.

2.2 File structure

The file structure of the project looks like this.

├── _targets.R
├── data.csv
└── R/
    └── functions.R

data.csv contains the data we want to analyze.

Ozone,Solar.R,Wind,Temp,Month,Day
36,118,8.0,72,5,2
12,149,12.6,74,5,3
...

R/functions.R contains our custom user-defined functions. (See the functions chapter for a discussion of function-oriented workflows.)

# R/functions.R
# readr, dplyr, and ggplot2 are loaded by tar_option_set() in _targets.R.
get_data <- function(file) {
  read_csv(file, col_types = cols()) %>%
    filter(!is.na(Ozone))
}

fit_model <- function(data) {
  lm(Ozone ~ Temp, data) %>%
    coefficients()
}

plot_model <- function(model, data) {
  ggplot(data) +
    geom_point(aes(x = Temp, y = Ozone)) +
    geom_abline(intercept = model[1], slope = model[2])
}
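Because each step is an ordinary R function, you can try its logic interactively before it goes into the pipeline. The snippet below is a base-R stand-in, not the project's code: it uses the built-in airquality data in place of data.csv and avoids readr and dplyr so it runs with no extra packages. The names ozone_data and ozone_fit are illustrative.

```r
# Stand-in for get_data(): keep only rows with a recorded ozone value.
ozone_data <- airquality[!is.na(airquality$Ozone), ]

# Same model as fit_model(): regress ozone on temperature, keep the coefficients.
ozone_fit <- coefficients(lm(Ozone ~ Temp, data = ozone_data))

# Element 1 is the intercept and element 2 the slope, the two values
# that plot_model() passes to geom_abline().
print(ozone_fit)
```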

2.3 Target script file

Whereas data.csv and functions.R are typical user-defined components of a project-oriented workflow, the target script file _targets.R is special. Every targets workflow needs a target script file to configure and define the pipeline.1 The use_targets() function in targets version >= 0.12.0 creates an initial target script for you to fill in. Ours looks like this:

# _targets.R file
library(targets)
source("R/functions.R")
tar_option_set(packages = c("readr", "dplyr", "ggplot2"))
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)

All target script files have these requirements.

  1. Load the packages required for configuration, e.g. targets itself.2
  2. Load your custom functions and small input objects into the R session: in our case, with source("R/functions.R").
  3. Use tar_option_set() to declare the packages required for analysis, as well as other settings such as the default storage format.
  4. Write the pipeline at the bottom of _targets.R. A pipeline is a list of target objects, which you can create with tar_target(). Each target is a step of the analysis. It looks and feels like a variable in R, but it runs reproducibly and stores a value in _targets/objects/ when we run the pipeline with tar_make().
  5. Set additional arguments in tar_target() as needed. For example, format = "file" says that the target is an external file, and changes to the contents of the file will invalidate the target (i.e. cause it to rerun).3
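The requirements above can be seen together in a small, self-contained sketch. It assumes only that targets is installed. To avoid readr, dplyr, and ggplot2 it uses base R and the built-in airquality data, and fit_ozone() is a hypothetical helper defined inline rather than sourced from R/functions.R. tar_script() writes the code to _targets.R in a temporary directory, and tar_validate() checks the script without running any targets.

```r
library(targets)

# Work in a throwaway directory so the sketch does not touch a real project.
project <- tempfile("targets_script_")
dir.create(project)
old_wd <- setwd(project)

# tar_script() writes this code to _targets.R and, by default, adds
# library(targets) at the top (requirement 1).
tar_script({
  # Requirement 2: custom functions (hypothetical helper, defined inline).
  fit_ozone <- function(data) {
    coefficients(lm(Ozone ~ Temp, data = data))
  }
  # Requirement 3: packages the targets need when they run.
  tar_option_set(packages = c("datasets", "stats"))
  # Requirement 4: the pipeline, a list of target objects, comes last.
  list(
    tar_target(data, airquality[!is.na(airquality$Ozone), ]),
    tar_target(model, fit_ozone(data))
  )
}, ask = FALSE)

# Check the target script for problems without running the pipeline.
tar_validate()
script_lines <- readLines("_targets.R")

setwd(old_wd)
```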

2.4 Inspect the pipeline

Before you run the pipeline for real, it is best to check for obvious errors. tar_manifest() lists verbose information about each target.

tar_manifest(fields = all_of("command"))
#> # A tibble: 4 × 2
#>   name  command                  
#>   <chr> <chr>                    
#> 1 file  "\"data.csv\""           
#> 2 data  "get_data(file)"         
#> 3 model "fit_model(data)"        
#> 4 plot  "plot_model(model, data)"

tar_visnetwork() displays the dependency graph. It should show a natural progression of work from left to right as the pipeline progresses, and targets and functions that depend on one another should be connected by directed edges that identify the dependency relationships.4 Dependency relationships are automatically detected using static code analysis, and the order of tar_target() calls in the target list does not matter at all.

tar_visnetwork()
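To check that the order of tar_target() calls does not matter, the sketch below deliberately lists model before data and then inspects the graph with tar_network(), which returns the vertices and edges behind tar_visnetwork() as plain data frames (no visNetwork package needed). Like the other sketches here, it is a hypothetical base-R pipeline, not the walkthrough's actual script.

```r
library(targets)

# Throwaway project directory.
project <- tempfile("targets_network_")
dir.create(project)
old_wd <- setwd(project)

# model is deliberately listed before data.
tar_script({
  fit_ozone <- function(data) coefficients(lm(Ozone ~ Temp, data = data))
  tar_option_set(packages = c("datasets", "stats"))
  list(
    tar_target(model, fit_ozone(data)),
    tar_target(data, airquality[!is.na(airquality$Ozone), ])
  )
}, ask = FALSE)

# tar_network() returns a list with data frames named vertices and edges.
network <- tar_network(targets_only = TRUE)
edges <- network$edges
print(edges)

setwd(old_wd)
```

Static code analysis finds the symbol data in the command fit_ozone(data), so the edge runs from data to model even though model is defined first.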

2.5 Run the pipeline

tar_make() runs the pipeline. For reproducibility, it launches a fresh external R process, which reads the target script and runs the correct targets in the correct order.5

tar_make()
#> • start target file
#> • built target file
#> • start target data
#> • built target data
#> • start target model
#> • built target model
#> • start target plot
#> • built target plot
#> • end pipeline: 0.709 seconds

The output of the pipeline is saved to the _targets/ data store, and you can read the output with tar_read() (see also tar_load()).

tar_read(plot)
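A self-contained sketch of running a pipeline and reading its output follows. It uses a hypothetical base-R pipeline instead of the project's readr and ggplot2 code; reporter = "silent" only suppresses the progress messages shown above.

```r
library(targets)

project <- tempfile("targets_make_")
dir.create(project)
old_wd <- setwd(project)

tar_script({
  fit_ozone <- function(data) coefficients(lm(Ozone ~ Temp, data = data))
  tar_option_set(packages = c("datasets", "stats"))
  list(
    tar_target(data, airquality[!is.na(airquality$Ozone), ]),
    tar_target(model, fit_ozone(data))
  )
}, ask = FALSE)

# Run the pipeline in a separate R process without printing progress.
tar_make(reporter = "silent")

# tar_read() returns the stored value of a target.
model_value <- tar_read(model)

# tar_load() instead assigns the value to a variable named after the target.
tar_load(model)
print(model)

setwd(old_wd)
```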

The next time you run tar_make(), targets skips everything that is already up to date, which saves a lot of time in large projects with long runtimes.

tar_make()
#> ✔ skip target file
#> ✔ skip target data
#> ✔ skip target model
#> ✔ skip target plot
#> ✔ skip pipeline: 0.077 seconds

You can use tar_visnetwork() and tar_outdated() to check ahead of time which targets are up to date.

tar_visnetwork()
tar_outdated()
#> character(0)
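The sketch below shows the same skipping behavior with tar_outdated(), again on a hypothetical base-R pipeline. Before the first run nothing is stored, so every target is outdated; after a successful run nothing is, and a second tar_make() has no work to do.

```r
library(targets)

project <- tempfile("targets_skip_")
dir.create(project)
old_wd <- setwd(project)

tar_script({
  fit_ozone <- function(data) coefficients(lm(Ozone ~ Temp, data = data))
  tar_option_set(packages = c("datasets", "stats"))
  list(
    tar_target(data, airquality[!is.na(airquality$Ozone), ]),
    tar_target(model, fit_ozone(data))
  )
}, ask = FALSE)

# No results are stored yet, so every target is outdated.
outdated_before <- tar_outdated()

tar_make(reporter = "silent")
# Right after a successful run, nothing is outdated...
outdated_after <- tar_outdated()

# ...so this second run skips every target.
tar_make(reporter = "silent")

setwd(old_wd)
```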

2.6 Changes

The targets package notices when you make changes to code and data, and those changes affect which targets rerun and which targets are skipped.6

2.6.1 Change code

If you change one of your functions, the targets that depend on it will no longer be up to date, and tar_make() will rebuild them. For example, let’s increase the font size of the plot.

# Edit functions.R...
plot_model <- function(model, data) {
  ggplot(data) +
    geom_point(aes(x = Temp, y = Ozone)) +
    geom_abline(intercept = model[1], slope = model[2]) +
    theme_gray(24) # Increased the font size.
}

targets detects the change. plot is “outdated” (i.e. invalidated) and the others are still up to date.

tar_visnetwork()
tar_outdated()
#> [1] "plot"

Thus, tar_make() reruns plot and nothing else.7

tar_make()
#> ✔ skip target file
#> ✔ skip target data
#> ✔ skip target model
#> • start target plot
#> • built target plot
#> • end pipeline: 0.523 seconds

Sure enough, we have a new plot.

tar_read(plot)
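The following sketch reproduces this behavior without ggplot2. A hypothetical describe_fit() function and label target stand in for plot_model() and plot. Because the functions are defined inline, the edit to functions.R is simulated by rewriting _targets.R with a changed describe_fit().

```r
library(targets)

project <- tempfile("targets_code_change_")
dir.create(project)
old_wd <- setwd(project)

# Version 1: describe_fit() reports the slope to one decimal place.
tar_script({
  fit_ozone <- function(data) coefficients(lm(Ozone ~ Temp, data = data))
  describe_fit <- function(model) sprintf("slope: %.1f", model[[2]])
  tar_option_set(packages = c("datasets", "stats"))
  list(
    tar_target(data, airquality[!is.na(airquality$Ozone), ]),
    tar_target(model, fit_ozone(data)),
    tar_target(label, describe_fit(model))
  )
}, ask = FALSE)
tar_make(reporter = "silent")

# Version 2: only describe_fit() changes (three decimal places).
tar_script({
  fit_ozone <- function(data) coefficients(lm(Ozone ~ Temp, data = data))
  describe_fit <- function(model) sprintf("slope: %.3f", model[[2]])
  tar_option_set(packages = c("datasets", "stats"))
  list(
    tar_target(data, airquality[!is.na(airquality$Ozone), ]),
    tar_target(model, fit_ozone(data)),
    tar_target(label, describe_fit(model))
  )
}, ask = FALSE)

# Only the target that calls the edited function is invalidated.
changed <- tar_outdated()
tar_make(reporter = "silent")
new_label <- tar_read(label)

setwd(old_wd)
```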

2.6.2 Change data

If we change the data file data.csv, targets notices the change. This is because file is a file target (i.e. with format = "file" in tar_target()), and the return value from the last tar_make() identified "data.csv" as the file to track for changes. Let’s try it out by overwriting data.csv with only the first 100 rows of the airquality dataset.

# Overwrite data.csv with the first 100 rows of airquality.
readr::write_csv(head(airquality, n = 100), "data.csv")

Sure enough, file and everything downstream of it are out of date, so all our targets are outdated.

tar_visnetwork()
tar_outdated()
#> [1] "file"  "plot"  "data"  "model"
tar_make()
#> • start target file
#> • built target file
#> • start target data
#> • built target data
#> • start target model
#> • built target model
#> • start target plot
#> • built target plot
#> • end pipeline: 0.711 seconds
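A self-contained version of this experiment appears below. It is a sketch that uses base R's write.csv() and read.csv() in place of readr, and it folds the model-fitting code into the target command instead of calling functions from R/functions.R. The key piece is format = "file": the target's value is a path, and targets tracks the contents of that file.

```r
library(targets)

project <- tempfile("targets_file_change_")
dir.create(project)
old_wd <- setwd(project)

# Start with the full airquality dataset in data.csv.
write.csv(airquality, "data.csv", row.names = FALSE)

tar_script({
  tar_option_set(packages = c("stats", "utils"))
  list(
    # format = "file": the value is a path whose contents are tracked.
    tar_target(file, "data.csv", format = "file"),
    tar_target(data, subset(read.csv(file), !is.na(Ozone))),
    tar_target(model, coefficients(lm(Ozone ~ Temp, data = data)))
  )
}, ask = FALSE)
tar_make(reporter = "silent")

# Overwrite the file with only the first 100 rows, as in the walkthrough.
write.csv(head(airquality, n = 100), "data.csv", row.names = FALSE)

# The file target and everything downstream of it are now outdated.
changed <- tar_outdated()
tar_make(reporter = "silent")
rows_used <- nrow(tar_read(data))

setwd(old_wd)
```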

2.7 Read metadata

To read the build progress of your targets while tar_make() is running, you can open a new R session and run tar_progress(). It reads the flat file in _targets/meta/progress and tells you which targets are running, built, errored, or cancelled.

tar_progress()
#> # A tibble: 4 × 2
#>   name  progress
#>   <chr> <chr>   
#> 1 file  built   
#> 2 data  built   
#> 3 model built   
#> 4 plot  built
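The sketch below calls tar_progress() on a hypothetical base-R pipeline after tar_make() has finished. During a long run you would call it from a second R session instead. The label for a finished target is "built" in the output above, but other targets versions may use a different word (the check below also accepts "completed"), so consult the documentation for your installed version.

```r
library(targets)

project <- tempfile("targets_progress_")
dir.create(project)
old_wd <- setwd(project)

tar_script({
  fit_ozone <- function(data) coefficients(lm(Ozone ~ Temp, data = data))
  tar_option_set(packages = c("datasets", "stats"))
  list(
    tar_target(data, airquality[!is.na(airquality$Ozone), ]),
    tar_target(model, fit_ozone(data))
  )
}, ask = FALSE)
tar_make(reporter = "silent")

# Read the progress file from the local data store.
progress <- tar_progress()
print(progress)

setwd(old_wd)
```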

Likewise, the tar_meta() function reads _targets/meta/meta and tells you high-level information about each target’s settings, data, and results. The warnings, error, and traceback columns give you diagnostic information about targets with problems.

tar_meta()
#> # A tibble: 7 × 18
#>   name  type  data  command depend    seed path  time                size  bytes
#>   <chr> <chr> <chr> <chr>   <chr>    <int> <lis> <dttm>              <chr> <int>
#> 1 data  stem  1fab… 4525b9… 60955…  1.59e9 <chr> 2022-05-10 20:07:21 d476…  1002
#> 2 file  stem  2b16… 0d4b75… ef46d… -1.30e9 <chr> 2022-05-10 20:07:17 9fb6…  1884
#> 3 fit_… func… 9826… <NA>    <NA>   NA      <chr> NA                  <NA>     NA
#> 4 get_… func… caaa… <NA>    <NA>   NA      <chr> NA                  <NA>     NA
#> 5 model stem  594b… f3249b… 8b951…  1.82e9 <chr> 2022-05-10 20:07:21 4bab…   108
#> 6 plot  stem  1c62… 1239c9… a7551…  1.93e9 <chr> 2022-05-10 20:07:22 e980… 40437
#> 7 plot… func… 53e9… <NA>    <NA>   NA      <chr> NA                  <NA>     NA
#> # … with 8 more variables: format <chr>, repository <chr>, iteration <chr>,
#> #   parent <lgl>, children <list>, seconds <dbl>, warnings <lgl>, error <lgl>
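To see the error column in use, the sketch below builds a hypothetical two-target pipeline in which target y fails on purpose. tar_make() signals an error when a target fails, so the call is wrapped in try(). The failed target's message then shows up in tar_meta().

```r
library(targets)

project <- tempfile("targets_meta_")
dir.create(project)
old_wd <- setwd(project)

# Hypothetical pipeline: y always fails.
tar_script({
  list(
    tar_target(x, 1),
    tar_target(y, stop("hypothetical failure for illustration"))
  )
}, ask = FALSE)

# Catch the error that tar_make() raises when a target fails.
try(tar_make(reporter = "silent"), silent = TRUE)

# Keep only the error column (the name column is always included).
meta <- tar_meta(fields = all_of("error"))
print(meta)

setwd(old_wd)
```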

The _targets/meta/meta file is critically important. Although targets can still work properly if files are missing from _targets/objects, the pipeline will error out if _targets/meta/meta is corrupted. If tar_meta() works, the project should be fine.


  1. By default, the target script is a file called _targets.R in the project’s root directory. However, in targets version 0.5.0.9000 and above, you can set the target script file path to something other than _targets.R. You can either set the path persistently for your project using tar_config_set(), or you can set it temporarily for an individual function call using the script argument of tar_make() and related functions.↩︎

  2. Target scripts created with tar_script() automatically insert a library(targets) line at the top by default.↩︎

  3. format = "file" allows you to track multiple files and directories.↩︎

  4. If you have hundreds of targets, then tar_visnetwork() may be slow. If that happens, consider temporarily commenting out some targets in _targets.R just for visualization purposes.↩︎

  5. In targets version 0.3.1.9000 and above, you can set the path of the local data store to something other than _targets/. A project-level _targets.yaml file keeps track of the path. Functions tar_config_set() and tar_config_get() can help.↩︎

  6. Internally, special rules called “cues” decide whether a target reruns. The tar_cue() function lets you suppress some of these cues, and the tarchetypes package supports nuanced cue factories and target factories to further customize target invalidation behavior. The tar_cue() function documentation explains cues in detail, as well as specifics on how targets detects changes to upstream dependencies.↩︎

  7. We would see similar behavior if we changed the R expressions in any tar_target() calls in the target script file.↩︎

Copyright Eli Lilly and Company