6  Targets

A target is a high-level steps of the computational pipeline, and it a piece of work that you define with your custom functions. A target runs some R code and saves the returned R object to storage, usually a single file inside _targets/objects/.

6.1 Target names

A target is an abstraction. The targets package automatically manages data storage and retrieval under the hood, which means you do not need to reference a target’s data file directly (e.g. _targets/objects/your_target_name). Instead, your R code should refer to a target name as if it were a variable in an R session. In other words, from the point of view of the user, a target is an R object in memory. That means a target name must be a valid visible symbol name for an R variable. The name must not begin with a dot, and it must be a string that lets you assign a value, e.g. your_target_name <- TRUE. For stylistic considerations, please refer to the tidyverse style guide syntax chapter.

6.2 What a target should do

Like a good function, a good target generally does one of three things:

  1. Create a dataset.
  2. Analyze a dataset with a model.
  3. Summarize an analysis or dataset.

If a function gets too long, you can split it into nested sub-functions that make your larger function easier to read and maintain.

6.3 How much a target should do

The targets package automatically skips targets that are already up to date, so it is best to define targets that maximize time savings. Good targets usually

  1. Are large enough to subtract a decent amount of runtime when skipped.
  2. Are small enough that some targets can be skipped even if others need to run.
  3. Invoke no side effects such as modifications to the global environment. (But targets with tar_target(format = "file") can save files.)
  4. Return a single value that is
    1. Easy to understand and introspect.
    2. Meaningful to the project.
    3. Easy to save as a file, e.g. with readRDS(). Please avoid non-exportable objects as target return values or global variables.

Regarding the last point above, it is possible to customize the storage format of the target. For details, enter ?tar_target in the console and scroll down to the description of the format argument.

6.4 Working with tools outside R

Each target runs R code, so to invoke a tool outside R, consider system2() or processx to call the appropriate system commands. This technique allows you to run shell scripts, Python scripts, etc. from within R. External scripts should ideally be tracked as input files using tar_target(format = "file") as described in section on external input files. There are also specialized R packages to retrieve data from remote sources and invoke web APIs, including rnoaa, ots, and aws.s3, and you may wish to use custom cues to automatically invalidate a target when the upstream remote data changes.

6.5 Side effects

Like a good pure function, a good target should return a single value and not produce side effects. (The exception is output file targets which create files and return their paths.) Avoid modifying the global environment with calls to data() or source(). If you need to source scripts to define global objects, please do so at the top of your target script file (default: _targets.R) just like source("R/functions.R") from the walkthrough vignette.

6.6 Dependencies

A dependency x of target y is a target, global object, or global function that y requires in order to run its R command. When target y is about to run, x is up to date and loaded into memory in the R session.1 Consequently, when targets x and y are not up to date, target x always finishes running before target y starts running. The targets package automatically discovers dependencies using static code analysis. In the example below, x is automatically detected as a dependency of y because the R command of y is the expression x + 1, which contains the symbol x. Consequently, the tar_visnetwork() dependency graph contains a left-to-right arrow from x to y.

# _targets.R file
library(targets)
source("R/functions.R")
list(
  tar_target(x, 2),
  tar_target(y, x + 1)
)
# R console
tar_visnetwork()

To force a dependency relationship, simply mention the dependency in the target’s command. In the following example, target y depends on target z even though z does not actually contribute to the return value of y. Target z will still finish before target y begins, and target y will still load target z into its memory space before running the R command.

# _targets.R file
library(targets)
source("R/functions.R")
list(
  tar_target(x, 2),
  tar_target(
    y, {
      z # Merely mentioned to force y to depend on z.
      x + 1
    }
  ),
  tar_target(z, 1)
)
# R console
tar_visnetwork()

The tar_deps() function shows you which dependencies will be detected in a given function or R command.2

tar_deps(x + 2)
#> [1] "+" "x"

tar_deps(command_of_y(dependency = x))
#> [1] "command_of_y" "x"

tar_deps(function() {
  read_csv(raw_data_file, col_types = cols())
})
#> [1] "{"             "cols"          "raw_data_file" "read_csv"

However, the findings from tar_deps() are only candidate dependencies. Unless they are either targets in the pipeline or global objects3, they will be ignored. For example, functions and objects defined in R packages are ignored. To force a pipeline to notice dependencies from an R package, include the name of the package in the imports and packages fields of tar_option_set().

6.7 Return value

The return value of a target should be an R object that can be saved to disk and hashed.

6.7.1 Saving

The object should be compatible with the storage format you choose using the format argument of tar_target() or tar_option_set(). For example, if the format is "rds" (default), then the target should return an R object that can be saved with saveRDS() and safely loaded properly into another session. Please avoid returning non-exportable objects such as connection objects, Rcpp pointers, xgboost matrices, and greta models4.

6.7.2 Hashing

Once a target is saved to disk, targets computes a digest hash to track changes to the data file(s). These hashes are used to decide whether each target is up to date or needs to rerun. In order for the hash to be useful, the data you return from a target must be an accurate reflection of the underlying content of the data. So please try to return the actual data instead of an object that wraps or points to the data. Otherwise, the package will make incorrect decisions regarding which targets can skip and which need to rerun.

6.7.3 Workaround

As a workaround, you can write custom functions to create temporary instances of these non-exportable/non-hashable objects and clean them up after the task is done. The following sketch creates a target that returns a database table while managing a transient connection object.

# _targets.R
library(targets)

get_from_database <- function(table, ...) {
  con <- DBI::dbConnect(...)
  on.exit(close(con))
  dbReadTable(con, table)
}

list(
  tar_target(
    table_from_database,
    get_from_database("my_table", ...), # ... has use-case-specific arguments.
    format = "feather" # Requires that the return value is a data frame.
  )
)

  1. targets automatically loads dependencies into memory when they are required, so it is rarely advisable to call tar_read() or tar_load() from inside a target. Except in rare circumstances, tar_read() and tar_load() are only for exploratory data analysis and literate programming.↩︎

  2. tar_deps() works mostly like the findGlobals() function from the codetools package, except the former makes special adjustments for odd cases like formulas.↩︎

  3. i.e. loaded into tar_option_get("envir) from within _targets.R↩︎

  4. Special exceptions are granted to Keras and Torch models, which can be safely returned from targets if you specify format = "keras" or format = "torch".↩︎