Chapter 3 Target Markdown

Target Markdown, available in targets > 0.4.2.9000, is a powerful R Markdown interface for reproducible analysis pipelines. With Target Markdown, you can define a fully scalable pipeline from within one or more R Markdown reports, anything from a single report to a whole bookdown or workflowr project. You get the best of both worlds: the human readable narrative of literate programming, and the sophisticated caching and dependency management systems of targets.

3.1 Access

This chapter’s example Target Markdown document is itself a tutorial and a simplified version of the chapter. There are two convenient ways to access the file:

  1. The use_targets() function.
  2. The [RStudio R Markdown template system] https://rstudio.github.io/rstudio-extensions/rmarkdown_templates.html.

For (2), in the RStudio IDE, select a new R Markdown document in the New File dropdown menu in the upper left-hand corner of the window.

Then, select the Target Markdown template and click OK to open a copy of the report for editing.

3.2 Engine

Target Markdown uses a special knitr language engine. R Markdown code chunks begin with {targets} rather than {r}, and there are special chunk options:

  • tar_globals: Logical of length 1, whether to define globals or targets. If TRUE, the chunk code defines functions, objects, and options common to all the targets. If FALSE or NULL (default), then the chunk returns formal targets for the pipeline.
  • tar_interactive: Logical of length 1, whether to run in interactive mode or non-interactive mode. Defaults to the return value of interactive().
  • tar_name: name to use for writing helper script files (e.g. _targets_r/targets/target_script.R) and specifying target names if the tar_simple chunk option is TRUE. All helper scripts and target names must have unique names, so please do not set this option globally with knitr::opts_chunk$set().
  • tar_script: Character of length 1, where to write the target script file in non-interactive mode. Most users can skip this option and stick with the default _targets.R script path. Helper script files are always written next to the target script in a folder with an "_r" suffix. The tar_script path must either be absolute or be relative to the project root (where you call tar_make() or similar). If not specified, the target script path defaults to tar_config_get("script") (default: _targets.R; helpers default: _targets_r/). When you run tar_make() etc. with a non-default target script, you must select the correct target script file either with the script argument or with tar_config_set(script = ...). The function will source() the script file from the current working directory (i.e. with chdir = FALSE in source()).
  • tar_simple: Logical of length 1. Set to TRUE to define a single target with a simplified interface. In code chunks with tar_simple equal to TRUE, the chunk label (or the tar_name chunk option if you set it) becomes the name, and the chunk code becomes the command. In other words, a code chunk with label targetname and command mycommand() automatically gets converted to tar_target(name = targetname, command = mycommand()). All other arguments of tar_target() remain at their default values (configurable with tar_option_set() in a tar_globals = TRUE chunk).

3.3 Modes

The {targets} engine can run in interactive or non-interactive mode. Interactive mode is for prototyping and testing, while non-interactive mode is for pipeline construction. Pipeline development is continual back-and-forth between exploratory analysis and serious runs, so most users will frequently switch modes. The default mode is whatever interactive() returns, so the notebook interface defaults to interactive (https://bookdown.org/yihui/rmarkdown/notebook.html) and the Knit button in RStudio defaults to non-interactive. However, you can control the mode with the tar_interactive chunk option. You can even set the default mode for the whole report:

```{r}
knitr::opts_chunk$set(tar_interactive = FALSE)
```

The following example demonstrates both modes.

3.4 Example

The following example is based on the minimal targets project at https://github.com/wlandau/targets-minimal/. We process the base airquality dataset, fit a model, and display a histogram of ozone concentration.

3.5 Packages

This example requires several R packages, and targets must be version 0.5.0 or above.

# R console
install.packages(c("biglm", "dplyr", "ggplot2", "readr", "targets", "tidyr"))

3.6 Setup

First, load targets to activate the specialized knitr engine for Target Markdown.

```{r}
library(targets)
```

Early on, you may also wish to remove the leftover _targets_r directory from a previous run in non-interactive mode.

```{r}
tar_unscript()
```

3.7 Globals

As usual, your targets depend on custom functions, global objects, and tar_option_set() options you define before the pipeline begins. Define these globals using the {targets} engine with tar_globals = TRUE chunk option.

```{targets some-globals, tar_globals = TRUE, tar_interactive = TRUE}
options(tidyverse.quiet = TRUE)
tar_option_set(packages = c("biglm", "dplyr", "ggplot2", "readr", "tidyr"))
create_plot <- function(data) {
  ggplot(data) +
    geom_histogram(aes(x = Ozone), bins = 12) +
    theme_gray(24)
}
```

In interactive mode, the chunk simply runs the R code in the tar_option_get("envir") environment (usually the global environment) and displays a message:

#> Ran code and assigned objects to the environment.

Here is the same chunk in non-interactive mode. Normally, there is no need to duplicate chunks like this, but we do so here in order to demonstrate both modes.

```{targets chunk-name, tar_globals = TRUE, tar_interactive = FALSE}
options(tidyverse.quiet = TRUE)
tar_option_set(packages = c("biglm", "dplyr", "ggplot2", "readr", "tidyr"))
create_plot <- function(data) {
  ggplot(data) +
    geom_histogram(aes(x = Ozone), bins = 12) +
    theme_gray(24)
}
```

In non-interactive mode, the chunk establishes a common _targets.R file and writes the R code to a script in _targets_r/globals/, and displays an informative message:4

#> Established _targets.R and _targets_r/globals/chunk-name.R.

It is good practice to assign explicit chunk labels or set the tar_name chunk option on a chunk-by-chunk basis. Each chunk writes code to a script path that depends on the name, and all script paths need to be unique.5

3.8 Targets

To define targets for the workflow, use the targets engine with the tar_globals chunk option equal FALSE or NULL (default). The return value of the chunk must be a target object or a list of target objects, created by tar_target() or a similar function.

Below, we define a target to establish the air quality dataset in the pipeline.

```{targets raw-data, tar_interactive = TRUE}
tar_target(raw_data, airquality)
```

If you run this chunk in interactive mode, the target’s R command runs, the engine tests if the output can be saved and loaded from disk correctly, and then the return value gets assigned to the tar_option_get("envir") environment (usually the global environment).

#> Ran targets and assigned them to the environment.

In the process, some temporary files are created and destroyed, but your local file space will remain untouched (barring any custom side effects in your custom code).

After you run a target in interactive mode, the return value is available in memory, and you can write an ordinary R code chunk to read it.

```{r}
head(raw_data)
```

The output is the same as what tar_read(raw_data) would show after a serious pipeline run.

head(raw_data)
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5    NA      NA 14.3   56     5   5
#> 6    28      NA 14.9   66     5   6

For demonstration purposes, here is the raw_data target code chunk in non-interactive mode.

```{targets chunk-name-with-target, tar_interactive = FALSE}
tar_target(raw_data, airquality)
```

In non-interactive mode, the {targets} engine does not actually run any targets. Instead, it establishes a common _targets.R and writes the code to a script in _targets_r/targets/.

#> Established _targets.R and _targets_r/targets/chunk-name-with-target.R.

Next, we define more targets to process the raw data and plot a histogram. Only the returned value of the chunk code actually becomes part of the pipeline, so if you define multiple targets in a single chunk, be sure to wrap them all in a list.

```{targets downstream-targets}
list(
  tar_target(data, raw_data %>% filter(!is.na(Ozone))),
  tar_target(hist, create_plot(data))
)
```

In non-interactive mode, the whole target list gets written to a single script.

#> Established _targets.R and _targets_r/targets/downstream-targets.R.

Lastly, we define a target to fit a model to the data. For simple targets like this one, we can use convenient shorthand to convert the code in a chunk into a valid target. Simply set the tar_simple chunk option to TRUE.

```{targets fit, tar_simple = TRUE}
analysis_data <- data
biglm(Ozone ~ Wind + Temp, analysis_data)
```

When the chunk is preprocessed, chunk label (or the tar_name chunk option if you set it) becomes the target name, and the chunk code becomes the target command. All other arguments of tar_target() remain at their default values (configurable with tar_option_set() in a tar_globals = TRUE chunk). The output in the rendered R Markdown document reflects this preprocessing.

tar_target(fit, {
  biglm(Ozone ~ Wind + Temp, data)
})
#> Defined target fit automatically from chunk code.
#> Established _targets.R and _targets_r/targets/fit.R.

3.9 Pipeline

If you ran all the {targets} chunks in non-interactive mode (i.e. pipeline construction mode), then the target script file and helper scripts should all be established, and you are ready to run the pipeline in with tar_make() in an ordinary {r} code chunk. This time, the output is written to persistent storage at the project root.

```{r}
tar_make()
```
#> • start target raw_data
#> • built target raw_data
#> • start target data
#> • built target data
#> • start target fit
#> • built target fit
#> • start target hist
#> • built target hist
#> • end pipeline

3.10 Output

You can retrieve results from the _targets/ data store using tar_read() or tar_load().

```{r}
library(biglm)
tar_read(fit)
```
#> Large data regression model: biglm(Ozone ~ Wind + Temp, data)
#> Sample size =  116
```{r}
tar_read(hist)
```

The targets dependency graph helps your readers understand the steps of your pipeline at a high level.

```{r}
tar_visnetwork()
```

At this point, you can go back and run {targets} chunks in interactive mode without interfering with the code or data of the non-interactive pipeline.


  1. The _targets.R file from Target Markdown never changes from chunk to chunk or report to report, so you can spread your work over multiple reports without worrying about aligning _targets.R scripts. Just be sure all your chunk names are unique across all the reports of a project, or you set the tar_name chunk option to specify base names of script file paths.↩︎

  2. In addition, for for bookdown projects, chunk labels should only use alphanumeric characters and dashes.↩︎

Copyright Eli Lilly and Company