Chapter 1 Introduction

The targets package is a Make-like pipeline toolkit for statistics and data science in R. With targets, you can maintain a reproducible workflow without repeating yourself. targets learns how your pipeline fits together, skips costly runtime for tasks that are already up to date, runs only the necessary computation, supports implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the results match the underlying code and data.

1.1 Motivation

Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. Unchecked, this invalidation creates a chronic Sisyphean loop:

  1. Launch the code.
  2. Wait while it runs.
  3. Discover an issue.
  4. Restart from scratch.

1.2 Pipeline toolkits

Pipeline toolkits like GNU Make break the cycle. They watch the dependency graph of the whole workflow and skip steps, or “targets”, whose code, data, and upstream dependencies have not changed since the last run of the pipeline. When all targets are up to date, this is evidence that the results match the underlying code and data, which helps us trust the results and confirm the computation is reproducible.
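
The skipping idea can be illustrated with a toy sketch in base R. This is not how GNU Make or targets work internally; the names cache, fingerprint(), and run_step() are hypothetical and exist only for this illustration. Each step stores a fingerprint of its code and inputs, and it reruns only when that fingerprint changes. Because upstream results are passed in as inputs, a change upstream also invalidates the steps that depend on it.

```r
# Toy illustration only: hypothetical helpers, not the targets or Make internals.
cache <- new.env()

# Fingerprint a step: the deparsed code of its function plus its input values.
fingerprint <- function(fn, inputs) {
  list(args = names(formals(fn)), code = deparse(body(fn)), inputs = inputs)
}

# Run a step only if its code or inputs changed since the last run.
run_step <- function(name, fn, inputs) {
  key <- fingerprint(fn, inputs)
  entry <- cache[[name]]
  if (!is.null(entry) && identical(entry$key, key)) {
    message("skip: ", name)
    return(entry$value)
  }
  message("run:  ", name)
  value <- do.call(fn, inputs)
  cache[[name]] <- list(key = key, value = value)
  value
}

drop_missing <- function(x) x[!is.na(x)]
raw <- c(1, NA, 3)

clean <- run_step("clean", drop_missing, list(x = raw))          # runs
clean <- run_step("clean", drop_missing, list(x = raw))          # skipped: nothing changed
clean <- run_step("clean", drop_missing, list(x = c(raw, 5)))    # runs again: input changed
```

A real pipeline toolkit does considerably more, such as detecting dependencies automatically and storing results on disk so that they persist across sessions.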

1.3 The targets package

Unlike most pipeline toolkits, which are language agnostic or Python-focused, the targets package allows data scientists and researchers to work entirely within R. targets implicitly nudges users toward a clean, function-oriented programming style that fits the intent of the R language and helps practitioners maintain their data analysis projects.
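
As a minimal sketch of this function-oriented style, a small pipeline might look like the target script below. The helper functions get_data(), fit_model(), and summarize_fit() and the target names are hypothetical, chosen only for illustration; airquality is a dataset that ships with R, and tar_target() is the targets function for declaring a step.

```r
# _targets.R: the target script file at the root of the project.
library(targets)

# Hypothetical helper functions, invented for this example.
get_data <- function() {
  # Keep only the rows of the built-in airquality dataset with observed ozone.
  airquality[!is.na(airquality$Ozone), ]
}

fit_model <- function(data) {
  # Simple linear regression of ozone on temperature.
  lm(Ozone ~ Temp, data = data)
}

summarize_fit <- function(model) {
  # Coefficient table of the fitted model.
  coef(summary(model))
}

# The script ends with a list of targets. Each target names a step
# and gives the R command that produces its value.
list(
  tar_target(data, get_data()),
  tar_target(model, fit_model(data)),
  tar_target(fit_summary, summarize_fit(model))
)
```

With this file in place, calling tar_make() from the project root runs the pipeline, and calling it again without changing anything should skip all three targets. tar_read(fit_summary) then returns the stored coefficient table. Because each step is an ordinary R function, the same code can be tested and reused outside the pipeline.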

1.4 About this manual

This manual is a step-by-step user guide to targets. It walks through basic usage, outlines general best practices, dives deep into advanced features like high-performance computing, and helps drake users transition to targets. See the documentation website for most other major resources, including installation instructions, links to example projects, and a reference page with all user-side functions.

1.5 What about drake?

The drake package is an older R-focused pipeline toolkit, and targets is drake’s long-term successor. A special chapter explains why targets was created and what this means for drake’s future, offers advice for drake users transitioning to targets, and lists the main technical advantages of targets over drake.

Copyright Eli Lilly and Company