fs::dir_tree("_targets")
_targets
├── meta
│ ├── crew
│ ├── meta
│ ├── process
│ └── progress
├── objects
│ ├── target1
│ ├── target2
│ ├── dynamic_branch_c7bcb4bd
│ ├── dynamic_branch_285fb6a9
│ └── dynamic_branch_874ca381
├── scratch # tar_make() deletes this folder after it finishes.
└── user # for gittargets user data
10 Local data
See the performance chapter for options, settings, and other choices to make storage and memory more efficient for large data workflows.
During a pipeline, targets manages R objects in memory and writes them to files on disk. It also stores target-level metadata in a compact central text file.
10.1 Memory
In addition to persistent storage on disk, targets uses random access memory (RAM) while the pipeline is running. Each target loads its upstream dependencies into memory and returns an R object in memory. After a target runs or loads, tar_make() either keeps the object in memory or discards it, depending on the settings in tar_target() and tar_option_set(). Set memory = "transient" to release the target whenever possible. Alternatively, set memory = "persistent" to keep the object in memory and reduce costly interactions with the file system. The trade-off between memory and file I/O depends on your computing platform. See the performance chapter for more details.
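As a rough illustration of these settings, the sketch below sets a pipeline-wide default with tar_option_set() and overrides it for a single target in tar_target(). The target names and commands (big_data, small_lookup, result) are hypothetical and exist only for this example.

```r
# _targets.R (illustrative sketch; target names and commands are hypothetical)
library(targets)

# Pipeline-wide default: release each target from memory whenever possible.
tar_option_set(memory = "transient")

list(
  # Inherits memory = "transient" from tar_option_set().
  tar_target(name = big_data, command = rnorm(1e6)),
  # Overrides the default: keep this small object in memory so that
  # downstream targets need not reload it from disk.
  tar_target(
    name = small_lookup,
    command = c(a = 1, b = 2),
    memory = "persistent"
  ),
  tar_target(name = result, command = mean(big_data) + small_lookup[["a"]])
)
```

Which setting works better depends on the size of the objects and the speed of the file system, so it is worth trying both on a realistic workload.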
10.2 Local data store
In addition to memory, the pipeline writes data to files on disk. tar_make() creates a special data folder called _targets/ at the root of your project.
The two most important parts are:
- _targets/meta/meta: a text file with key target-level metadata.
- _targets/objects/: a folder with the output data of each target.
Consider this pipeline:
library(targets)
library(tarchetypes)
list(
  tar_target(
    name = target1,
    command = 11 + 46,
    format = "rds",
    repository = "local"
  )
)
tar_make() does the following:
- Run the command of target1 and observe a return value of 57.
- Save the value 57 to _targets/objects/target1 using saveRDS().
- Append a line to _targets/meta/meta containing the hash, time stamp, file size, warnings, errors, and execution time of target1.
- Append a line to _targets/meta/progress to indicate that target1 finished.
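One way to observe these effects is to run the same pipeline in a throwaway directory and inspect the data store afterwards. The sketch below assumes the targets package is installed. The temporary project directory and the variable names (project, stored_value, meta_row, progress_row) are introduced only for this example.

```r
library(targets)

# Work in a temporary directory so that no real project files are touched.
project <- tempfile("targets_store_")
dir.create(project)
old_wd <- setwd(project)

# Write the example pipeline to _targets.R and run it.
tar_script({
  library(targets)
  list(
    tar_target(
      name = target1,
      command = 11 + 46,
      format = "rds",
      repository = "local"
    )
  )
}, ask = FALSE)
tar_make(reporter = "silent")

# The return value is an ordinary RDS file in _targets/objects/.
# readRDS() is used here only to show what is on disk.
stored_value <- readRDS(file.path("_targets", "objects", "target1"))

# The metadata line for target1, read back as a one-row data frame.
meta_row <- tar_meta(
  names = any_of("target1"),
  fields = any_of(c("data", "bytes", "seconds"))
)

# The progress record for target1.
progress_row <- tar_progress(names = any_of("target1"))

setwd(old_wd)

print(stored_value)
print(meta_row)
print(progress_row)
```

The exact status string in the progress record depends on the installed version of targets, so the sketch prints it rather than assuming a value.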
Remarks:
- To read the value of target1 back into R, tar_read(target1) is much better than readRDS("_targets/objects/target1").
- The format argument of tar_target() controls how tar_make() saves the return value. The default is "rds", and there are more efficient formats such as "qs" and "feather". Some of these formats require external packages. See https://docs.ropensci.org/targets/reference/tar_target.html#storage-formats for details.
- For efficiency, tar_make() does not write to _targets/meta/meta or _targets/meta/progress every single time a target completes. Instead, it waits and gathers a backlog of text lines in memory, then writes whole batches of lines at a time. This behavior risks losing metadata in the event of a crash, but it reduces costly interactions with the file system. The seconds_meta argument controls how often tar_make() writes metadata. seconds_reporter does the same for messages printed to the R console.
10.3 External files
Some pipelines work with custom external files outside _targets/. The user is still responsible for reading and writing these files. However, the pipeline can track them, detect changes, and decide whether to rerun or skip the targets that depend on those files. To do this, create a file target.
In a file target:
- tar_target() has format = "file".
- The command returns a character vector of file paths.
Consider this pipeline:
# _targets.R
library(targets)
library(tarchetypes)
create_output <- function(file) {
  data <- read.csv(file)
  output <- head(data)
  write.csv(output, "output.csv")
  "output.csv"
}
list(
  tar_target(name = input, command = "data.csv", format = "file"),
  tar_target(name = output, command = create_output(input), format = "file")
)
In the dependency graph, output depends on input because the command of output mentions the symbol input.
tar_visnetwork()
Before the pipeline first runs, data.csv exists, but output.csv does not. During tar_make(), the input target tracks data.csv, and the output target creates and tracks output.csv. If data.csv changes before the next tar_make(), then both input and output rerun. If something outside the pipeline changes output.csv, then output reruns.
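The rerun behavior described above can be checked with tar_outdated(), which lists the targets that the next tar_make() would rerun. The following is a minimal sketch that assumes the targets package is installed. It builds data.csv from the built-in mtcars dataset in a temporary directory, and the variable names (project, outdated_initially, and so on) are hypothetical.

```r
library(targets)

# Work in a temporary directory so that no real project files are touched.
project <- tempfile("targets_files_")
dir.create(project)
old_wd <- setwd(project)

# Create the input file before the pipeline first runs.
write.csv(head(mtcars, 10), "data.csv", row.names = FALSE)

# Write the file-target pipeline to _targets.R and run it.
tar_script({
  library(targets)
  create_output <- function(file) {
    data <- read.csv(file)
    output <- head(data)
    write.csv(output, "output.csv")
    "output.csv"
  }
  list(
    tar_target(name = input, command = "data.csv", format = "file"),
    tar_target(name = output, command = create_output(input), format = "file")
  )
}, ask = FALSE)
tar_make(reporter = "silent")

# Right after a successful run, nothing should be outdated.
outdated_initially <- tar_outdated(reporter = "silent")

# Change the tracked input file: both targets become outdated.
write.csv(mtcars, "data.csv", row.names = FALSE)
outdated_after_input_change <- tar_outdated(reporter = "silent")

# Bring the pipeline up to date, then modify output.csv from outside.
tar_make(reporter = "silent")
cat("tampered\n", file = "output.csv", append = TRUE)
outdated_after_output_change <- tar_outdated(reporter = "silent")

setwd(old_wd)

print(outdated_initially)
print(outdated_after_input_change)
print(outdated_after_output_change)
```

The sketch changes the size of each file on purpose, which makes the modification easy for the pipeline to detect even when the edits happen within the same second.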
Remarks:
- A file target can have both input and output files.
- A file target can include directory paths as well as individual file paths.
10.4 Clean up local files
There are multiple functions to remove or clean up local target storage. Some of them also delete cloud data if your pipeline uses an AWS or GCP bucket (see the next chapter).
- tar_destroy() removes the _targets/ data store and any cloud data from the pipeline.
- tar_prune() deletes data and metadata irrelevant to the current pipeline in _targets.R.
- tar_delete() deletes specific data files from _targets/objects/ and the cloud. It does not modify metadata.
- tar_invalidate() removes metadata from specific targets but keeps their data files in _targets/objects/.
- tar_meta_delete() removes _targets/meta/ files and their copies on the cloud.
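To make the difference between deleting data and deleting metadata concrete, the sketch below runs a small local pipeline in a temporary directory and applies tar_delete(), tar_invalidate(), and tar_destroy() in turn. It assumes the targets package is installed and uses no cloud storage. The second target (target2) and the variable names are hypothetical additions for this example.

```r
library(targets)

# Work in a temporary directory so that no real project files are touched.
project <- tempfile("targets_cleanup_")
dir.create(project)
old_wd <- setwd(project)

# A two-target local pipeline, written to _targets.R and run once.
tar_script({
  library(targets)
  list(
    tar_target(name = target1, command = 11 + 46),
    tar_target(name = target2, command = target1 + 1)
  )
}, ask = FALSE)
tar_make(reporter = "silent")

# tar_delete() removes the data file but leaves the metadata row in place.
tar_delete(any_of("target1"))
file_after_delete <- file.exists(file.path("_targets", "objects", "target1"))
meta_after_delete <- "target1" %in% tar_meta()$name

# tar_invalidate() does the opposite: the metadata row goes, the file stays.
tar_invalidate(any_of("target2"))
file_after_invalidate <- file.exists(file.path("_targets", "objects", "target2"))
meta_after_invalidate <- "target2" %in% tar_meta()$name

# tar_destroy() removes the entire _targets/ data store.
tar_destroy(destroy = "all", ask = FALSE)
store_after_destroy <- dir.exists("_targets")

setwd(old_wd)

print(c(file = file_after_delete, meta = meta_after_delete))
print(c(file = file_after_invalidate, meta = meta_after_invalidate))
print(store_after_destroy)
```

Either operation causes the affected target to rerun on the next tar_make(), so the choice mainly determines which artifacts remain on disk for inspection in the meantime.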