fs::dir_tree("_targets")
_targets
├── meta
│   ├── crew
│   ├── meta
│   ├── process
│   └── progress
├── objects
│   ├── target1
│   ├── target2
│   ├── dynamic_branch_c7bcb4bd
│   ├── dynamic_branch_285fb6a9
│   └── dynamic_branch_874ca381
├── scratch # tar_make() deletes this folder after it finishes.
└── user    # for gittargets user data
10 Local data
See the performance chapter for options, settings, and other choices to make storage and memory more efficient for large data workflows.
During a pipeline, targets manages R objects in memory and writes them to files on disk. It also stores target-level metadata in a compact central text file.
10.1 Memory
targets uses random access memory (RAM) while the pipeline is running. Each target loads its upstream dependencies into memory and returns an R object in memory. After a target runs or loads, tar_make() either keeps the object in memory or discards it, depending on the settings in tar_target() and tar_option_set(). Set memory = "transient" to release the target whenever possible, and set garbage_collection = TRUE to run garbage collection with gc() just before the target runs. Alternatively, set memory = "persistent" to keep the object in memory and reduce costly interactions with the file system. The trade-off between memory and file I/O depends on your computing platform. See the performance chapter for more details.
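As a rough sketch, the memory settings can be set pipeline-wide with tar_option_set() and overridden for a single target in tar_target(). The example below assumes the targets package is installed; the target names big_data and summary_stats are hypothetical. It writes a small _targets.R into a temporary directory and runs it in the current R session with tar_make(callr_function = NULL).

```r
library(targets)

# Work in a throwaway directory so the example never touches a real project.
demo_dir <- tempfile("targets_memory_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a small _targets.R file. The target names are hypothetical.
writeLines(c(
  "library(targets)",
  "# Pipeline-wide default: keep objects in memory when possible.",
  'tar_option_set(memory = "persistent")',
  "list(",
  "  tar_target(",
  "    name = big_data,",
  "    command = rnorm(1e6),",
  '    memory = "transient",      # release this object whenever possible',
  "    garbage_collection = TRUE  # run gc() just before this target runs",
  "  ),",
  "  tar_target(name = summary_stats, command = length(big_data))",
  ")"
), "_targets.R")

# Run the pipeline in the current R session, without progress messages.
tar_make(callr_function = NULL, reporter = "silent")
n_values <- tar_read(summary_stats)

setwd(old_wd)
```

Because the per-target arguments override the defaults from tar_option_set(), only big_data is treated as transient here; summary_stats keeps the persistent default.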
10.2 Local data store
In addition to memory, the pipeline writes data to files on disk. tar_make() creates a special data folder called _targets/ at the root of your project.
The two most important parts are:
- _targets/meta/meta: a text file with key target-level metadata.
- _targets/objects/: a folder with the output data of each target.
Consider this pipeline:
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(
    name = target1,
    command = 11 + 46,
    format = "rds",
    repository = "local"
  )
)
tar_make() does the following:
- Run the command of target1 and observe a return value of 57.
- Save the value 57 to _targets/objects/target1 using saveRDS().
- Append a line to _targets/meta/meta containing the hash, time stamp, file size, warnings, errors, and execution time of target1.
- Append a line to _targets/meta/progress to indicate that target1 finished.
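The steps above can be checked directly. This sketch assumes the targets package is installed; it runs the same one-target pipeline in a temporary directory, confirms which files tar_make() wrote, and compares tar_read() with a raw readRDS() call (which only works here because the format is "rds").

```r
library(targets)

# Use a temporary project directory for the demonstration.
demo_dir <- tempfile("targets_store_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# The same one-target pipeline as above, written to _targets.R.
writeLines(c(
  "library(targets)",
  "list(",
  '  tar_target(name = target1, command = 11 + 46, format = "rds", repository = "local")',
  ")"
), "_targets.R")

tar_make(callr_function = NULL, reporter = "silent")

# Files that tar_make() should have written.
has_object <- file.exists("_targets/objects/target1")
has_meta <- file.exists("_targets/meta/meta")
has_progress <- file.exists("_targets/meta/progress")

# Preferred: tar_read(). The raw readRDS() call relies on the "rds" format.
value <- tar_read(target1)
raw_value <- readRDS("_targets/objects/target1")

# tar_meta() reads the target-level metadata from _targets/meta/meta.
meta <- tar_meta(names = target1)

setwd(old_wd)
```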
Remarks:
- To read the value of target1 back into R, tar_read(target1) is much better than readRDS("_targets/objects/target1"), because tar_read() uses the storage format and location recorded in the metadata instead of assuming a local RDS file.
- The format argument of tar_target() controls how tar_make() saves the return value. The default is "rds", and there are more efficient formats such as "qs" and "feather". Some of these formats require external packages. See https://docs.ropensci.org/targets/reference/tar_target.html#storage-formats for details.
- For efficiency, tar_make() does not write to _targets/meta/meta or _targets/meta/progress every single time a target completes. Instead, it gathers a backlog of text lines in memory and writes them in batches. This behavior risks losing metadata in the event of a crash, but it reduces costly interactions with the file system. The seconds_meta argument controls how often tar_make() writes metadata, and seconds_reporter does the same for messages printed to the R console.
10.3 External files
Some pipelines work with custom external files outside _targets/. The user is still responsible for reading and writing these files. However, the pipeline can track them, detect changes, and decide whether to rerun or skip the targets that read or write those files. To do this, create a file target.
In a file target:
- tar_target() has format = "file".
- The command returns a character vector of file paths.
Consider this pipeline:
# _targets.R
library(targets)
library(tarchetypes)
create_output <- function(file) {
  data <- read.csv(file)
  output <- head(data)
  write.csv(output, "output.csv")
  "output.csv"
}
list(
  tar_target(name = input, command = "data.csv", format = "file"),
  tar_target(name = output, command = create_output(input), format = "file")
)
In the dependency graph, output depends on input because the command of output mentions the symbol input.
tar_visnetwork()
Before the pipeline first runs, data.csv exists, but output.csv does not. During tar_make(), the input target tracks data.csv, and the output target creates and tracks output.csv. If data.csv changes before the next tar_make(), then both input and output rerun. If something outside the pipeline changes output.csv, then output reruns.
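The rerun rules above can be observed with tar_outdated(), which lists the targets that the next tar_make() would rerun. This is a minimal sketch assuming the targets package is installed; it recreates the two file targets in a temporary directory, and the contents of data.csv are made up for the demonstration.

```r
library(targets)

demo_dir <- tempfile("targets_files_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# _targets.R with the same two file targets as above.
writeLines(c(
  "library(targets)",
  "create_output <- function(file) {",
  "  data <- read.csv(file)",
  "  output <- head(data)",
  '  write.csv(output, "output.csv")',
  '  "output.csv"',
  "}",
  "list(",
  '  tar_target(name = input, command = "data.csv", format = "file"),',
  '  tar_target(name = output, command = create_output(input), format = "file")',
  ")"
), "_targets.R")

# data.csv exists before the first run; output.csv does not.
write.csv(data.frame(x = 1:10), "data.csv", row.names = FALSE)
tar_make(callr_function = NULL, reporter = "silent")
after_first_run <- tar_outdated(callr_function = NULL, reporter = "silent")

# Changing data.csv makes both input and output outdated.
write.csv(data.frame(x = 1:20), "data.csv", row.names = FALSE)
after_input_change <- tar_outdated(callr_function = NULL, reporter = "silent")

# Rerun, then change output.csv from outside the pipeline.
tar_make(callr_function = NULL, reporter = "silent")
writeLines("tampered", "output.csv")
after_output_change <- tar_outdated(callr_function = NULL, reporter = "silent")

setwd(old_wd)
```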
Remarks:
- A file target can have both input and output files.
- A file target can include directory paths as well as individual file paths.
- format = "file_fast" is just like format = "file", except that it uses file modification time stamps to check whether a file is up to date. If the time stamp of the file agrees with the time stamp in the metadata, the file is considered up to date. Otherwise, targets recomputes the hash of the file to make a final determination.
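The "file_fast" decision rule can be illustrated in base R. This is a simplified sketch, not the targets implementation: targets uses its own hashing and metadata, whereas the hypothetical helpers record_file() and is_up_to_date() below use file.mtime() and MD5 hashes from tools::md5sum() purely to show the order of the checks (time stamp first, hash only as a fallback).

```r
# Hypothetical helper: remember a file's time stamp and hash, standing in for
# the metadata that targets would store.
record_file <- function(path) {
  list(
    path = path,
    time = file.mtime(path),
    hash = unname(tools::md5sum(path))
  )
}

# Hypothetical helper: cheap time stamp check first, hash only if needed.
is_up_to_date <- function(record) {
  if (!file.exists(record$path)) return(FALSE)
  if (identical(file.mtime(record$path), record$time)) return(TRUE)
  identical(unname(tools::md5sum(record$path)), record$hash)
}

path <- tempfile(fileext = ".csv")
writeLines("a,b", path)
rec <- record_file(path)

# Unchanged file: the time stamp matches, so no hashing is needed.
check_same <- is_up_to_date(rec)

# New time stamp but same content: the hash fallback says up to date.
Sys.setFileTime(path, Sys.time() + 60)
check_touched <- is_up_to_date(rec)

# New content (and a new time stamp): the hash differs.
writeLines("a,b,c", path)
Sys.setFileTime(path, Sys.time() + 120)
check_changed <- is_up_to_date(rec)
```

The time stamp shortcut avoids rereading large files when nothing has changed, and the hash fallback prevents a merely touched file from triggering a rerun.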
10.4 Clean up local files
There are multiple functions to remove or clean up local target storage. Some of them also delete cloud data if your pipeline uses an AWS or GCP bucket (see the next chapter).
- tar_destroy() removes the _targets/ data store and any cloud data from the pipeline.
- tar_prune() deletes data and metadata irrelevant to the current pipeline in _targets.R.
- tar_delete() deletes specific data files from _targets/objects/ and the cloud. It does not modify metadata.
- tar_invalidate() removes metadata from specific targets but keeps their data files in _targets/objects/.
- tar_meta_delete() removes _targets/meta/ files and their copies on the cloud.
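A few of these cleanup functions can be tried on a throwaway project. This sketch assumes the targets package is installed and only uses local storage, so no cloud data is involved; the two targets are hypothetical. It shows that tar_invalidate() keeps the data file but makes the target outdated, tar_delete() removes the data file, and tar_destroy() removes the whole data store.

```r
library(targets)

demo_dir <- tempfile("targets_cleanup_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Hypothetical two-target pipeline.
writeLines(c(
  "library(targets)",
  "list(",
  "  tar_target(target1, 11 + 46),",
  "  tar_target(target2, 2 * 3)",
  ")"
), "_targets.R")
tar_make(callr_function = NULL, reporter = "silent")

# tar_invalidate(): drop the metadata of target2 but keep its data file.
tar_invalidate(names = target2)
target2_file_kept <- file.exists("_targets/objects/target2")
outdated_after_invalidate <- tar_outdated(callr_function = NULL, reporter = "silent")

# tar_delete(): remove the data file of target1 (its metadata stays).
tar_delete(names = target1)
target1_file_gone <- !file.exists("_targets/objects/target1")

# tar_destroy(): remove the entire _targets/ data store without prompting.
tar_destroy(ask = FALSE)
store_gone <- !dir.exists("_targets")

setwd(old_wd)
```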