Chapter 16 Storage
16.1 drake’s cache
When you run make(), drake stores your targets in a hidden storage cache.
library(drake)
load_mtcars_example() # from https://github.com/wlandau/drake-examples/tree/main/mtcars
make(my_plan, verbose = 0L)
The default cache is a hidden .drake folder.
find_cache()
### [1] "/home/you/project/.drake"
drake’s loadd() and readd() functions load targets into memory.
loadd(large)
head(large)
head(readd(small))
16.2 Efficient target storage
drake supports custom formats for large and specialized targets. For example, the "fst" format uses the fst package to save data frames faster. Simply enclose the command and the format together with the target() function.
library(drake)
n <- 1e8 # Each target is 1.6 GB in memory.
plan <- drake_plan(
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  data_old = data.frame(x = runif(n), y = runif(n))
)
make(plan)
#> target data_fst
#> target data_old
build_times(type = "build")
#> # A tibble: 2 x 4
#> target elapsed user system
#> <chr> <Duration> <Duration> <Duration>
#> 1 data_fst 13.93s 37.562s 7.954s
#> 2 data_old 184s (~3.07 minutes) 177s (~2.95 minutes) 4.157s
For more details and a complete list of formats, see https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.
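To get a rough feel for the difference outside of drake, the sketch below writes the same data frame once with base R's saveRDS() and once with fst::write_fst(). It is only an illustration and assumes the fst package is installed. The data frame size and the temporary file names are arbitrary choices, and the timings will vary by machine and will not match the build times above.

```r
library(fst)

# A small data frame so the sketch runs quickly (size chosen for illustration).
df <- data.frame(x = runif(1e6), y = runif(1e6))

rds_file <- tempfile(fileext = ".rds")
fst_file <- tempfile(fileext = ".fst")

# Time each serialization method.
system.time(saveRDS(df, rds_file))
system.time(write_fst(df, fst_file))

# Read the fst file back and check that the data survived the round trip.
df_back <- read_fst(fst_file)
all.equal(df_back, df)
```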
16.3 Why is my cache so big?
16.3.1 Old targets
By default, drake holds on to all your targets from all your runs of make(). Even if you run clean(), the data stays in the cache in case you need to recover it.
clean()
make(my_plan, recover = TRUE)
If you really want to remove old historical values of targets, run drake_gc() or drake_cache()$gc().
drake_gc()
clean() also has a garbage_collection argument for this purpose. Here is a slick way to remove historical targets and targets no longer in your plan.
clean(list = cached_unplanned(my_plan), garbage_collection = TRUE)
16.3.2 Garbage from interrupted builds
If make() crashes or gets interrupted, old files can accumulate in .drake/scratch/ and .drake/drake/tmp/. As long as make() is no longer running, you can safely remove the files in those folders (but keep the folders themselves).
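One way to do this cleanup from R is sketched below, using only base R. The helper clear_folder_contents() is hypothetical and not part of drake. It deletes everything inside a folder and leaves the folder itself in place. Run it only when no make() is in progress.

```r
# Hypothetical helper (not part of drake): delete everything inside `dir`
# but keep `dir` itself. Returns the removed paths invisibly.
clear_folder_contents <- function(dir) {
  if (!dir.exists(dir)) {
    return(invisible(character(0)))
  }
  contents <- list.files(dir, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  unlink(contents, recursive = TRUE, force = TRUE)
  invisible(contents)
}

# The two folders named above, relative to the project root.
stale_dirs <- file.path(".drake", c("scratch", file.path("drake", "tmp")))
for (dir in stale_dirs) {
  clear_folder_contents(dir)
}
```

If a folder does not exist, the helper does nothing, so the loop is safe to run in a project with no cache.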
16.4 Interfaces to the cache
drake uses the storr package to create and modify caches.
library(storr)
cache <- storr_rds(".drake")
head(cache$list())
head(cache$get("small"))
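For readers new to storr, the following sketch walks through its basic key-value methods on a throwaway in-memory store, so it does not touch any real drake cache. The key name "small" and the stored value are placeholders chosen for illustration. The methods shown (set, get, list, exists, del, gc) belong to storr itself.

```r
library(storr)

# An in-memory storr, separate from any .drake folder.
st <- storr_environment()

# Store a value under a key, then inspect the store.
st$set("small", head(mtcars))
st$list()          # keys currently in the store
st$exists("small") # is the key present?
value <- st$get("small")

# Delete the key, then garbage-collect data no key refers to anymore.
st$del("small")
st$gc()
st$exists("small")
```

The separate del() and gc() steps mirror the distinction between clean() and drake_gc() described earlier in this chapter.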
drake has its own interface on top of storr to make it easier to work with the default .drake/ cache. The loadd(), readd(), and cached() functions explore saved targets.
head(cached())
head(readd(small))
loadd(large)
head(large)
rm(large) # Does not remove `large` from the cache.
new_cache() creates caches and drake_cache() recovers existing ones. (drake_cache() is only supported in drake version 7.4.0 and above.)
cache <- drake_cache()
cache$driver$path
cache <- drake_cache(path = ".drake") # File path to drake's cache.
cache$driver$path
You can supply your own cache to make() and friends (including specialized storr caches like storr_dbi()).
plan <- drake_plan(x = 1, y = sqrt(x))
make(plan, cache = cache)
vis_drake_graph(plan, cache = cache)
Destroy caches to remove them from your file system.
cache$destroy()
file.exists(".drake")