Chapter 16 Storage

16.1 drake’s cache

When you run make(), drake stores your targets in a hidden storage cache.

library(drake)
load_mtcars_example() # from https://github.com/wlandau/drake-examples/tree/main/mtcars
make(my_plan, verbose = 0L)

The default cache is a hidden .drake folder.

find_cache()
### [1] "/home/you/project/.drake"

drake’s loadd() and readd() functions load targets into memory.

loadd(large)
head(large)

head(readd(small))

16.2 Efficient target storage

drake supports custom formats for large and specialized targets. For example, the "fst" format uses the fst package to save data frames faster. Simply enclose the command and the format together with the target() function.

library(drake)
n <- 1e8 # Each target is 1.6 GB in memory.
plan <- drake_plan(
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  data_old = data.frame(x = runif(n), y = runif(n))
)
make(plan)
#> target data_fst
#> target data_old
build_times(type = "build")
#> # A tibble: 2 x 4
#>   target   elapsed              user                 system    
#>   <chr>    <Duration>           <Duration>           <Duration>
#> 1 data_fst 13.93s               37.562s              7.954s    
#> 2 data_old 184s (~3.07 minutes) 177s (~2.95 minutes) 4.157s

For more details and a complete list of formats, see https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.

16.3 Why is my cache so big?

16.3.1 Old targets

By default, drake holds on to all your targets from all your runs of make(). Even if you run clean(), the data stays in the cache in case you need to recover it.

clean()

make(my_plan, recover = TRUE)

If you really want to remove old historical values of targets, run drake_gc() or drake_cache()$gc().

drake_gc()

clean() also has a garbage_collection argument for this purpose. Here is a slick way to remove historical targets and targets no longer in your plan.

clean(list = cached_unplanned(my_plan), garbage_collection = TRUE)

16.3.2 Garbage from interrupted builds

If make() crashes or gets interrupted, old files can accumulate in .drake/scratch/ and .drake/drake/tmp/. As long as make() is no longer running, can safely remove the files in those folders (but keep the folders themselves).

16.4 Interfaces to the cache

drake uses the storr package to create and modify caches.

library(storr)
cache <- storr_rds(".drake")

head(cache$list())

head(cache$get("small"))

drake has its own interface on top of storr to make it easier to work with the default .drake/ cache. The loadd(), readd(), and cached() functions explore saved targets.

head(cached())

head(readd(small))

loadd(large)

head(large)

rm(large) # Does not remove `large` from the cache.

new_cache() create caches and drake_cache() recovers existing ones. (drake_cache() is only supported in drake version 7.4.0 and above.)

cache <- drake_cache()
cache$driver$path

cache <- drake_cache(path = ".drake") # File path to drake's cache.
cache$driver$path

You can supply your own cache to make() and friends (including specialized storr caches like storr_dbi()).

plan <- drake_plan(x = 1, y = sqrt(x))
make(plan, cache = cache)

vis_drake_graph(plan, cache = cache)

Destroy caches to remove them from your file system.

cache$destroy()

file.exists(".drake")
Copyright Eli Lilly and Company