Chapter 6 Dynamic branching
6.1 A note about versions
The first release of dynamic branching was in drake version 7.8.0. In subsequent versions, dynamic branching behaves differently. This manual describes how dynamic branching works in development drake (to become version 7.9.0 in early January 2020). If you are using version 7.8.0, please refer to this version of the chapter instead.
6.2 Motivation
In large workflows, you may need more targets than you can easily type in a plan, and you may not be able to fully specify all targets in advance. Dynamic branching is an interface to declare new targets while make() is running. It lets you create more compact plans and graphs, it is easier to use than static branching, and it improves the startup speed of make() and friends.
6.3 Which kind of branching should I use?
With dynamic branching, make() is faster to initialize, and you have far more flexibility. With static branching, you have meaningful target names, and it is easier to predict what the plan is going to do in advance. There is a ton of room for overlap and personal judgement, and you can even use both kinds of branching together.
6.4 Dynamic targets
A dynamic target is a vector of sub-targets. We let make() figure out which sub-targets to create and how to aggregate them.
As an example, let’s fit a regression model to each continent in the Gapminder data. To activate dynamic branching, use the dynamic argument of target().
library(broom)
library(drake)
library(gapminder)
library(tidyverse)
# Split the Gapminder data by continent.
gapminder_continents <- function() {
  gapminder %>%
    mutate(gdpPercap = scale(gdpPercap)) %>%
    split(f = .$continent)
}
# Fit a model to a continent.
fit_model <- function(continent_data) {
  data <- continent_data[[1]]
  data %>%
    lm(formula = gdpPercap ~ year) %>%
    tidy() %>%
    mutate(continent = data$continent[1]) %>%
    select(continent, term, statistic, p.value)
}
plan <- drake_plan(
  continents = gapminder_continents(),
  model = target(fit_model(continents), dynamic = map(continents))
)
make(plan)
#> ▶ target continents
#> ▶ dynamic model
#> ❯ subtarget model_c56e5407
#> ❯ subtarget model_706a1529
#> ❯ subtarget model_da843806
#> ❯ subtarget model_862f8003
#> ❯ subtarget model_ebb41f51
#> ■ finalize model
The data type of every sub-target is the same as the dynamic target it belongs to. In other words, model and model_c56e5407 are both data frames, and readd(model) and friends automatically concatenate all the model_* sub-targets.
readd(model)
#> # A tibble: 10 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Africa (Intercept) -4.44 1.08e- 5
#> 2 Africa year 4.04 5.90e- 5
#> 3 Americas (Intercept) -5.56 6.10e- 8
#> 4 Americas year 5.55 6.16e- 8
#> 5 Asia (Intercept) -2.74 6.39e- 3
#> 6 Asia year 2.75 6.23e- 3
#> 7 Europe (Intercept) -14.4 3.12e-37
#> 8 Europe year 14.5 7.06e-38
#> 9 Oceania (Intercept) -11.3 1.32e-10
#> 10 Oceania year 11.5 9.48e-11
This behavior is powered by the vctrs package. A dynamic target like model above is really a “vctr” of sub-targets. Under the hood, the aggregated value of model is what you get from calling vec_c() on all the model_* sub-targets. When you dynamically map() over a non-dynamic object, you are taking slices with vec_slice(). (When you map() over a dynamic target, each element is a sub-target and vec_slice() is not necessary.)
library(vctrs)
#>
#> Attaching package: 'vctrs'
#> The following object is masked from 'package:tibble':
#>
#> data_frame
#> The following object is masked from 'package:dplyr':
#>
#> data_frame
# same as readd(model)
s <- subtargets(model)
vec_c(
  readd(s[1], character_only = TRUE),
  readd(s[2], character_only = TRUE),
  readd(s[3], character_only = TRUE),
  readd(s[4], character_only = TRUE),
  readd(s[5], character_only = TRUE)
)
#> # A tibble: 10 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Africa (Intercept) -4.44 1.08e- 5
#> 2 Africa year 4.04 5.90e- 5
#> 3 Americas (Intercept) -5.56 6.10e- 8
#> 4 Americas year 5.55 6.16e- 8
#> 5 Asia (Intercept) -2.74 6.39e- 3
#> 6 Asia year 2.75 6.23e- 3
#> 7 Europe (Intercept) -14.4 3.12e-37
#> 8 Europe year 14.5 7.06e-38
#> 9 Oceania (Intercept) -11.3 1.32e-10
#> 10 Oceania year 11.5 9.48e-11
loadd(model)
# Second slice if you were to map() over mtcars.
vec_slice(mtcars, 2)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
# Fifth slice if you were to map() over letters.
vec_slice(letters, 5)
#> [1] "e"
You can use vec_c() and vec_slice() to anticipate edge cases in dynamic branching.
# If you map() over a list, each sub-target is a single-element list.
vec_slice(list(1, 2), 1)
#> [[1]]
#> [1] 1
# If each sub-target has multiple elements,
# the aggregated target (e.g. from readd())
# will have more elements than sub-targets.
subtarget1 <- c(1, 2)
subtarget2 <- c(3, 4)
vec_c(subtarget1, subtarget2)
#> [1] 1 2 3 4
Back in our plan, target(fit_model(continents), dynamic = map(continents)) is equivalent to the commands fit_model(continents[1]) through fit_model(continents[5]). Since continents is a list of data frames, each slice continents[1] through continents[5] is itself a list containing a single data frame, which is why we need the line data <- continent_data[[1]] in fit_model().
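The difference between single and double brackets is easy to check in plain R. The sketch below is a base R analogy, not drake code; it assumes mtcars split by cylinder count as a stand-in for the Gapminder data split by continent.

```r
# Base R analogy (not drake code): why fit_model() needs [[1]].
# Assumption: mtcars split by cylinder count stands in for the
# Gapminder data split by continent.
groups <- split(mtcars, mtcars$cyl)

# Single brackets keep the list wrapper: a one-element list.
slice <- groups[1]
class(slice)   # "list"
length(slice)  # 1

# Double brackets extract the data frame stored inside the slice.
inner <- slice[[1]]
class(inner)   # "data.frame"
nrow(inner)    # 11 (the four-cylinder cars)
```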
To post-process our models, we can work with either the individual sub-targets or the whole vector of all the models. Below, year uses the former and intercept uses the latter.
plan <- drake_plan(
  continents = gapminder_continents(),
  model = target(fit_model(continents), dynamic = map(continents)),
  # Filter each model individually:
  year = target(filter(model, term == "year"), dynamic = map(model)),
  # Aggregate all the models, then filter the whole vector:
  intercept = filter(model, term != "year")
)
make(plan)
#> ℹ unloading 1 targets from environment
#> ▶ target intercept
#> ▶ dynamic year
#> ❯ subtarget year_20cb8ecb
#> ❯ subtarget year_f7502c3e
#> ❯ subtarget year_a22d53f2
#> ❯ subtarget year_1facb02b
#> ❯ subtarget year_399fff25
#> ■ finalize year
readd(year)
#> # A tibble: 5 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Africa year 4.04 5.90e- 5
#> 2 Americas year 5.55 6.16e- 8
#> 3 Asia year 2.75 6.23e- 3
#> 4 Europe year 14.5 7.06e-38
#> 5 Oceania year 11.5 9.48e-11
readd(intercept)
#> # A tibble: 5 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Africa (Intercept) -4.44 1.08e- 5
#> 2 Americas (Intercept) -5.56 6.10e- 8
#> 3 Asia (Intercept) -2.74 6.39e- 3
#> 4 Europe (Intercept) -14.4 3.12e-37
#> 5 Oceania (Intercept) -11.3 1.32e-10
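For row filters like these, both strategies give the same rows; the difference is whether the work happens once per sub-target or once on the aggregate. The base R sketch below is only an analogy with made-up data frames (the list results is hypothetical), not how drake runs the plan.

```r
# Base R analogy (not drake code): two ways to post-process
# a list of per-group results. The data are made up for illustration.
results <- list(
  data.frame(group = "a", term = c("(Intercept)", "year"), value = c(1, 2)),
  data.frame(group = "b", term = c("(Intercept)", "year"), value = c(3, 4))
)

# Like `year`: filter each piece separately, then combine the pieces.
per_piece <- do.call(rbind, lapply(results, function(x) x[x$term == "year", ]))

# Like `intercept`: combine everything first, then filter once.
combined <- do.call(rbind, results)
whole <- combined[combined$term != "year", ]

per_piece$value  # 2 4
whole$value      # 1 3
```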
If automatic concatenation of sub-targets is confusing (e.g. if some sub-targets are NULL, as in https://github.com/ropensci-books/drake/issues/142), you can read the dynamic target as a named list (only in drake version 7.10.0 and above).
readd(model, subtarget_list = TRUE) # Requires drake >= 7.10.0.
#> $model_c56e5407
#> # A tibble: 2 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Africa (Intercept) -4.44 0.0000108
#> 2 Africa year 4.04 0.0000590
#>
#> $model_706a1529
#> # A tibble: 2 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Americas (Intercept) -5.56 0.0000000610
#> 2 Americas year 5.55 0.0000000616
#>
#> $model_da843806
#> # A tibble: 2 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Asia (Intercept) -2.74 0.00639
#> 2 Asia year 2.75 0.00623
#>
#> $model_862f8003
#> # A tibble: 2 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Europe (Intercept) -14.4 3.12e-37
#> 2 Europe year 14.5 7.06e-38
#>
#> $model_ebb41f51
#> # A tibble: 2 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Oceania (Intercept) -11.3 1.32e-10
#> 2 Oceania year 11.5 9.48e-11
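To see why a named list helps when some sub-targets are NULL, note that concatenation silently drops NULL values, so positions in the combined result no longer line up with sub-targets. The sketch below uses base c() as an analogy for vec_c(); the four sub-target values are made up.

```r
# Base R analogy (not drake code): concatenation drops NULL pieces.
# The four "sub-target" values are made up for illustration.
pieces <- list(sub1 = NULL, sub2 = 1:2, sub3 = NULL, sub4 = 3L)

combined <- do.call(c, unname(pieces))
combined          # 1 2 3
length(combined)  # 3 values, but there were 4 sub-targets

# The named list keeps one slot per sub-target, NULLs included.
vapply(pieces, is.null, logical(1))  # TRUE FALSE TRUE FALSE
```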
Alternatively, you can identify an individual sub-target by its index.
subtargets(model)
#> [1] "model_c56e5407" "model_706a1529" "model_da843806" "model_862f8003"
#> [5] "model_ebb41f51"
readd(model, subtargets = 2) # equivalent to readd() on a single model_* sub-target
#> # A tibble: 2 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Americas (Intercept) -5.56 0.0000000610
#> 2 Americas year 5.55 0.0000000616
If you don’t know the index offhand, you can find it using the sub-target’s name.
subtarget <- "model_706a1529"
print(subtarget)
#> [1] "model_706a1529"
which(subtarget == subtargets(model))
#> [1] 2
If the sub-target errored out and subtargets() fails, the individual sub-target metadata will have a subtarget_index field.
diagnose(subtarget, character_only = TRUE)$subtarget_index
#> [1] 2
Either way, once you have the sub-target’s index, you can retrieve the section of data that the sub-target took as input. Below, we load the part of continents that the second sub-target of model used during make().
vctrs::vec_slice(readd(continents), 2)
#> $Americas
#> # A tibble: 300 x 6
#> country continent year lifeExp pop gdpPercap[,1]
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Argentina Americas 1952 62.5 17876956 -0.132
#> 2 Argentina Americas 1957 64.4 19610538 -0.0364
#> 3 Argentina Americas 1962 65.1 21283783 -0.00833
#> 4 Argentina Americas 1967 65.6 22934225 0.0850
#> 5 Argentina Americas 1972 67.1 24779799 0.226
#> 6 Argentina Americas 1977 68.5 26983828 0.291
#> 7 Argentina Americas 1982 69.9 29341374 0.181
#> 8 Argentina Americas 1987 70.8 31620918 0.195
#> 9 Argentina Americas 1992 71.9 33958947 0.212
#> 10 Argentina Americas 1997 73.3 36203463 0.381
#> # … with 290 more rows
If continents were dynamic, we could have just used readd(continents, subtargets = 2). But continents was a static target, so we needed to replicate drake’s dynamic branching behavior using vctrs.
6.5 Dynamic transformations
Dynamic branching supports transformations map(), cross(), and group(). These transformations tell drake how to create sub-targets.
6.5.1 map()
map() iterates over the vector slices of the targets you supply as arguments. We saw above how map() iterates over lists. If you give it a data frame, it will map over the rows.
plan <- drake_plan(
  subset = head(gapminder),
  row = target(subset, dynamic = map(subset))
)
make(plan)
#> ▶ target subset
#> ▶ dynamic row
#> ❯ subtarget row_9939cae3
#> ❯ subtarget row_e8047114
#> ❯ subtarget row_2ef3db10
#> ❯ subtarget row_f9171bbe
#> ❯ subtarget row_7d6002e9
#> ❯ subtarget row_509468b3
#> ■ finalize row
readd(row_9939cae3)
#> # A tibble: 1 x 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Afghanistan Asia 1952 28.8 8425333 779.
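Row-wise slicing of a data frame can be mimicked in base R with df[i, , drop = FALSE], and binding the slices back together plays the role of vec_c(). This is an analogy using mtcars, not drake's implementation.

```r
# Base R analogy (not drake code): one-row slices of a data frame,
# similar to the row_* sub-targets that map() creates.
small <- head(mtcars, 3)
rows <- lapply(seq_len(nrow(small)), function(i) small[i, , drop = FALSE])

length(rows)     # 3 slices, one per row
nrow(rows[[1]])  # 1: each slice is a one-row data frame

# Binding the slices back together recovers the original data frame,
# much like readd(row) concatenates the row_* sub-targets.
isTRUE(all.equal(do.call(rbind, rows), small))  # TRUE
```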
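If you supply multiple targets, map() iterates over their slices in lockstep: the first sub-target gets the first slice of each target, the second sub-target gets the second slice of each, and so on.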
plan <- drake_plan(
  numbers = seq_len(2),
  letters = c("a", "b"),
  zipped = target(paste0(numbers, letters), dynamic = map(numbers, letters))
)
make(plan)
#> ▶ target numbers
#> ▶ target letters
#> ▶ dynamic zipped
#> ❯ subtarget zipped_8ac3968c
#> ❯ subtarget zipped_4a7a9b07
#> ■ finalize zipped
readd(zipped)
#> [1] "1a" "2b"
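The lockstep pairing resembles base R's Map(), which applies a function to the first elements of each argument, then the second elements, and so on. This is an analogy, not how drake builds the sub-targets.

```r
# Base R analogy (not drake code): map() with several arguments pairs
# their slices position by position, like Map().
numbers <- seq_len(2)
letters2 <- c("a", "b")  # renamed to avoid masking base R's `letters`
zipped <- unlist(Map(paste0, numbers, letters2))
zipped  # "1a" "2b"
```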
6.5.2 cross()
cross() creates a new sub-target for each combination of targets you supply as arguments.
plan <- drake_plan(
  numbers = seq_len(2),
  letters = c("a", "b"),
  combo = target(paste0(numbers, letters), dynamic = cross(numbers, letters))
)
make(plan)
#> ▶ dynamic combo
#> ❯ subtarget combo_8ac3968c
#> ❯ subtarget combo_ed1d2e7b
#> ❯ subtarget combo_ef37ab56
#> ❯ subtarget combo_4a7a9b07
#> ■ finalize combo
readd(combo)
#> [1] "1a" "1b" "2a" "2b"
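In base R, expand.grid() enumerates the same kind of combinations. Because expand.grid() varies its first argument fastest, the sketch below lists the letters first so the result comes out in the same order as readd(combo) above. This is an analogy, not drake's internals.

```r
# Base R analogy (not drake code): every combination of two vectors.
# expand.grid() varies its first column fastest, so letters go first.
numbers <- seq_len(2)
letters2 <- c("a", "b")
grid <- expand.grid(letter = letters2, number = numbers, stringsAsFactors = FALSE)
combo <- paste0(grid$number, grid$letter)
nrow(grid)  # 4 combinations
combo       # "1a" "1b" "2a" "2b"
```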
6.5.3 group()
With group(), you can create multiple aggregates of a given target. Use the .by argument to set a grouping variable.
plan <- drake_plan(
  data = gapminder,
  by = data$continent,
  gdp = target(
    tibble(median = median(data$gdpPercap), continent = by[1]),
    dynamic = group(data, .by = by)
  )
)
make(plan)
#> ▶ target data
#> ▶ target by
#> ▶ dynamic gdp
#> ❯ subtarget gdp_9adfc39f
#> ❯ subtarget gdp_d9f30951
#> ❯ subtarget gdp_958a2f81
#> ❯ subtarget gdp_962b03c8
#> ❯ subtarget gdp_dc1cff81
#> ■ finalize gdp
readd(gdp)
#> # A tibble: 5 x 2
#> median continent
#> <dbl> <fct>
#> 1 2647. Asia
#> 2 12082. Europe
#> 3 1192. Africa
#> 4 5466. Americas
#> 5 17983. Oceania
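A rough base R analogy for group() is split() followed by one computation per group. In the output above, the continents appear in order of first appearance in the data (Afghanistan, in Asia, comes first), so the sketch below orders groups the same way; that this matches drake's ordering rule in general is an assumption. mtcars stands in for gapminder.

```r
# Base R analogy (not drake code): split a data frame by a grouping
# variable and compute one median per group. mtcars stands in for gapminder.
df <- mtcars
cyl <- df$cyl

# Order groups by first appearance (6, 4, 8 in mtcars), not sorted order.
groups <- split(df, factor(cyl, levels = unique(cyl)))
medians <- vapply(groups, function(g) median(g$mpg), numeric(1))
medians  # named: "6" = 19.7, "4" = 26.0, "8" = 15.2
```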
6.6 Trace
All dynamic transforms have a .trace argument to record optional metadata for each sub-target. In the example from group(), the trace is another way to keep track of the continent of each median GDP value.
plan <- drake_plan(
  data = gapminder,
  by = data$continent,
  gdp = target(
    median(data$gdpPercap),
    dynamic = group(data, .by = by, .trace = by)
  )
)
make(plan)
#> ▶ dynamic gdp
#> ❯ subtarget gdp_7e88fb1c
#> ❯ subtarget gdp_a61b8e1b
#> ❯ subtarget gdp_278ff532
#> ❯ subtarget gdp_6f3facea
#> ❯ subtarget gdp_73037e69
#> ■ finalize gdp
The gdp target no longer contains any explicit reference to continent.
readd(gdp)
#> [1] 2646.787 12081.749 1192.138 5465.510 17983.304
However, we can look up the continents in the trace.
read_trace("by", gdp)
#> [1] Asia Europe Africa Americas Oceania
#> Levels: Africa Americas Asia Europe Oceania
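Conceptually, a trace is a second vector aligned with the sub-targets: the values themselves carry no labels, but position i of the trace says which grouping value produced sub-target i. The base R sketch below is only an analogy (drake stores the trace for you, and read_trace() returns a factor here rather than the character vector used below).

```r
# Base R analogy (not drake code): unlabeled values plus an aligned trace.
keys <- factor(mtcars$cyl, levels = unique(mtcars$cyl))
groups <- split(mtcars$mpg, keys)

values <- unname(vapply(groups, median, numeric(1)))  # like readd(gdp)
trace <- names(groups)                                 # like read_trace("by", gdp)

values  # 19.7 26.0 15.2
trace   # "6" "4" "8"
# Position i of the trace labels value i.
data.frame(cyl = trace, median_mpg = values)
```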
6.7 max_expand
Suppose we want a model for each country.
gapminder_countries <- function() {
  gapminder %>%
    mutate(gdpPercap = scale(gdpPercap)) %>%
    split(f = .$country)
}
plan <- drake_plan(
  countries = gapminder_countries(),
  model = target(fit_model(countries), dynamic = map(countries))
)
The Gapminder dataset has 142 countries, which can get overwhelming. In the early stages of the workflow, when we are still debugging and testing, we can limit the number of sub-targets using the max_expand argument of make().
make(plan, max_expand = 2)
#> ▶ target countries
#> ▶ dynamic model
#> ❯ subtarget model_ab009698
#> ❯ subtarget model_cc031a6d
#> ■ finalize model
readd(model)
#> # A tibble: 4 x 4
#> continent term statistic p.value
#> <fct> <chr> <dbl> <dbl>
#> 1 Asia (Intercept) -1.48 0.170
#> 2 Asia year -0.233 0.821
#> 3 Europe (Intercept) -4.76 0.000773
#> 4 Europe year 4.59 0.000998
Then, when we are confident and ready, we can scale up to the full number of models.
make(plan)
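The effect of max_expand = 2 resembles running the command on only the first two slices. Judging from the Asia and Europe rows above, those appear to be the first two countries in alphabetical order (Afghanistan and Albania), but treat that ordering as an observation rather than a documented guarantee. The base R sketch below is an analogy with mtcars, one slice per car.

```r
# Base R analogy (not drake code): limit expansion to the first few slices
# while debugging. One slice per car stands in for one slice per country.
pieces <- split(mtcars, rownames(mtcars))
max_expand <- 2  # hypothetical variable mirroring make(plan, max_expand = 2)
trial <- lapply(head(pieces, max_expand), function(d) d$mpg)

length(pieces)  # 32 slices available
length(trial)   # 2 slices actually processed
```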