# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
values <- tibble(
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
16 Static branching
16.1 Branching
Branched pipelines can be computationally demanding. See the performance chapter for options, settings, and other choices to optimize and monitor large pipelines.
Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the _targets.R file more concise and easier to read and maintain.
targets supports two types of branching: dynamic branching and static branching. Some projects are better suited to dynamic branching, while others benefit more from static branching or a combination of both. Here is a short list of tradeoffs.
| Dynamic | Static |
|---|---|
| Pipeline creates new targets at runtime. | All targets defined in advance. |
| Cryptic target names. | Friendly target names. |
| Scales to hundreds of branches. | Does not scale as easily for tar_visnetwork() etc. |
| No metaprogramming required. | Familiarity with metaprogramming is helpful. |
16.2 When to use static branching
Static branching is the act of defining a group of targets in bulk before the pipeline starts. Whereas dynamic branching uses last-minute dependency data to define the branches, static branching uses metaprogramming to modify the code of the pipeline up front. Whereas dynamic branching excels at creating a large number of very similar targets, static branching is most useful for a smaller number of heterogeneous targets. Some users find it more convenient because they can use tar_manifest() and tar_visnetwork() to check the correctness of static branching before launching the pipeline.
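To make the metaprogramming idea concrete, here is a minimal base R sketch of what defining targets up front amounts to: a command template is filled in once per row of a table of values, and the generated code can be inspected before anything runs. The values table, the template, and the naming scheme are illustrative assumptions, and this is not a description of how targets or tarchetypes work internally.

```r
# Hypothetical table of settings: one row per branch.
values <- data.frame(
  method_function = c("method1", "method2"),
  data_source = c("NIH", "NIAID"),
  stringsAsFactors = FALSE
)

# Command template whose placeholders match the column names of `values`.
template <- quote(method_function(data_source, reps = 10))

# Build one command per row by substituting the placeholders.
commands <- lapply(seq_len(nrow(values)), function(i) {
  replacements <- list(
    method_function = as.symbol(values$method_function[i]), # function name as a symbol
    data_source = values$data_source[i]                     # data source as a string
  )
  do.call(substitute, list(template, replacements))
})

# Assumed naming scheme: prefix plus the values of the row.
names(commands) <- paste(
  "analysis",
  values$method_function,
  values$data_source,
  sep = "_"
)

# Inspect the generated code, one string per branch.
command_text <- vapply(commands, deparse, character(1))
print(command_text)
```

Because every command exists as ordinary R code before the pipeline starts, it can be checked by eye, which is the same property that tar_manifest() exposes for real pipelines.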
16.3 Map
tar_map() from the tarchetypes package creates copies of existing target objects, where each new command is a variation on the original. In this example, we have a data analysis workflow that iterates over datasets and analysis methods. The values data frame has the operational parameters of each data analysis, and tar_map() creates one new target per row.
tar_manifest()
#> # A tibble: 4 × 3
#> name command description
#> <chr> <chr> <chr>
#> 1 analysis_method2_NIAID "method2(\"NIAID\", reps = 10)" method2 NI…
#> 2 analysis_method1_NIH "method1(\"NIH\", reps = 10)" method1 NIH
#> 3 summary_method2_NIAID "summarize_analysis(analysis_method2_NIAID… method2 NI…
#> 4 summary_method1_NIH "summarize_analysis(analysis_method1_NIH, … method1 NIH
tar_visnetwork(targets_only = TRUE)
For shorter target names, use the names argument of tar_map(). And for more combinations of settings, use tidyr::expand_grid() on values.
# _targets.R file:
library(targets)
library(tarchetypes)
library(tidyr)
values <- expand_grid( # Use all possible combinations of input settings.
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  names = "data_source", # Select columns from `values` for target names.
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
It is especially important to run tar_manifest() to check that tar_map() generates the right R code for the targets. Sometimes, the metaprogramming may not produce the desired commands on your first try.
tar_manifest()
#> # A tibble: 8 × 3
#> name command description
#> <chr> <chr> <chr>
#> 1 analysis_NIAID_1 "method2(\"NIAID\", reps = 10)" method2 NI…
#> 2 analysis_NIAID "method1(\"NIAID\", reps = 10)" method1 NI…
#> 3 analysis_NIH_1 "method2(\"NIH\", reps = 10)" method2 NIH
#> 4 analysis_NIH "method1(\"NIH\", reps = 10)" method1 NIH
#> 5 summary_NIAID_1 "summarize_analysis(analysis_NIAID_1, \"NIAID\")" method2 NI…
#> 6 summary_NIAID "summarize_analysis(analysis_NIAID, \"NIAID\")" method1 NI…
#> 7 summary_NIH_1 "summarize_analysis(analysis_NIH_1, \"NIH\")" method2 NIH
#> 8 summary_NIH "summarize_analysis(analysis_NIH, \"NIH\")" method1 NIH
And of course, check the dependency graph to ensure the pipeline is properly connected. If tar_map() generates a lot of targets, the graph may render slowly or look too cumbersome. If that happens, choose a small subset of rows of values for tar_map() and then try again on the smaller pipeline.
# You may need to zoom out on this interactive graph to see all 8 targets.
tar_visnetwork(targets_only = TRUE)
16.3.1 Limitations
tar_map() generates R expressions to serve as commands in other targets. When it substitutes an element from values, it needs a way to transform the element into valid R code. For elements that are even slightly complicated, especially nested data frames and objects with attributes, this is not always possible. For these elements, it is best to use quote() to work with the underlying expressions instead of the objects themselves. See https://github.com/ropensci/tarchetypes/discussions/105 for an example.
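The difference between substituting an object and substituting a quoted expression can be seen in a small base R sketch. It only illustrates the general idea and makes no claim about the internals of tar_map(); summarize_analysis() is never called here, and the data frame is a made-up example.

```r
# Command template with a placeholder named `data`.
template <- quote(summarize_analysis(data))

# Substituting the object itself embeds the entire data frame,
# attributes included, inside the generated command.
with_object <- do.call(
  substitute,
  list(template, list(data = data.frame(x = 1:2)))
)

# Substituting a quoted expression embeds only the code that
# creates the object, which stays short and readable.
with_expression <- do.call(
  substitute,
  list(template, list(data = quote(data.frame(x = 1:2))))
)

# The second argument of the first command is a full object, not code.
print(class(with_object[[2]]))

# The second command is plain code that can be read in a manifest.
print(with_expression)
```

In a values data frame, such quoted expressions would typically live in a list column, with a separate character column chosen through the names argument to keep the target names short.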
16.4 Dynamic-within-static branching
You can even combine static and dynamic branching. The static tar_map() is an excellent outer layer on top of targets with patterns. The following is a sketch of a pipeline that runs each of two data analysis methods 10 times, once per random seed. Static branching iterates over the method functions, while dynamic branching iterates over the seeds. tar_map() creates new patterns as well as new commands. So below, the summary methods map over the analysis methods both statically and dynamically.
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed_target <- tar_target(random_seed, seq_len(10))
targets <- tar_map(
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
list(random_seed_target, targets)
tar_manifest()
#> # A tibble: 5 × 4
#> name command pattern description
#> <chr> <chr> <chr> <chr>
#> 1 random_seed "seq_len(10)" <NA> <NA>
#> 2 analysis_method1 "method1(\"NIH\", seed = random_seed)" map(rando… method1
#> 3 analysis_method2 "method2(\"NIH\", seed = random_seed)" map(rando… method2
#> 4 summary_method1 "summarize_analysis(analysis_method1)" map(analy… method1
#> 5 summary_method2 "summarize_analysis(analysis_method2)" map(analy… method2
tar_visnetwork(targets_only = TRUE)
16.5 Combine
tar_combine() from the tarchetypes package creates a new target to aggregate the results of upstream targets. In the simple example below, the combined target aggregates the rows returned from two other targets.
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
options(crayon.enabled = FALSE)
# Target names match the manifest output: head_mtcars and tail_mtcars.
target1 <- tar_target(head_mtcars, head(mtcars, 1))
target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
target3 <- tar_combine(combined_target, target1, target2)
list(target1, target2, target3)
tar_manifest()
#> # A tibble: 3 × 2
#> name command
#> <chr> <chr>
#> 1 head_mtcars head(mtcars, 1)
#> 2 tail_mtcars tail(mtcars, 1)
#> 3 combined_target vctrs::vec_c(head_mtcars = head_mtcars, tail_mtcars = tail_mt…
tar_visnetwork(targets_only = TRUE)
tar_make()
#> ▶ dispatched target head_mtcars
#> ● completed target head_mtcars [0.001 seconds, 215 bytes]
#> ▶ dispatched target tail_mtcars
#> ● completed target tail_mtcars [0.001 seconds, 221 bytes]
#> ▶ dispatched target combined_target
#> ● completed target combined_target [0 seconds, 276 bytes]
#> ▶ ended pipeline [0.076 seconds]
tar_read(combined_target)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
#> Volvo 142E 21.4 4 121 109 4.11 2.78 18.60 1 1 4 2
To use tar_combine() and tar_map() together in more complicated situations, you may need to supply unlist = FALSE to tar_map(). That way, tar_map() will return a nested list of target objects, and you can combine the ones you want. The pipeline below extends our previous tar_map() example by combining just the summaries, omitting the analyses from tar_combine(). Also note the use of bind_rows(!!!.x) below. This is how you supply custom code to combine the return values of other targets. .x is a placeholder for the return values, and !!! is the “unquote-splice” operator from the rlang package.
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed <- tar_target(random_seed, seq_len(10))
mapped <- tar_map(
  unlist = FALSE, # Return a nested list from tar_map()
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
combined <- tar_combine(
  combined_summaries,
  mapped[["summary"]],
  command = dplyr::bind_rows(!!!.x, .id = "method")
)
list(random_seed, mapped, combined)
tar_manifest()
#> Warning message:
#> Targets and globals must have unique names. Ignoring global objects that conflict with target names: random_seed. Warnings like this one are important, but if you must suppress them, you can do so with Sys.setenv(TAR_WARN = "false").
#> # A tibble: 6 × 4
#> name command pattern description
#> <chr> <chr> <chr> <chr>
#> 1 random_seed "seq_len(10)" <NA> <NA>
#> 2 analysis_method1 "method1(\"NIH\", seed = random_seed)" map(ra… method1
#> 3 analysis_method2 "method2(\"NIH\", seed = random_seed)" map(ra… method2
#> 4 summary_method1 "summarize_analysis(analysis_method1)" map(an… method1
#> 5 summary_method2 "summarize_analysis(analysis_method2)" map(an… method2
#> 6 combined_summaries "dplyr::bind_rows(summary_method1 = su… <NA> <NA>
tar_visnetwork(targets_only = TRUE)
#> Warning message:
#> Targets and globals must have unique names. Ignoring global objects that conflict with target names: random_seed. Warnings like this one are important, but if you must suppress them, you can do so with Sys.setenv(TAR_WARN = "false").
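The effect of splicing with !!! can be reproduced outside a pipeline with rlang alone. The sketch below builds, by hand, an expression of the same shape as the combined_summaries command in the manifest. It is an illustration of the operator under the assumption that the upstream targets are passed as a named list of symbols, not a copy of the tar_combine() internals, and nothing here is evaluated, so dplyr does not need to be loaded.

```r
library(rlang)

# Symbols for the upstream targets, named so that bind_rows()
# can label the rows of each input through `.id`.
upstream <- syms(c("summary_method1", "summary_method2"))
names(upstream) <- c("summary_method1", "summary_method2")

# `!!!` splices each element of the list in as a separate argument.
command <- expr(dplyr::bind_rows(!!!upstream, .id = "method"))

# Show the generated code without running it.
print(command)
```

Each element of the list becomes one named argument of bind_rows(), which is why the .id column of the combined data frame can carry the names of the upstream targets.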
16.6 Metaprogramming
Custom metaprogramming is a more flexible alternative to tar_map() and tar_combine(). tar_eval() from tarchetypes accepts an arbitrary expression and iteratively plugs in symbols. Below, we use it to branch over datasets.
# _targets.R
library(rlang)
library(targets)
library(tarchetypes)
string <- c("gapminder", "who", "imf")
symbol <- syms(string)
tar_eval(
  tar_target(symbol, get_data(string)),
  values = list(string = string, symbol = symbol)
)
tar_eval() has fewer guardrails than tar_map() or tar_combine(), so tar_manifest() is especially important for checking the correctness of your metaprogramming.
tar_manifest(fields = command)
#> # A tibble: 3 × 2
#> name command
#> <chr> <chr>
#> 1 imf "get_data(\"imf\")"
#> 2 gapminder "get_data(\"gapminder\")"
#> 3 who "get_data(\"who\")"
16.7 Hooks
Hooks are supported in tarchetypes version 0.2.0 and above, and they allow you to prepend or wrap code in multiple targets at a time. For example, tar_hook_before() is a robust way to invoke the conflicted package to resolve namespace conflicts that works with distributed computing and does not require a project-level .Rprofile file.
# _targets.R file
library(targets) # Provides tar_target() and tar_option_set().
library(tarchetypes)
library(magrittr)
tar_option_set(packages = c("conflicted", "dplyr"))
source("R/functions.R")
list(
  tar_target(data, get_time_series_data()),
  tar_target(analysis1, analyze_months(data)),
  tar_target(analysis2, analyze_weeks(data))
) %>%
  tar_hook_before(
    hook = conflicted_prefer("filter", "dplyr"),
    names = starts_with("analysis")
  )
# R console
targets::tar_manifest(fields = command)
#> # A tibble: 3 × 2
#> name command
#> <chr> <chr>
#> 1 data "get_time_series_data()"
#> 2 analysis1 "{\n conflicted_prefer(\"filter\", \"dplyr\")\n analyze(dat…
#> 3 analysis2 "{\n conflicted_prefer(\"filter\", \"dplyr\")\n analyze(dat…
Similarly, tar_hook_outer() wraps expressions around target commands, and tar_hook_inner() wraps expressions around target dependencies. These hooks could potentially help encrypt targets before storage in _targets/ and decrypt targets before retrieval, as demonstrated in the sketch below.
Data security is the sole responsibility of the user and not the responsibility of targets, tarchetypes, or related pipeline packages. You as the user are responsible for validating your own target specifications and custom code and applying additional security precautions as appropriate for the situation.
# _targets.R file
library(targets) # Provides tar_target().
library(tarchetypes)
library(magrittr)
list(
  tar_target(data1, get_data1()),
  tar_target(data2, get_data2()),
  tar_target(analysis, analyze(data1, data2))
) %>%
  tar_hook_outer(encrypt(.x, threads = 2)) %>%
  tar_hook_inner(decrypt(.x))
# R console
targets::tar_manifest(fields = command)
#> # A tibble: 3 × 2
#> name command
#> <chr> <chr>
#> 1 data1 encrypt(get_data1(), threads = 2)
#> 2 data2 encrypt(get_data2(), threads = 2)
#> 3 analysis encrypt(analyze(decrypt(data1), decrypt(data2)), threads = 2)
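The wrapping shown in this manifest can be imitated in base R to see what each kind of hook does to a command. The helpers wrap_outer() and wrap_inner() below are hypothetical names written for this sketch and are not part of tarchetypes. encrypt(), decrypt(), and analyze() are never called, only arranged as code.

```r
# Hypothetical helper: wrap a whole command in a hook expression,
# putting the command wherever `.x` appears in the hook.
wrap_outer <- function(command, hook) {
  do.call(substitute, list(hook, list(.x = command)))
}

# Hypothetical helper: wrap every reference to an upstream target
# inside a command, leaving function names and constants untouched.
wrap_inner <- function(command, hook, target_names) {
  if (is.symbol(command)) {
    if (as.character(command) %in% target_names) {
      return(do.call(substitute, list(hook, list(.x = command))))
    }
    return(command)
  }
  if (is.call(command)) {
    # Recurse into the arguments only, not the function being called.
    arguments <- lapply(
      as.list(command)[-1],
      wrap_inner,
      hook = hook,
      target_names = target_names
    )
    return(as.call(c(list(command[[1]]), arguments)))
  }
  command
}

# Original command of the `analysis` target and its upstream targets.
analysis_command <- quote(analyze(data1, data2))
upstream_targets <- c("data1", "data2")

# Inner hook first (around dependencies), then outer hook (around the result).
inner <- wrap_inner(analysis_command, quote(decrypt(.x)), upstream_targets)
wrapped <- wrap_outer(inner, quote(encrypt(.x, threads = 2)))
print(wrapped)
```

The outer hook touches each command exactly once, while the inner hook touches every upstream target that the command mentions, which is why data1 and data2 are each decrypted separately in the analysis command.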