Chapter 9 Static branching

9.1 Branching

Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the _targets.R file more concise and easier to read and maintain.

targets supports two types of branching: dynamic branching and static branching. Some projects are better suited to dynamic branching, while others benefit more from static branching or a combination of both. Some users understand dynamic branching more easily because it avoids metaprogramming, while others prefer static branching because tar_manifest() and tar_visnetwork() provide immediate feedback. Except for the section on dynamic-within-static branching, you can read the two chapters on branching in any order (or skip them) depending on your needs.

9.2 When to use static branching

Static branching is the act of defining a group of targets in bulk before the pipeline starts. Whereas dynamic branching uses last-minute dependency data to define the branches, static branching uses metaprogramming to modify the code of the pipeline up front. Dynamic branching excels at creating a large number of very similar targets, while static branching is most useful for a smaller number of heterogeneous targets. Some users find static branching more convenient because they can use tar_manifest() and tar_visnetwork() to check its correctness before launching the pipeline.
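As a rough illustration of this contrast, the sketch below defines the same small computation both ways. The target names index and squared are made up for this example, and unlist = TRUE is supplied explicitly so that tar_map() returns a flat list of target objects.

```r
library(targets)
library(tarchetypes)

# `index` and `squared` are hypothetical names used only for illustration.

# Dynamic branching: a single target with a pattern. The branches over
# `index` are only defined while the pipeline is running.
dynamic <- list(
  tar_target(index, seq_len(3)),
  tar_target(squared, index^2, pattern = map(index))
)

# Static branching: tar_map() writes one separately named target per value
# of `index` into the pipeline before it starts.
static <- tar_map(
  values = list(index = seq_len(3)),
  unlist = TRUE, # Return a flat list of target objects.
  tar_target(squared, index^2)
)
```

Either list could be returned at the end of _targets.R. In the static version, every generated target appears by name in tar_manifest() before anything runs.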

9.3 Map

tar_map() from the tarchetypes package creates copies of existing target objects, where each new command is a variation on the original. In the example below, we have a data analysis workflow that iterates over datasets and analysis methods. The values data frame has the operational parameters of each data analysis, and tar_map() creates one new target per row.

# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
values <- tibble(
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
# R console:
tar_manifest()
#> # A tibble: 4 × 3
#>   name                 command                                           pattern
#>   <chr>                <chr>                                             <chr>  
#> 1 analysis_method2_NI… "method2(\"NIAID\", reps = 10)"                   <NA>   
#> 2 analysis_method1_NIH "method1(\"NIH\", reps = 10)"                     <NA>   
#> 3 summary_method2_NIA… "summarize_analysis(analysis_method2_NIAID, \"NI… <NA>   
#> 4 summary_method1_NIH  "summarize_analysis(analysis_method1_NIH, \"NIH\… <NA>
tar_visnetwork(targets_only = TRUE)
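For comparison, the manifest above corresponds to the hand-written pipeline sketched below. It is reconstructed from the tar_manifest() output rather than taken from tar_map() itself, and it assumes method1(), method2(), and summarize_analysis() are user-defined functions, as in the example above.

```r
# _targets.R file:
library(targets)

# Hand-typed equivalent of the tar_map() call: one target per manifest row.
# method1(), method2(), and summarize_analysis() are assumed to be defined
# by the user; they are not evaluated when the targets are declared.
pipeline <- list(
  tar_target(analysis_method1_NIH, method1("NIH", reps = 10)),
  tar_target(analysis_method2_NIAID, method2("NIAID", reps = 10)),
  tar_target(
    summary_method1_NIH,
    summarize_analysis(analysis_method1_NIH, "NIH")
  ),
  tar_target(
    summary_method2_NIAID,
    summarize_analysis(analysis_method2_NIAID, "NIAID")
  )
)
pipeline
```

Typing four targets by hand is manageable, but the same pattern quickly becomes unwieldy as rows are added to values.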

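For shorter target names, use the names argument of tar_map(). For more combinations of settings, use tidyr::expand_grid() on values. In the example below, data_source alone does not uniquely identify a row of values, so some target names receive a numeric suffix such as _1.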

# _targets.R file:
library(targets)
library(tarchetypes)
library(tidyr)
values <- expand_grid( # Use all possible combinations of input settings.
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  names = "data_source", # Select columns from `values` for target names.
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
# R console:
tar_manifest()
#> # A tibble: 8 × 3
#>   name             command                                           pattern
#>   <chr>            <chr>                                             <chr>  
#> 1 analysis_NIAID_1 "method2(\"NIAID\", reps = 10)"                   <NA>   
#> 2 analysis_NIAID   "method1(\"NIAID\", reps = 10)"                   <NA>   
#> 3 analysis_NIH_1   "method2(\"NIH\", reps = 10)"                     <NA>   
#> 4 analysis_NIH     "method1(\"NIH\", reps = 10)"                     <NA>   
#> 5 summary_NIAID_1  "summarize_analysis(analysis_NIAID_1, \"NIAID\")" <NA>   
#> 6 summary_NIAID    "summarize_analysis(analysis_NIAID, \"NIAID\")"   <NA>   
#> 7 summary_NIH_1    "summarize_analysis(analysis_NIH_1, \"NIH\")"     <NA>   
#> 8 summary_NIH      "summarize_analysis(analysis_NIH, \"NIH\")"       <NA>
# You may need to zoom out on this interactive graph to see all 8 targets.
tar_visnetwork(targets_only = TRUE)

9.4 Dynamic-within-static branching

You can also combine static and dynamic branching. The static tar_map() is an excellent outer layer on top of targets with patterns. The following is a sketch of a pipeline that runs each of two data analysis methods 10 times, once per random seed. Static branching iterates over the method functions, while dynamic branching iterates over the seeds. tar_map() creates new patterns as well as new commands, so below, the summary targets map over the analysis targets both statically and dynamically.

# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed_target <- tar_target(random_seed, seq_len(10))
targets <- tar_map(
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
list(random_seed_target, targets)
# R console:
tar_manifest()
#> # A tibble: 5 × 3
#>   name             command                                pattern              
#>   <chr>            <chr>                                  <chr>                
#> 1 random_seed      "seq_len(10)"                          <NA>                 
#> 2 analysis_method1 "method1(\"NIH\", seed = random_seed)" map(random_seed)     
#> 3 analysis_method2 "method2(\"NIH\", seed = random_seed)" map(random_seed)     
#> 4 summary_method1  "summarize_analysis(analysis_method1)" map(analysis_method1)
#> 5 summary_method2  "summarize_analysis(analysis_method2)" map(analysis_method2)
tar_visnetwork(targets_only = TRUE)
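The sketch above leaves method1(), method2(), and summarize_analysis() undefined. To try the pipeline, one possibility is to place hypothetical stand-ins like the ones below in a file such as R/functions.R and load it with source() at the top of _targets.R. These definitions are assumptions made purely for illustration; real analysis functions would do actual work.

```r
# R/functions.R (hypothetical helper functions; not part of targets)

# Simulate an analysis: draw 100 normal values with a reproducible seed
# and record their mean.
method1 <- function(data_source, seed) {
  set.seed(seed)
  data.frame(source = data_source, seed = seed, estimate = mean(rnorm(100)))
}

# Same idea, but record the median instead of the mean.
method2 <- function(data_source, seed) {
  set.seed(seed)
  data.frame(source = data_source, seed = seed, estimate = median(rnorm(100)))
}

# Reduce an analysis result to a summary with one row per input row.
summarize_analysis <- function(analysis) {
  data.frame(seed = analysis$seed, abs_estimate = abs(analysis$estimate))
}
```

Because each function returns a data frame, the results of the dynamic branches can be stacked row-wise when they are aggregated.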

9.5 Combine

tar_combine() from the tarchetypes package creates a new target to aggregate the results of upstream targets. In the simple example below, the combined target aggregates the rows returned from two other targets.

# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
options(crayon.enabled = FALSE)
target1 <- tar_target(head_mtcars, head(mtcars, 1))
target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
target3 <- tar_combine(combined_target, target1, target2)
list(target1, target2, target3)
# R console:
tar_manifest()
#> # A tibble: 3 × 3
#>   name           command                                                 pattern
#>   <chr>          <chr>                                                   <chr>  
#> 1 head_mtcars    head(mtcars, 1)                                         <NA>   
#> 2 tail_mtcars    tail(mtcars, 1)                                         <NA>   
#> 3 combined_targ… vctrs::vec_c(head_mtcars = head_mtcars, tail_mtcars = … <NA>
tar_visnetwork(targets_only = TRUE)
tar_make()
#> • start target head_mtcars
#> • built target head_mtcars
#> • start target tail_mtcars
#> • built target tail_mtcars
#> • start target combined_target
#> • built target combined_target
#> • end pipeline
tar_read(combined_target)
#>             mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Mazda RX4  21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
#> Volvo 142E 21.4   4  121 109 4.11 2.78 18.60  1  1    4    2

To use tar_combine() and tar_map() together in more complicated situations, you may need to supply unlist = FALSE to tar_map(). That way, tar_map() returns a nested list of target objects, and you can combine the ones you want. The pipeline below extends our previous tar_map() example by combining just the summaries, omitting the analyses from tar_combine(). Also note the use of bind_rows(!!!.x) below. This is how you supply custom code to combine the return values of other targets. .x is a placeholder for the return values, and !!! is the “unquote-splice” operator from the rlang package.

# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
# Give the target object a name that differs from the target name, so no
# global object conflicts with the random_seed target.
random_seed_target <- tar_target(random_seed, seq_len(10))
mapped <- tar_map(
  unlist = FALSE, # Return a nested list from tar_map()
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
combined <- tar_combine(
  combined_summaries,
  mapped[["summary"]], # Combine only the summary targets.
  command = dplyr::bind_rows(!!!.x, .id = "method")
)
list(random_seed_target, mapped, combined)
# R console:
tar_manifest()
#> # A tibble: 6 × 3
#>   name          command                                          pattern        
#>   <chr>         <chr>                                            <chr>          
#> 1 random_seed   "seq_len(10)"                                    <NA>           
#> 2 analysis_met… "method1(\"NIH\", seed = random_seed)"           map(random_see…
#> 3 analysis_met… "method2(\"NIH\", seed = random_seed)"           map(random_see…
#> 4 summary_meth… "summarize_analysis(analysis_method1)"           map(analysis_m…
#> 5 summary_meth… "summarize_analysis(analysis_method2)"           map(analysis_m…
#> 6 combined_sum… "dplyr::bind_rows(summary_method1 = summary_met… <NA>
tar_visnetwork(targets_only = TRUE)
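To see what unquote-splicing does to a command, the sketch below mimics it outside of a pipeline using rlang directly. It is an analogy for how the command of combined_summaries in the manifest comes about, not a description of the internals of tar_combine(). The list assigned to .x is a hand-built stand-in for the upstream targets.

```r
library(rlang)

# Stand-in for what `.x` represents: a named list of upstream target symbols.
.x <- syms(c("summary_method1", "summary_method2"))
names(.x) <- c("summary_method1", "summary_method2")

# `!!!` splices each element of the list into the call as its own named
# argument. Nothing is evaluated, so dplyr does not need to be loaded.
command <- expr(dplyr::bind_rows(!!!.x, .id = "method"))
print(command)
```

The resulting call has one named argument per upstream summary target, followed by .id = "method", which matches the beginning of the command shown in the manifest.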

9.6 Metaprogramming

Custom metaprogramming is a more flexible alternative to tar_map() and tar_combine(). tar_eval() from tarchetypes accepts an arbitrary expression and iteratively plugs in symbols. Below, we use it to branch over datasets.

# _targets.R
library(rlang)
library(targets)
library(tarchetypes)
string <- c("gapminder", "who", "imf")
symbol <- syms(string)
tar_eval(
  tar_target(symbol, get_data(string)),
  values = list(string = string, symbol = symbol)
)

tar_eval() has fewer guardrails than tar_map() or tar_combine(), so tar_manifest() is especially important for checking the correctness of your metaprogramming.

tar_manifest(fields = command)
#> # A tibble: 3 × 2
#>   name      command                  
#>   <chr>     <chr>                    
#> 1 imf       "get_data(\"imf\")"      
#> 2 gapminder "get_data(\"gapminder\")"
#> 3 who       "get_data(\"who\")"
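Another way to check this kind of metaprogramming is tar_sub() from tarchetypes, which performs the same substitution but returns the expressions without evaluating them. The sketch below reuses the string and symbol values from the example above; get_data() is still assumed to be a user-defined function and is never called here.

```r
library(rlang)
library(tarchetypes)

string <- c("gapminder", "who", "imf")
symbol <- syms(string)

# tar_sub() substitutes the values into the expression once per element
# and returns a list of unevaluated expressions, one per dataset.
expressions <- tar_sub(
  tar_target(symbol, get_data(string)),
  values = list(string = string, symbol = symbol)
)
print(expressions)
```

Printing the list shows each tar_target() call with the symbol and string already plugged in, which makes mistakes in values easy to spot before any target object is created.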

9.7 Hooks

Hooks are supported in tarchetypes version 0.2.0 and above, and they allow you to prepend or wrap code in multiple targets at a time. For example, tar_hook_before() is a robust way to invoke the conflicted package to resolve namespace conflicts: it works with distributed computing and does not require a project-level .Rprofile file.

# _targets.R file
library(targets)
library(tarchetypes)
library(magrittr) # Provides the %>% pipe.
tar_option_set(packages = c("conflicted", "dplyr"))
source("R/functions.R")
list(
  tar_target(data, get_time_series_data()),
  tar_target(analysis1, analyze_months(data)),
  tar_target(analysis2, analyze_weeks(data))
) %>%
  tar_hook_before(
    hook = conflict_prefer("filter", "dplyr"), # From the conflicted package.
    names = starts_with("analysis")
  )
# R console
targets::tar_manifest(fields = command)
#> # A tibble: 3 × 2
#>   name      command                                                             
#>   <chr>     <chr>                                                               
#> 1 data      "get_time_series_data()"                                            
#> 2 analysis1 "{ \\n     conflict_prefer(\"filter\", \"dplyr\") \\n     analyze…
#> 3 analysis2 "{ \\n     conflict_prefer(\"filter\", \"dplyr\") \\n     analyze…
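Because the manifest truncates the commands, it may help to see the hand-written equivalent of what the hook produces. The sketch below is reconstructed from the manifest above and is an approximation, not output from tarchetypes. It calls conflict_prefer() from the conflicted package with an explicit namespace, and it assumes get_time_series_data(), analyze_months(), and analyze_weeks() are user-defined functions.

```r
# _targets.R file
library(targets)

# Hand-typed approximation of the pipeline after tar_hook_before().
# The analysis functions are assumed to be defined by the user; commands
# are not evaluated when the targets are declared.
pipeline <- list(
  tar_target(data, get_time_series_data()),
  tar_target(analysis1, {
    conflicted::conflict_prefer("filter", "dplyr")
    analyze_months(data)
  }),
  tar_target(analysis2, {
    conflicted::conflict_prefer("filter", "dplyr")
    analyze_weeks(data)
  })
)
pipeline
```

The data target is left untouched because its name does not match starts_with("analysis").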

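Similarly, tar_hook_outer() wraps expressions around target commands, and tar_hook_inner() wraps expressions around target dependencies. These hooks could potentially help encrypt targets before storage in _targets/ and decrypt them on retrieval, as demonstrated in the sketch below. In the sketch, .x is a placeholder for the code being wrapped, and encrypt() and decrypt() are not defined here: they stand in for user-supplied functions.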

Data security is the sole responsibility of the user and not the responsibility of targets, tarchetypes, or related pipeline packages. You as the user are responsible for validating your own target specifications and custom code and applying additional security precautions as appropriate for the situation.

# _targets.R file
library(targets)
library(tarchetypes)
library(magrittr)
list(
  tar_target(data1, get_data1()),
  tar_target(data2, get_data2()),
  tar_target(analysis, analyze(data1, data2))
) %>%
  tar_hook_outer(encrypt(.x, threads = 2)) %>%
  tar_hook_inner(decrypt(.x))
# R console
targets::tar_manifest(fields = command)
#> # A tibble: 3 × 2
#>   name     command                                                      
#>   <chr>    <chr>                                                        
#> 1 data1    encrypt(get_data1(), threads = 2)                            
#> 2 data2    encrypt(get_data2(), threads = 2)                            
#> 3 analysis encrypt(analyze(decrypt(data1), decrypt(data2)), threads = 2)

Copyright Eli Lilly and Company