Chapter 17 Visualization with drake
Data analysis projects have complicated networks of dependencies, and drake can help you visualize them with vis_drake_graph(), sankey_drake_graph(), and drake_ggraph() (note the two g’s).
17.1 Plotting plans
In drake versions above 7.7.0, you can simply plot() the plan to show the targets and their dependency relationships.
library(drake)
# from https://github.com/wlandau/drake-examples/tree/main/mtcars
load_mtcars_example()
my_plan
#> # A tibble: 15 x 2
#> target command
#> <chr> <expr_lst>
#> 1 report knitr::knit(drake::knitr_in("report.Rmd"), drake::file_ou…
#> 2 small simulate(48) …
#> 3 large simulate(64) …
#> 4 regression1_small reg1(small) …
#> 5 regression1_large reg1(large) …
#> 6 regression2_small reg2(small) …
#> 7 regression2_large reg2(large) …
#> 8 summ_regression1_… suppressWarnings(summary(regression1_small$residuals)) …
#> 9 summ_regression1_… suppressWarnings(summary(regression1_large$residuals)) …
#> 10 summ_regression2_… suppressWarnings(summary(regression2_small$residuals)) …
#> 11 summ_regression2_… suppressWarnings(summary(regression2_large$residuals)) …
#> 12 coef_regression1_… suppressWarnings(summary(regression1_small))$coefficients…
#> 13 coef_regression1_… suppressWarnings(summary(regression1_large))$coefficients…
#> 14 coef_regression2_… suppressWarnings(summary(regression2_small))$coefficients…
#> 15 coef_regression2_… suppressWarnings(summary(regression2_large))$coefficients…
plot(my_plan)
17.1.1 vis_drake_graph()
Powered by visNetwork. Colors represent target status, and shapes represent data type. These graphs are interactive, so you can click, drag, zoom, and pan to adjust the size and position. Double-click on nodes to contract neighborhoods into clusters or expand them back out again. If you hover over a node, you will see text in a tooltip showing the first few lines of
- The command of a target, or
- The body of an imported function, or
- The content of an imported text file.
vis_drake_graph(my_plan)
To save this interactive widget for later, just supply the name of an HTML file.
vis_drake_graph(my_plan, file = "graph.html")
To save a static image file, supply a file name that ends in ".png", ".pdf", ".jpeg", or ".jpg".
vis_drake_graph(my_plan, file = "graph.png")
17.1.2 sankey_drake_graph()
These interactive networkD3 Sankey diagrams have more nuance: the height of each node is proportional to its number of connections. Nodes with many incoming connections tend to fall out of date more often, and nodes with many outgoing connections can invalidate bigger chunks of the downstream pipeline.
sankey_drake_graph(my_plan)
Saving the graphs is the same as before.
sankey_drake_graph(my_plan, file = "graph.html") # Interactive HTML widget
sankey_drake_graph(my_plan, file = "graph.png") # Static image file
Unfortunately, a legend is not yet available for Sankey diagrams, but drake exposes a separate legend for the colors and shapes.
library(visNetwork)
legend_nodes()
#> # A tibble: 12 x 6
#> label color shape font.color font.size id
#> <chr> <chr> <chr> <chr> <dbl> <int>
#> 1 Up to date #228B22 dot black 20 1
#> 2 Outdated #000000 dot black 20 2
#> 3 Running #FF7221 dot black 20 3
#> 4 Cancelled #ECB753 dot black 20 4
#> 5 Failed #AA0000 dot black 20 5
#> 6 Imported #1874CD dot black 20 6
#> 7 Missing #9A32CD dot black 20 7
#> 8 Object #888888 dot black 20 8
#> 9 Dynamic #888888 star black 20 9
#> 10 Function #888888 triangle black 20 10
#> 11 File #888888 square black 20 11
#> 12 Cluster #888888 diamond black 20 12
visNetwork(nodes = legend_nodes())
17.1.3 drake_ggraph()
drake_ggraph() can handle larger workflows than the other graphing functions. If your project has thousands of targets and vis_drake_graph() or sankey_drake_graph() does not render properly, consider drake_ggraph(). Powered by ggraph, the graphs from drake_ggraph() are static ggplot2 objects, and you can save them with ggsave().
drake_ggraph(my_plan)
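The saving step might look like the sketch below, assuming the ggraph and ggplot2 packages are installed. The file name graph_ggraph.png and the dimensions are arbitrary choices for illustration and do not come from the drake manual.

```r
library(drake)
library(ggplot2)

# Load the mtcars example plan (my_plan) and its helper functions.
load_mtcars_example()

# drake_ggraph() returns a ggplot2 object, so it can be stored in a variable.
g <- drake_ggraph(my_plan)

# The file name and size (in inches) are illustrative assumptions.
ggsave("graph_ggraph.png", plot = g, width = 8, height = 6)
```

Because the result is an ordinary ggplot2 object, the usual ggplot2 tools for themes and titles should also apply to it.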
17.1.4 text_drake_graph()
If you are running R in a terminal without X Window support, the usual visualizations will not show up interactively in your session. Here, you can use text_drake_graph() to see a text display in your terminal window. Terminal colors are deactivated in this manual, but you will see color in your console.
# Use nchar = 0 or nchar = 1 for better results.
# The color display is better in your own terminal.
text_drake_graph(my_plan, nchar = 3)
#> reg sum
#> reg coe
#> lar
#> dat sim sum
#> reg
#> ran coe
#> sma
#> fil sum
#> reg
#> rep coe
#> kni fil
#> reg sum
#> reg coe
17.2 Underlying graph data: node and edge data frames
drake_graph_info() is used behind the scenes in vis_drake_graph(), sankey_drake_graph(), and drake_ggraph() to get the graph information ready for rendering. To save time, you can call drake_graph_info() to get these internals and then call render_drake_graph(), render_sankey_drake_graph(), or render_drake_ggraph().
str(drake_graph_info(my_plan))
#> List of 4
#> $ nodes : tibble [23 × 12] (S3: tbl_df/tbl/data.frame)
#> ..$ id : chr [1:23] "reg2" "n-NNXGS5DSHI5GW3TJOQ" "p-OJSXA33SOQXFE3LE" "random_rows" ...
#> ..$ imported : logi [1:23] TRUE TRUE TRUE TRUE TRUE TRUE ...
#> ..$ label : chr [1:23] "reg2" "knitr::knit" "file report.Rmd" "random_rows" ...
#> ..$ status : chr [1:23] "imported" "imported" "imported" "imported" ...
#> ..$ type : chr [1:23] "function" "function" "file" "function" ...
#> ..$ font.size: num [1:23] 20 20 20 20 20 20 20 20 20 20 ...
#> ..$ color : chr [1:23] "#1874CD" "#1874CD" "#1874CD" "#1874CD" ...
#> ..$ shape : chr [1:23] "triangle" "triangle" "square" "triangle" ...
#> ..$ level : num [1:23] 1 1 1 1 1 1 2 2 3 3 ...
#> ..$ title : chr [1:23] "Call drake_graph_info(hover = TRUE) for informative text." "Call drake_graph_info(hover = TRUE) for informative text." "Call drake_graph_info(hover = TRUE) for informative text." "Call drake_graph_info(hover = TRUE) for informative text." ...
#> ..$ x : num [1:23] -1 -1 -1 -1 -1 -1 -0.5 -0.5 0 0 ...
#> ..$ y : num [1:23] -0.918 -0.551 -0.184 0.184 0.551 ...
#> $ edges : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ from : chr [1:23] "small" "small" "reg2" "reg2" ...
#> ..$ to : chr [1:23] "regression1_small" "regression2_small" "regression2_large" "regression2_small" ...
#> ..$ arrows: chr [1:23] "to" "to" "to" "to" ...
#> $ legend_nodes : tibble [5 × 6] (S3: tbl_df/tbl/data.frame)
#> ..$ label : chr [1:5] "Outdated" "Imported" "Object" "Function" ...
#> ..$ color : chr [1:5] "#000000" "#1874CD" "#888888" "#888888" ...
#> ..$ shape : chr [1:5] "dot" "dot" "dot" "triangle" ...
#> ..$ font.color: chr [1:5] "black" "black" "black" "black" ...
#> ..$ font.size : num [1:5] 20 20 20 20 20
#> ..$ id : int [1:5] 2 6 8 10 11
#> $ default_title: chr "Dependency graph"
#> - attr(*, "class")= chr "drake_graph_info"
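A minimal sketch of that workflow, assuming the mtcars example from earlier in this chapter: the graph information is computed once, inspected, and then passed to each renderer. The variable name info is only illustrative.

```r
library(drake)
load_mtcars_example()

# Compute the node and edge data frames once.
info <- drake_graph_info(my_plan)

# Inspect the node table before plotting, e.g. count nodes by status.
print(table(info$nodes$status))

# Reuse the same information for each renderer instead of recomputing it.
render_drake_graph(info)
render_sankey_drake_graph(info)
render_drake_ggraph(info)
```

Since info$nodes and info$edges are plain data frames, this is also a convenient place to check which nodes will appear before rendering a large graph.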
17.3 Visualizing target status
drake’s visuals tell you which targets are up to date and which are outdated.
make(my_plan, verbose = 0L)
outdated(my_plan)
#> character(0)
sankey_drake_graph(my_plan)
When you change a dependency, some targets fall out of date (black nodes).
reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
sankey_drake_graph(my_plan)
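The same status information is available as text through outdated(). The sketch below assumes a fresh copy of the mtcars example and repeats the change to reg2 from above; the variable name stale is only illustrative.

```r
library(drake)
load_mtcars_example()

# Build everything so that all targets start up to date.
make(my_plan, verbose = 0L)

# Change an imported function that some targets depend on.
reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

# List the targets that are now out of date.
# These should correspond to the black nodes in the graph.
stale <- outdated(my_plan)
print(stale)
```

Targets that depend on reg2, directly or through other targets, should appear in the result, while targets that only use reg1 should not.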
17.4 Subgraphs
Graphs can grow enormous for serious projects, so there are multiple ways to focus on a manageable subgraph. The most brute-force way is to just pick a manual subset of nodes. Be aware that with the subset argument, the graphing functions can drop intermediate nodes and edges, so the result may look disconnected.
vis_drake_graph(
  my_plan,
  subset = c("regression2_small", "large")
)
The rest of the subgraph functionality preserves connectedness. Use targets_only to ignore the imports.
vis_drake_graph(my_plan, targets_only = TRUE)
Similarly, you can just show downstream nodes.
vis_drake_graph(my_plan, from = c("regression2_small", "regression2_large"))
Or upstream ones.
vis_drake_graph(my_plan, from = "small", mode = "in")
In fact, let us just take a small neighborhood around a target in both directions. For the graph below, the given order is 1, but all the custom file_out() output files of the neighborhood’s targets appear as well. This ensures consistent behavior between show_output_files = TRUE and show_output_files = FALSE (more on that later).
vis_drake_graph(my_plan, from = "small", mode = "all", order = 1)
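To check which nodes a subgraph keeps without rendering it, one option is to pass the same arguments to drake_graph_info() and look at the node table. This is a sketch that assumes the mtcars example; the variable names are illustrative.

```r
library(drake)
load_mtcars_example()

# Same subgraph arguments as the plotting functions,
# but the result is the underlying data rather than a widget.
info <- drake_graph_info(
  my_plan,
  from = "small",
  mode = "out",
  targets_only = TRUE
)

# IDs of the nodes that survive the subgraph selection.
downstream <- info$nodes$id
print(downstream)
```

Here the selection follows outgoing edges from small, so targets built only from large should be absent from the list.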
17.5 Control the vis_drake_graph() legend
Some arguments to vis_drake_graph() control the legend.
vis_drake_graph(my_plan, full_legend = TRUE, ncol_legend = 2)
To remove the legend altogether, set the ncol_legend argument to 0.
vis_drake_graph(my_plan, ncol_legend = 0)
17.6 Clusters
With the group and clusters arguments to the graphing functions, you can condense nodes into clusters. This is handy for workflows with lots of targets. Take the schools scenario from the drake plan guide. Our plan was generated with drake_plan(trace = TRUE), so it has wildcard columns that group nodes into natural clusters already. You can manually add such columns if you wish.
# Visit https://books.ropensci.org/drake/static.html
# to learn about the syntax with target(transform = ...).
plan <- drake_plan(
  school = target(
    get_school_data(id),
    transform = map(id = c(1, 2, 3))
  ),
  credits = target(
    fun(school),
    transform = cross(
      school,
      fun = c(check_credit_hours, check_students, check_graduations)
    )
  ),
  public_funds_school = target(
    command = check_public_funding(school),
    transform = map(school = c(school_1, school_2))
  ),
  trace = TRUE
)
plan
#> # A tibble: 14 x 7
#> target command fun school credits public_funds_scho… id
#> <chr> <expr_lst> <chr> <chr> <chr> <chr> <chr>
#> 1 credits_che… check_credi… check_… schoo… credits_ch… <NA> <NA>
#> 2 credits_che… check_stude… check_… schoo… credits_ch… <NA> <NA>
#> 3 credits_che… check_gradu… check_… schoo… credits_ch… <NA> <NA>
#> 4 credits_che… check_credi… check_… schoo… credits_ch… <NA> <NA>
#> 5 credits_che… check_stude… check_… schoo… credits_ch… <NA> <NA>
#> 6 credits_che… check_gradu… check_… schoo… credits_ch… <NA> <NA>
#> 7 credits_che… check_credi… check_… schoo… credits_ch… <NA> <NA>
#> 8 credits_che… check_stude… check_… schoo… credits_ch… <NA> <NA>
#> 9 credits_che… check_gradu… check_… schoo… credits_ch… <NA> <NA>
#> 10 public_fund… check_publi… <NA> schoo… <NA> public_funds_scho… <NA>
#> 11 public_fund… check_publi… <NA> schoo… <NA> public_funds_scho… <NA>
#> 12 school_1 get_school_… <NA> schoo… <NA> <NA> 1
#> 13 school_2 get_school_… <NA> schoo… <NA> <NA> 2
#> 14 school_3 get_school_… <NA> schoo… <NA> <NA> 3
Ordinarily, the workflow graph gives a separate node to each individual import object or target.
vis_drake_graph(plan)
For large projects with hundreds of nodes, this can get quite cumbersome. But here, we can choose a wildcard column (or any other column in the plan, even custom columns) to condense nodes into natural clusters. For the group argument to the graphing functions, choose the name of a column in plan or a column you know will be in drake_graph_info(my_plan)$nodes. Then for clusters, choose the values in your group column that correspond to nodes you want to bunch together. The new graph is not as cumbersome.
vis_drake_graph(plan,
group = "school",
clusters = c("school_1", "school_2", "school_3")
)
As previously mentioned, you can group on any column in drake_graph_info(my_plan)$nodes. Let’s return to the mtcars project for demonstration.
vis_drake_graph(my_plan)
Let’s condense all the imports into one node and all the up-to-date targets into another. That way, the outdated targets stand out.
vis_drake_graph(
  my_plan,
  group = "status",
  clusters = c("imported", "up to date")
)
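Grouping also works on custom plan columns, as noted earlier in this section. The sketch below uses a small hypothetical plan, separate from the schools and mtcars examples, where stage is a made-up custom column supplied through target().

```r
library(drake)

# A small hypothetical plan. `stage` is a custom column chosen for
# illustration; it is not a drake keyword.
plan <- drake_plan(
  raw = target(seq_len(10), stage = "prep"),
  clean = target(raw[raw > 2], stage = "prep"),
  total = target(sum(clean), stage = "model")
)

# Collapse the two "prep" targets into a single cluster node.
vis_drake_graph(plan, group = "stage", clusters = "prep")
```

Only the values listed in clusters are condensed, so the target with stage "model" should remain a separate node.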
17.7 Output files
drake can reproducibly track multiple output files per target and show them in the graph.
plan <- drake_plan(
  target1 = {
    file.copy(file_in("in1.txt"), file_out("out1.txt"))
    file.copy(file_in("in2.txt"), file_out("out2.txt"))
  },
  target2 = {
    file.copy(file_in("out1.txt"), file_out("out3.txt"))
    file.copy(file_in("out2.txt"), file_out("out4.txt"))
  }
)
writeLines("in1", "in1.txt")
writeLines("in2", "in2.txt")
make(plan)
#> ▶ target target1
#> ▶ target target2
writeLines("abcdefg", "out3.txt")
vis_drake_graph(plan, targets_only = TRUE)
If your graph is too busy, you can hide the output files with show_output_files = FALSE.
vis_drake_graph(plan, show_output_files = FALSE, targets_only = TRUE)
17.8 Node Selection
(Supported in drake > 7.7.0 only)
First, we define our plan, adding a custom column named “link”.
mtcars_link <-
  "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html"
plan <- drake_plan(
  mtc = target(
    mtcars,
    link = !!mtcars_link
  ),
  mtc2 = target(
    mtc,
    link = !!mtcars_link
  ),
  mtc3 = target(
    modify_mtc2(mtc2, number),
    transform = map(number = !!c(1:3), .tag_in = cluster_id),
    link = !!mtcars_link
  ),
  trace = TRUE
)
unique_stems <- unique(plan$cluster_id)
17.8.1 Perform the default action on select
When you call vis_drake_graph(on_select = TRUE, on_select_col = "my_column"), drake treats the values in the column named "my_column" as hyperlinks. Click on a node in the graph to navigate to the corresponding link in your browser.
vis_drake_graph(
plan,clusters = unique_stems,
group = "cluster_id",
on_select_col = "link",
on_select = TRUE
)
17.8.2 Perform no action on select
No action will be taken if any of the following are given to vis_drake_graph():
- on_select = NULL, or
- on_select = FALSE, or
- on_select_col = NULL.
This is the default behaviour.
vis_drake_graph(
  my_plan,
  clusters = unique_stems,
  group = "cluster_id",
  on_select_col = "link",
  on_select = NULL
)
17.8.3 Customize the onSelect event behaviour
What if we instead wanted the browser to display an alert when a node is clicked?
alert_behaviour <- function(){
  js <- "
  function(props) {
    alert('selected node with on_select_col: \\r\\n' +
      this.body.data.nodes.get(props.nodes[0]).on_select_col);
  }"
}
vis_drake_graph(
  my_plan,
  on_select_col = "link",
  on_select = alert_behaviour()
)
17.9 Enhanced interactivity
For enhanced interactivity, including custom interactive target documentation, see the mandrake R package. For a taste of the functionality, visit this vignette page and click the mtcars node in the graph.