Chapter 7 Progress and state

In taxize v0.9.8 a new set of features was introduced to keep track of progress of going through each name passed to get_*() functions, better messages about progress and better summary info, and the ability to restart from where you left off if a get_*() function gets interrupted.

7.1 Progress

Within each get_*() function we keep track of progress. We do this using the progressor R6 class, an internal taxize class. progessor is not meant to be used by the taxize user. Within each get_*() function we initiate a new progress object:

prog <- progressor$new(...)

As the get_*() function progresses the prog object is updated with data and metadata as each name is gone through in the for loop. In addition, prog handles nice print statements and summary info, which can also be suppressed as needed.

progressor is not exported but when you have taxize loaded you can go to its manual file (?progressor) to dig into how it works.

taxize_options() is a new function in taxize, introduced in v.0.9.8. It is meant to manage package wide options in general. The first option it has is taxon_state_messages, which is a boolean (default: TRUE) which controls messages printed by progressor - you can set taxon_state_messages=FALSE if you want to suppress messages from progressor.

Messages look like:

get_uid(c("Chironomus riparius", "howdy"))
#> ══  2 queries  ═══════════════
#> 
#> Retrieving data for taxon 'Chironomus riparius'
#> 
#> ✔  Found:  Chironomus+riparius
#> 
#> Retrieving data for taxon 'howdy'
#> 
#> ✖  Not Found:  foo bar
#> ══  Results  ═════════════════
#> 
#> ● Total: 2
#> ● Found: 1
#> ● Not Found: 1

Giving:

  • 2 queries: number of names being searched
  • Retrieving data for ...: which name is being searched
  • ✔ Found: ...: when a name is found, there’s a and the name
  • ✖ Not Found: ...: when a name is not found, there’s a and the name
  • At the end is a summary of number of names searched, number of names found and number of names not found

7.2 State

The first parameter of every get_*() function now accepts either one or more taxonomic names OR a taxon_state object. A taxon_state object is an R6 object that helps keep track of where you are in a get_*() function call. The TLDR is that it means if you’re get_*() function gets interrupted because of any number of things (e.g., internet goes out, the dreaded curl timeout, you hit the wrong key, the remote data provider temporarily goes down) you can call taxon_last() and pass that in to your get_*() function to start from where you left off before the interruption.

spp <- names_list("species", 3)
res <- get_gbifid(spp)
#> ══  3 queries  ═══════════════
#> ✔  Found:  Rhus meyeriana
#> ✔  Found:  Saurauia wigmanii
#> ✔  Found:  Ilex pauciflora
#> ══  Results  ═════════════════
#> 
#> ● Total: 3 
#> ● Found: 3 
#> ● Not Found: 0
z <- taxon_last()
z
#> <taxon state> 
#>  class: gbifid
#>  elapsed (sec): 3.54
#>  count: 3
#>   Rhus meyeriana: 3659641
#>   Saurauia wigmanii: 7765753
#>   Ilex pauciflora: 5533564
z$taxa_remaining()
#> character(0)
z$taxa_completed()
#> [1] "Ilex pauciflora"   "Rhus meyeriana"    "Saurauia wigmanii"
z$count # active binding; no parens needed
#> [1] 3

See ?taxon_state with taxize loaded to get all the details on working with taxon state objects.