Chapter 5 Identifiers

Taxonomic identifiers are a central concept in taxize - and around which many things depend on. We often either need to get to an identifier for each name in question, or take an identifier to do something else, for example get a taxonomic classification.

(Note: See Progress and state for details on keeping track of progress maintaining state in get_ functions, and functions that use get_ functions under the hood.)

5.1 get functions

There’s a family of functions, each targeted at a specific data source, for getting taxonomic identifiers from names. They all start with the prefix get_, and end with an abbreviation for the data source. They are:

  • get_wiki
  • get_gbifid
  • get_iucn
  • get_natservid
  • get_colid
  • get_boldid
  • get_nbnid
  • get_tsn
  • get_wormsid
  • get_ids
  • get_pow
  • get_uid
  • get_ubioid
  • get_tolid
  • get_tpsid
  • get_eolid

Another set of related functions with a trailing underscore (e.g., get_uid_()) are available for all data sources, but they do not do interactive usage, but instead provide all data. That is, if you use e.g., get_tsn() and there is more than one option returned from ITIS, you will be given a prompt which will require your input. If you want to avoid potential prompts, use get_tsn_() instead, and then manipulate/filter data yourself.

5.2 get functions return objects

The returned data from get_* functions is either an NA_character_ if no match, or a character string or numeric (some data providers use numeric taxonomic identifiers, while others use alphanumeric identifiers) if a match. The length can be greater than one as all the functions are vectorized.

There are a number of attributes attached to the output, where each attributes length equals the length of the id vector:

  • class: the class of the object
  • match: whether a match was found or not
  • multiple_matches: whether there were multiple matches or not
  • pattern_match: whether there was a match by pattern (i.e., regex)
  • uri: the URI/URL for the taxon

An example return from a get_* function:

[1] "50157568"
attr(,"class")
[1] "tpsid"
attr(,"match")
[1] "found"
attr(,"multiple_matches")
[1] TRUE
attr(,"pattern_match")
[1] FALSE
attr(,"uri")
[1] "http://tropicos.org/Name/50157568"

5.3 identifier notes

  • Taxonomic identifiers are unique to the data source. That is, the taxonomic ID for any individual taxon in data source A is different from that in data source B, and different from that in data source C, etc. For example, the taxonomic ID for Pinus contorta in NCBI is 3339, the ID in ITIS is 183327, and the ID in COL is 13c40c3f2be0b0965bddf948d2b3115f.
  • Taxonomic identifiers vary among data sources in their structure. That is, some are numeric (only numbers), some are alphanumeric (combination of numbers and letters), and one has an odd structure with letters, numbers and symbols (Nature Serve, e.g., ELEMENT_GLOBAL.2.147775). Most data sources provide numeric identifiers. Wikipedia is unusual in that there are no numeric/alphanumeric/etc. taxonomic identifiers for each name - instead, the “identifier” IS the name.

5.4 Looking up taxa on the web

If you want to look up the taxon page for any taxa that you’ve passed through get_* functions you can do so by using the uri attributes returned from the function.

For example:

x <- get_tsn('Pinus contorta')
browseURL(attr(x, 'uri'))