Chapter 5 Identifiers
Taxonomic identifiers are a central concept in taxize
- and around which
many things depend on. We often either need to get to an identifier for each
name in question, or take an identifier to do something else, for example
get a taxonomic classification.
(Note: See Progress and state for details on keeping track
of progress maintaining state in get_
functions, and functions that use
get_
functions under the hood.)
5.1 get functions
There’s a family of functions, each targeted at a specific data source, for
getting taxonomic identifiers from names. They all start with the prefix
get_
, and end with an abbreviation for the data source. They are:
get_wiki
get_gbifid
get_iucn
get_natservid
get_colid
get_boldid
get_nbnid
get_tsn
get_wormsid
get_ids
get_pow
get_uid
get_ubioid
get_tolid
get_tpsid
get_eolid
Another set of related functions with a trailing underscore (e.g., get_uid_()
) are
available for all data sources, but they do not do interactive usage, but instead
provide all data. That is, if you use e.g., get_tsn()
and there is more than
one option returned from ITIS, you will be given a prompt which will require
your input. If you want to avoid potential prompts, use get_tsn_()
instead,
and then manipulate/filter data yourself.
5.2 get functions return objects
The returned data from get_*
functions is either an NA_character_
if no match,
or a character string or numeric (some data providers use numeric taxonomic
identifiers, while others use alphanumeric identifiers) if a match. The length can
be greater than one as all the functions are vectorized.
There are a number of attributes attached to the output, where each attributes length equals the length of the id vector:
- class: the class of the object
- match: whether a match was found or not
- multiple_matches: whether there were multiple matches or not
- pattern_match: whether there was a match by pattern (i.e., regex)
- uri: the URI/URL for the taxon
An example return from a get_*
function:
5.3 identifier notes
- Taxonomic identifiers are unique to the data source. That is, the taxonomic
ID for any individual taxon in data source A is different from that in data
source B, and different from that in data source C, etc. For example, the
taxonomic ID for Pinus contorta in NCBI is
3339
, the ID in ITIS is183327
, and the ID in COL is13c40c3f2be0b0965bddf948d2b3115f
. - Taxonomic identifiers vary among data sources in their structure. That is,
some are numeric (only numbers), some are alphanumeric (combination of numbers
and letters), and one has an odd structure with letters, numbers and symbols
(Nature Serve, e.g.,
ELEMENT_GLOBAL.2.147775
). Most data sources provide numeric identifiers. Wikipedia is unusual in that there are no numeric/alphanumeric/etc. taxonomic identifiers for each name - instead, the “identifier” IS the name. - …