Chapter 3 spocc
3.1 What is spocc?
spocc is an R package that provides a single user interface to occurrence data from many sources. It currently interfaces with ten major biodiversity repositories:
- Global Biodiversity Information Facility (GBIF) (via rgbif)
- Berkeley Ecoengine (via ecoengine)
- iNaturalist
- VertNet (via rvertnet)
- Biodiversity Information Serving Our Nation (via rbison)
- eBird (via rebird)
- AntWeb (via AntWeb)
- iDigBio (via ridigbio)
- OBIS
- Atlas of Living Australia
3.2 Basic example
Load spocc and request data for one taxonomic name from three different data sources. Some options apply across all data sources (e.g., limit, the number of records to return from each source), while others apply to a single data source and are passed in that source's own list (gbifopts = list() for GBIF, and so on for each data source).
library(spocc)
out <- occ(query = 'Setophaga caerulescens', from = c('gbif','bison','inat'), limit = 3, gbifopts = list(country = 'US'))
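The split between shared and source-specific options can be made explicit by keeping them in separate lists and merging them just before the call. The following is a minimal sketch using only the argument names from the call above; occ_args and run_query are hypothetical names introduced for illustration and are not part of spocc.

```r
# Options shared by every data source (same values as the call above)
shared <- list(
  query = "Setophaga caerulescens",
  from  = c("gbif", "bison", "inat"),
  limit = 3
)

# Options understood by a single source, keyed by the occ() argument name
per_source <- list(
  gbifopts = list(country = "US")
)

# c() on two lists concatenates their top-level elements,
# so gbifopts stays a nested list inside the merged argument list
occ_args <- c(shared, per_source)

# Hypothetical wrapper: hands the merged list to spocc::occ().
# Calling it requires spocc to be installed and network access.
run_query <- function(arg_list) {
  do.call(spocc::occ, arg_list)
}

# Usage (not run here):
# out <- run_query(occ_args)
```

Keeping the two groups apart makes it easy to see which settings affect every source and which affect only one.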
Access GBIF data
out$gbif
#> Species [Setophaga caerulescens (3)]
#> First 10 rows of [Setophaga_caerulescens]
#>
#> # A tibble: 3 x 73
#> name longitude latitude prov issues key scientificName datasetKey
#> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 Seto… -96.7 32.9 gbif cdrou… 2550… Setophaga cae… 50c9509d-…
#> 2 Seto… -96.7 32.9 gbif gass8… 2557… Setophaga cae… 50c9509d-…
#> 3 Seto… -96.7 32.9 gbif gass8… 2563… Setophaga cae… 50c9509d-…
#> # … with 65 more variables: publishingOrgKey <chr>, installationKey <chr>,
#> # publishingCountry <chr>, protocol <chr>, lastCrawled <chr>,
#> # lastParsed <chr>, crawlId <int>, hostingOrganizationKey <chr>,
#> # basisOfRecord <chr>, occurrenceStatus <chr>, taxonKey <int>,
#> # kingdomKey <int>, phylumKey <int>, classKey <int>, orderKey <int>,
#> # familyKey <int>, genusKey <int>, speciesKey <int>, acceptedTaxonKey <int>,
#> # acceptedScientificName <chr>, kingdom <chr>, phylum <chr>, order <chr>,
#> # family <chr>, genus <chr>, species <chr>, genericName <chr>,
#> # specificEpithet <chr>, taxonRank <chr>, taxonomicStatus <chr>,
#> # dateIdentified <chr>, stateProvince <chr>, year <int>, month <int>,
#> # day <int>, eventDate <date>, modified <chr>, lastInterpreted <chr>,
#> # references <chr>, license <chr>, isInCluster <lgl>, geodeticDatum <chr>,
#> # class <chr>, countryCode <chr>, country <chr>, rightsHolder <chr>,
#> # identifier <chr>, `http://unknown.org/nick` <chr>, verbatimEventDate <chr>,
#> # datasetName <chr>, gbifID <chr>, verbatimLocality <chr>,
#> # collectionCode <chr>, occurrenceID <chr>, taxonID <chr>,
#> # catalogNumber <chr>, recordedBy <chr>,
#> # `http://unknown.org/occurrenceDetails` <chr>, institutionCode <chr>,
#> # rights <chr>, eventTime <chr>, identifiedBy <chr>, identificationID <chr>,
#> # coordinateUncertaintyInMeters <dbl>, occurrenceRemarks <chr>
Access BISON data
out$bison
#> Species [Setophaga caerulescens (3)]
#> First 10 rows of [Setophaga_caerulescens]
#>
#> # A tibble: 3 x 33
#> name longitude latitude prov providedScienti… date countryCode
#> <chr> <dbl> <dbl> <chr> <chr> <date> <chr>
#> 1 Dend… -82.5 48.2 bison Dendroica caeru… NA US
#> 2 Dend… -82.5 36.9 bison Dendroica caeru… NA US
#> 3 Dend… -82.6 45.8 bison Dendroica caeru… NA US
#> # … with 26 more variables: ambiguous <lgl>, generalComments <chr>,
#> # latlon <chr>, computedCountyFips <chr>, occurrenceID <chr>,
#> # basisOfRecord <chr>, providedCommonName <chr>, collectionID <chr>,
#> # ownerInstitutionCollectionCode <chr>, institutionID <chr>,
#> # computedStateFips <chr>, license <chr>, TSNs <chr>, providerID <int>,
#> # geo <chr>, provider <chr>, calculatedCounty <chr>,
#> # ITISscientificName <chr>, pointPath <chr>, kingdom <chr>,
#> # calculatedState <chr>, hierarchy_homonym_string <chr>, centroid <chr>,
#> # ITIScommonName <chr>, resourceID <chr>, ITIStsn <chr>
Access iNaturalist data
out$inat
#> Species [Setophaga caerulescens (3)]
#> First 10 rows of [Setophaga_caerulescens]
#>
#> # A tibble: 3 x 140
#> name longitude latitude prov quality_grade time_observed_at taxon_geoprivacy
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Seto… -83.1629… 41.6155… inat research 2017-05-18T10:3… open
#> 2 Seto… -80.2931… 25.7411… inat research 2020-12-11T15:3… open
#> 3 Seto… -81.8480… 42.3150… inat research 2020-05-23T15:1… open
#> # … with 133 more variables: annotations <list>, uuid <chr>, id <int>,
#> # cached_votes_total <int>, identifications_most_agree <lgl>,
#> # species_guess <chr>, identifications_most_disagree <lgl>, tags <chr>,
#> # positional_accuracy <int>, comments_count <int>, site_id <int>,
#> # created_time_zone <chr>, license_code <chr>, observed_time_zone <chr>,
#> # quality_metrics <list>, public_positional_accuracy <int>,
#> # reviewed_by <list>, oauth_application_id <int>, flags <list>,
#> # created_at <chr>, description <chr>, time_zone_offset <chr>,
#> # project_ids_with_curator_id <list>, observed_on <date>,
#> # observed_on_string <chr>, updated_at <chr>, sounds <list>,
#> # place_ids <list>, captive <lgl>, ident_taxon_ids <list>, outlinks <list>,
#> # faves_count <int>, ofvs <list>, num_identification_agreements <int>,
#> # comments <list>, map_scale <int>, uri <chr>, project_ids <list>,
#> # community_taxon_id <int>, owners_identification_from_vision <lgl>,
#> # identifications_count <int>, obscured <lgl>,
#> # num_identification_disagreements <int>, geoprivacy <chr>, location <chr>,
#> # votes <list>, spam <lgl>, mappable <lgl>, identifications_some_agree <lgl>,
#> # project_ids_without_curator_id <list>, place_guess <chr>,
#> # identifications <list>, project_observations <list>,
#> # observation_sounds <list>, photos <list>, observation_photos <list>,
#> # faves <list>, non_owner_ids <list>, observed_on_details.date <chr>,
#> # observed_on_details.week <int>, observed_on_details.month <int>,
#> # observed_on_details.hour <int>, observed_on_details.year <int>,
#> # observed_on_details.day <int>, created_at_details.date <chr>,
#> # created_at_details.week <int>, created_at_details.month <int>,
#> # created_at_details.hour <int>, created_at_details.year <int>,
#> # created_at_details.day <int>, taxon.is_active <lgl>, taxon.ancestry <chr>,
#> # taxon.min_species_ancestry <chr>, taxon.endemic <lgl>,
#> # taxon.iconic_taxon_id <int>, taxon.min_species_taxon_id <int>,
#> # taxon.threatened <lgl>, taxon.rank_level <int>, taxon.introduced <lgl>,
#> # taxon.native <lgl>, taxon.parent_id <int>, taxon.rank <chr>,
#> # taxon.extinct <lgl>, taxon.id <int>, taxon.ancestor_ids <list>,
#> # taxon.photos_locked <lgl>, taxon.taxon_schemes_count <int>,
#> # taxon.wikipedia_url <chr>, taxon.current_synonymous_taxon_ids <lgl>,
#> # taxon.created_at <chr>, taxon.taxon_changes_count <int>,
#> # taxon.complete_species_count <lgl>, taxon.universal_search_rank <int>,
#> # taxon.observations_count <int>, taxon.atlas_id <lgl>,
#> # taxon.complete_rank <chr>, taxon.iconic_taxon_name <chr>,
#> # taxon.preferred_common_name <chr>, taxon.flag_counts.unresolved <int>,
#> # taxon.flag_counts.resolved <int>, …
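The printed summaries above are previews of each source's records. To work with the full data frame, the records have to be pulled out of the result object. The sketch below assumes that each source element keeps its records in a data list named after the query with spaces replaced by underscores, as the header line "First 10 rows of [Setophaga_caerulescens]" suggests. get_records is a hypothetical helper written for this example, not a spocc function.

```r
# Hypothetical helper (not part of spocc): return the data frame for one
# query from one source of an occ() result.
# Assumes records are stored at res[[source]]$data[[key]], where key is the
# query string with spaces replaced by underscores.
get_records <- function(res, source, query) {
  key <- gsub(" ", "_", query, fixed = TRUE)
  records <- res[[source]]$data[[key]]
  if (is.null(records)) {
    stop("no records for '", query, "' from source '", source, "'")
  }
  records
}

# Usage with the object created earlier in this section (not run here):
# gbif_df <- get_records(out, "gbif", "Setophaga caerulescens")
# gbif_df[, c("name", "longitude", "latitude", "prov")]
```

spocc also provides occ2df(), which combines the records from all sources into a single data frame with a small set of shared columns. Check its help page for the exact columns returned by your installed version.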