Chapter 3 spocc

3.1 What is spocc?

spocc is an R package that provides a single interface for retrieving species occurrence data from many different sources.

spocc currently interfaces with ten major biodiversity repositories:

  1. Global Biodiversity Information Facility (GBIF) (via rgbif)
  2. Berkeley Ecoengine (via ecoengine)
  3. iNaturalist
  4. VertNet (via rvertnet)
  5. Biodiversity Information Serving Our Nation (BISON) (via rbison)
  6. eBird (via rebird)
  7. AntWeb (via AntWeb)
  8. iDigBio (via ridigbio)
  9. OBIS
  10. Atlas of Living Australia
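Each repository in the list above is selected through the from argument of occ(), which takes a short lowercase code per source. The lookup table below is a sketch that assumes the codes accepted by occ() when this chapter was written (gbif, ecoengine, inat, vertnet, bison, ebird, antweb, idigbio, obis, ala). The set of supported sources has changed between spocc releases, so check ?occ in your installed version.

```r
# Lookup from repository name to the code passed to occ(from = ...).
# Assumption: codes as documented for occ() at the time of writing.
spocc_sources <- c(
  "Global Biodiversity Information Facility"    = "gbif",
  "Berkeley Ecoengine"                          = "ecoengine",
  "iNaturalist"                                 = "inat",
  "VertNet"                                     = "vertnet",
  "Biodiversity Information Serving Our Nation" = "bison",
  "eBird"                                       = "ebird",
  "AntWeb"                                      = "antweb",
  "iDigBio"                                     = "idigbio",
  "OBIS"                                        = "obis",
  "Atlas of Living Australia"                   = "ala"
)

# Pick the three sources used in the basic example by their full names;
# the resulting codes are what you would pass as from = ...
chosen <- spocc_sources[c("Global Biodiversity Information Facility",
                          "Biodiversity Information Serving Our Nation",
                          "iNaturalist")]
unname(chosen)
```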

3.2 Basic example

Load spocc and request data for a given taxonomic name from three different data sources. You can set options that apply across all data sources (e.g., limit, the maximum number of records returned from each source) as well as options for a specific data source (using gbifopts = list(), bisonopts = list(), inatopts = list(), and so on, one argument per data source).

library(spocc)

# limit applies to every source; gbifopts is passed to GBIF only
out <- occ(
  query = 'Setophaga caerulescens',
  from = c('gbif', 'bison', 'inat'),
  limit = 3,
  gbifopts = list(country = 'US')
)

Access GBIF data

out$gbif
#> Species [Setophaga caerulescens (3)] 
#> First 10 rows of [Setophaga_caerulescens]
#> 
#> # A tibble: 3 x 73
#>   name  longitude latitude prov  issues key   scientificName datasetKey
#>   <chr>     <dbl>    <dbl> <chr> <chr>  <chr> <chr>          <chr>     
#> 1 Seto…     -96.7     32.9 gbif  cdrou… 2550… Setophaga cae… 50c9509d-…
#> 2 Seto…     -96.7     32.9 gbif  gass8… 2557… Setophaga cae… 50c9509d-…
#> 3 Seto…     -96.7     32.9 gbif  gass8… 2563… Setophaga cae… 50c9509d-…
#> # … with 65 more variables: publishingOrgKey <chr>, installationKey <chr>,
#> #   publishingCountry <chr>, protocol <chr>, lastCrawled <chr>,
#> #   lastParsed <chr>, crawlId <int>, hostingOrganizationKey <chr>,
#> #   basisOfRecord <chr>, occurrenceStatus <chr>, taxonKey <int>,
#> #   kingdomKey <int>, phylumKey <int>, classKey <int>, orderKey <int>,
#> #   familyKey <int>, genusKey <int>, speciesKey <int>, acceptedTaxonKey <int>,
#> #   acceptedScientificName <chr>, kingdom <chr>, phylum <chr>, order <chr>,
#> #   family <chr>, genus <chr>, species <chr>, genericName <chr>,
#> #   specificEpithet <chr>, taxonRank <chr>, taxonomicStatus <chr>,
#> #   dateIdentified <chr>, stateProvince <chr>, year <int>, month <int>,
#> #   day <int>, eventDate <date>, modified <chr>, lastInterpreted <chr>,
#> #   references <chr>, license <chr>, isInCluster <lgl>, geodeticDatum <chr>,
#> #   class <chr>, countryCode <chr>, country <chr>, rightsHolder <chr>,
#> #   identifier <chr>, `http://unknown.org/nick` <chr>, verbatimEventDate <chr>,
#> #   datasetName <chr>, gbifID <chr>, verbatimLocality <chr>,
#> #   collectionCode <chr>, occurrenceID <chr>, taxonID <chr>,
#> #   catalogNumber <chr>, recordedBy <chr>,
#> #   `http://unknown.org/occurrenceDetails` <chr>, institutionCode <chr>,
#> #   rights <chr>, eventTime <chr>, identifiedBy <chr>, identificationID <chr>,
#> #   coordinateUncertaintyInMeters <dbl>, occurrenceRemarks <chr>
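The printed header shows that results are stored per taxon, under the query name with spaces replaced by underscores (Setophaga_caerulescens). The sketch below mimics that nesting with a plain list so it runs without network access. It assumes the layout out$gbif$data$Setophaga_caerulescens, assumes the full name value, and uses only the rounded coordinates visible in the printout; with a real occ() result you would index it the same way.

```r
# Offline stand-in for the nested result of occ(). The real object has class
# "occdat"; this mock only assumes its source -> data -> taxon layout.
query <- "Setophaga caerulescens"
key <- gsub(" ", "_", query)   # matches the key shown in the printed header

mock_out <- list(
  gbif = list(
    data = setNames(
      list(data.frame(
        name      = query,              # assumed full value; printout truncates it
        longitude = rep(-96.7, 3),      # rounded values from the printout
        latitude  = rep(32.9, 3),
        prov      = "gbif",
        stringsAsFactors = FALSE
      )),
      key
    )
  )
)

# Same indexing you would use on the real object: out$gbif$data[[key]]
gbif_df <- mock_out$gbif$data[[key]]
gbif_df[, c("name", "longitude", "latitude", "prov")]
```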

Access BISON data

out$bison
#> Species [Setophaga caerulescens (3)] 
#> First 10 rows of [Setophaga_caerulescens]
#> 
#> # A tibble: 3 x 33
#>   name  longitude latitude prov  providedScienti… date       countryCode
#>   <chr>     <dbl>    <dbl> <chr> <chr>            <date>     <chr>      
#> 1 Dend…     -82.5     48.2 bison Dendroica caeru… NA         US         
#> 2 Dend…     -82.5     36.9 bison Dendroica caeru… NA         US         
#> 3 Dend…     -82.6     45.8 bison Dendroica caeru… NA         US         
#> # … with 26 more variables: ambiguous <lgl>, generalComments <chr>,
#> #   latlon <chr>, computedCountyFips <chr>, occurrenceID <chr>,
#> #   basisOfRecord <chr>, providedCommonName <chr>, collectionID <chr>,
#> #   ownerInstitutionCollectionCode <chr>, institutionID <chr>,
#> #   computedStateFips <chr>, license <chr>, TSNs <chr>, providerID <int>,
#> #   geo <chr>, provider <chr>, calculatedCounty <chr>,
#> #   ITISscientificName <chr>, pointPath <chr>, kingdom <chr>,
#> #   calculatedState <chr>, hierarchy_homonym_string <chr>, centroid <chr>,
#> #   ITIScommonName <chr>, resourceID <chr>, ITIStsn <chr>
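BISON returns these records under the older genus name Dendroica, while GBIF reports Setophaga caerulescens for the same warbler. Before combining sources you will usually want one name per taxon. spocc includes a fixnames() helper for this purpose (see ?fixnames in your installed version); the base-R sketch below only illustrates the idea with a hand-written synonym table, which is a hypothetical example and not something spocc supplies.

```r
# Hypothetical synonym table: provider name -> name to use across sources.
# Dendroica caerulescens is the former name of Setophaga caerulescens.
synonyms <- c("Dendroica caerulescens" = "Setophaga caerulescens")

# Replace any name found in the synonym table; leave all others unchanged
harmonize_names <- function(x, table) {
  hit <- x %in% names(table)
  x[hit] <- table[x[hit]]
  x
}

# Assumed full values; the printout truncates them to "Dend..."
bison_names <- rep("Dendroica caerulescens", 3)
harmonize_names(bison_names, synonyms)
```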

Access iNaturalist data

out$inat
#> Species [Setophaga caerulescens (3)] 
#> First 10 rows of [Setophaga_caerulescens]
#> 
#> # A tibble: 3 x 140
#>   name  longitude latitude prov  quality_grade time_observed_at taxon_geoprivacy
#>   <chr> <chr>     <chr>    <chr> <chr>         <chr>            <chr>           
#> 1 Seto… -83.1629… 41.6155… inat  research      2017-05-18T10:3… open            
#> 2 Seto… -80.2931… 25.7411… inat  research      2020-12-11T15:3… open            
#> 3 Seto… -81.8480… 42.3150… inat  research      2020-05-23T15:1… open            
#> # … with 133 more variables: annotations <list>, uuid <chr>, id <int>,
#> #   cached_votes_total <int>, identifications_most_agree <lgl>,
#> #   species_guess <chr>, identifications_most_disagree <lgl>, tags <chr>,
#> #   positional_accuracy <int>, comments_count <int>, site_id <int>,
#> #   created_time_zone <chr>, license_code <chr>, observed_time_zone <chr>,
#> #   quality_metrics <list>, public_positional_accuracy <int>,
#> #   reviewed_by <list>, oauth_application_id <int>, flags <list>,
#> #   created_at <chr>, description <chr>, time_zone_offset <chr>,
#> #   project_ids_with_curator_id <list>, observed_on <date>,
#> #   observed_on_string <chr>, updated_at <chr>, sounds <list>,
#> #   place_ids <list>, captive <lgl>, ident_taxon_ids <list>, outlinks <list>,
#> #   faves_count <int>, ofvs <list>, num_identification_agreements <int>,
#> #   comments <list>, map_scale <int>, uri <chr>, project_ids <list>,
#> #   community_taxon_id <int>, owners_identification_from_vision <lgl>,
#> #   identifications_count <int>, obscured <lgl>,
#> #   num_identification_disagreements <int>, geoprivacy <chr>, location <chr>,
#> #   votes <list>, spam <lgl>, mappable <lgl>, identifications_some_agree <lgl>,
#> #   project_ids_without_curator_id <list>, place_guess <chr>,
#> #   identifications <list>, project_observations <list>,
#> #   observation_sounds <list>, photos <list>, observation_photos <list>,
#> #   faves <list>, non_owner_ids <list>, observed_on_details.date <chr>,
#> #   observed_on_details.week <int>, observed_on_details.month <int>,
#> #   observed_on_details.hour <int>, observed_on_details.year <int>,
#> #   observed_on_details.day <int>, created_at_details.date <chr>,
#> #   created_at_details.week <int>, created_at_details.month <int>,
#> #   created_at_details.hour <int>, created_at_details.year <int>,
#> #   created_at_details.day <int>, taxon.is_active <lgl>, taxon.ancestry <chr>,
#> #   taxon.min_species_ancestry <chr>, taxon.endemic <lgl>,
#> #   taxon.iconic_taxon_id <int>, taxon.min_species_taxon_id <int>,
#> #   taxon.threatened <lgl>, taxon.rank_level <int>, taxon.introduced <lgl>,
#> #   taxon.native <lgl>, taxon.parent_id <int>, taxon.rank <chr>,
#> #   taxon.extinct <lgl>, taxon.id <int>, taxon.ancestor_ids <list>,
#> #   taxon.photos_locked <lgl>, taxon.taxon_schemes_count <int>,
#> #   taxon.wikipedia_url <chr>, taxon.current_synonymous_taxon_ids <lgl>,
#> #   taxon.created_at <chr>, taxon.taxon_changes_count <int>,
#> #   taxon.complete_species_count <lgl>, taxon.universal_search_rank <int>,
#> #   taxon.observations_count <int>, taxon.atlas_id <lgl>,
#> #   taxon.complete_rank <chr>, taxon.iconic_taxon_name <chr>,
#> #   taxon.preferred_common_name <chr>, taxon.flag_counts.unresolved <int>,
#> #   taxon.flag_counts.resolved <int>, …
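All three tables share the leading columns name, longitude, latitude and prov, but their types differ: GBIF and BISON return coordinates as <dbl>, while iNaturalist returns them as <chr>. spocc's occ2df() combines an occ() result into a single data frame; the sketch below is a base-R approximation of that step, not occ2df() itself, showing the coercion you need if you stack the tables by hand. The coordinates are the rounded or truncated values visible in the printouts, and the names are assumed to be already harmonized.

```r
# Keep the columns common to every source and coerce coordinates to numeric,
# since iNaturalist returns them as character strings.
common_cols <- c("name", "longitude", "latitude", "prov")

stack_sources <- function(tables) {
  parts <- lapply(tables, function(df) {
    df <- df[, common_cols]
    df$longitude <- as.numeric(df$longitude)
    df$latitude  <- as.numeric(df$latitude)
    df
  })
  do.call(rbind, unname(parts))
}

# Small stand-ins built from the values visible in the printouts above
gbif  <- data.frame(name = "Setophaga caerulescens",
                    longitude = rep(-96.7, 3), latitude = rep(32.9, 3),
                    prov = "gbif", stringsAsFactors = FALSE)
bison <- data.frame(name = "Setophaga caerulescens",
                    longitude = c(-82.5, -82.5, -82.6),
                    latitude  = c(48.2, 36.9, 45.8),
                    prov = "bison", stringsAsFactors = FALSE)
inat  <- data.frame(name = "Setophaga caerulescens",
                    longitude = c("-83.1629", "-80.2931", "-81.8480"),
                    latitude  = c("41.6155", "25.7411", "42.3150"),
                    prov = "inat", stringsAsFactors = FALSE)

combined <- stack_sources(list(gbif, bison, inat))
str(combined)
```

Coercing inside stack_sources() rather than afterwards matters: rbind() on a character and a numeric longitude column would silently turn the whole column into character.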