Chapter 3 Data sources

The data sources available in fulltext depend on the task you are doing.

3.2 Abstracts

When using ft_abstract(), you have access to the following sources:

  • crossref
  • microsoft
  • plos
  • scopus
  • semanticscholar

You can list the available plugins with ft_abstract_ls().
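As a quick check before calling ft_abstract(), the sketch below lists the abstract sources and confirms that a chosen one is among them. It assumes the fulltext package is installed (the guard skips the example otherwise) and that ft_abstract_ls() returns a character vector of source names. The variable `wanted` is only an illustration.

```r
# Assumes the fulltext package is installed; the example is skipped otherwise.
if (requireNamespace("fulltext", quietly = TRUE)) {
  # Names of the abstract plugins (assumed to be a character vector)
  sources <- fulltext::ft_abstract_ls()
  print(sources)

  # `wanted` is an illustrative choice, not a recommendation
  wanted <- "plos"
  print(wanted %in% sources)
}
```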

3.4 Getting full text

When using ft_get() to fetch the full text of articles, you have access to a set of specific data sources (in this case, publishers) for which we have written plugins (i.e., functions):

  • aaas
  • aip
  • amersocclinoncol
  • amersocmicrobiol
  • arxiv
  • biorxiv
  • bmc
  • cambridge
  • cob
  • copernicus
  • crossref
  • elife
  • elsevier
  • entrez
  • frontiersin
  • ieee
  • informa
  • instinvestfil
  • jama
  • microbiology
  • peerj
  • pensoft
  • plos
  • pnas
  • royalsocchem
  • roysoc
  • sciencedirect
  • scientificsocieties
  • transtech
  • wiley

You can list the available plugins with ft_get_ls().

But ft_get() has other options beyond these plugins. The DOIs (Digital Object Identifiers) that you pass to ft_get() have a prefix that is associated with a specific publisher, so we can decide whether to use one of the plugins listed by ft_get_ls() or something else. If we don’t have a plugin, we first check whether Crossref has a full text link to either XML or PDF for the DOI. If not, we fall back to an API that rOpenSci maintains. This API has a set of rules for each publisher. Some are simple, such as appending the DOI to a base URL; others require an HTTP request followed by some string manipulation.
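The fallback order described above can be sketched in a few lines of base R. This is an illustration of the decision logic only, not the code ft_get() actually runs. The lookup table `prefix_to_plugin`, the helpers `doi_prefix()` and `choose_route()`, the `crossref_has_link` flag, and the returned labels are all hypothetical names introduced here.

```r
# Hypothetical lookup table: DOI prefix -> plugin name.
# Illustrative only; this is not fulltext's internal mapping.
prefix_to_plugin <- c(
  "10.7717" = "peerj",
  "10.1371" = "plos",
  "10.7554" = "elife"
)

# The prefix is everything before the first "/" in the DOI.
doi_prefix <- function(doi) {
  sub("/.*$", "", doi)
}

# Pick a route in the order described in the text:
#   1. a dedicated plugin, 2. a Crossref full text link, 3. the rule-based API.
# `crossref_has_link` is a stand-in flag; a real check would query Crossref.
choose_route <- function(doi, crossref_has_link = FALSE) {
  prefix <- doi_prefix(doi)
  if (prefix %in% names(prefix_to_plugin)) {
    return(paste0("plugin:", prefix_to_plugin[[prefix]]))
  }
  if (crossref_has_link) {
    return("crossref_link")
  }
  "ropensci_api"
}

choose_route("10.7717/peerj.228")                            # known prefix
choose_route("10.9999/example.1", crossref_has_link = TRUE)  # no plugin, link found
choose_route("10.9999/example.1")                            # no plugin, no link
```

The three calls at the end exercise each branch in turn. The DOI with prefix 10.9999 is a made-up example used to represent a publisher without a plugin.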