Chapter 3 Data sources

Data sources in fulltext include:

Crossref - via the rcrossref package
Public Library of Science (PLOS) - via the rplos package
BioMed Central
arXiv - via the aRxiv package
bioRxiv - via the biorxivr package
PMC/PubMed via Entrez - via the rentrez package
Many more are supported through the sources above (e.g., Royal Society Open Science is available via PubMed)
We will add more as publishers open up and as we have time. See the master list here

The available data sources differ depending on which task you are performing in fulltext.
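
Because each task has its own plugin set, it can help to compare them side by side. The sketch below collects the four listing functions named in this chapter into one list. It assumes fulltext is installed. The names returned depend on your installed version, so no output is shown.

```r
library(fulltext)

# One character vector of plugin names per task
plugins <- list(
  search   = ft_search_ls(),
  abstract = ft_abstract_ls(),
  links    = ft_links_ls(),
  get      = ft_get_ls()
)

# Number of plugins available for each task
lengths(plugins)

# Sources that have a plugin for every task
Reduce(intersect, plugins)
```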

3.1 Search

When searching with ft_search() you have access to the following sources, and only these:

arxiv
biorxivr
bmc
crossref
entrez
europe_pmc
ma
plos
scopus

You can list the available search plugins with ft_search_ls().
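
A minimal search sketch follows. The `from` argument takes one or more of the plugin names above. The query term and `limit` value are arbitrary choices for illustration. The call needs network access, and some sources (such as scopus) also require an API key.

```r
library(fulltext)

# Search two sources at once; `limit` caps the number of records per source
res <- ft_search(query = "ecology", from = c("plos", "crossref"), limit = 5)

# Printing gives a per-source summary of what was found
res

# Each source is a separate element of the result
res$plos$found   # number of matches reported by PLOS
res$plos$data    # table of the returned records
```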

3.2 Abstracts

When using ft_abstract() you have access to:

crossref
microsoft
plos
scopus
semanticscholar

You can list the available abstract plugins with ft_abstract_ls().
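
ft_abstract() takes DOIs and a source name. In the sketch below, the DOI is just an example PLOS DOI; substitute your own. Network access is required.

```r
library(fulltext)

# An example PLOS DOI
doi <- "10.1371/journal.pone.0086169"

# Ask the PLOS plugin for the abstract of that article
ab <- ft_abstract(x = doi, from = "plos")
ab
ab$plos   # abstracts returned by the PLOS plugin
```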

3.3 Links

When using ft_links() to get links to full text, you’ll have access to:

bmc
cdc
cogent
copernicus
crossref
elife
entrez
frontiersin
peerj
plos
rsoc

You can list the available link plugins with ft_links_ls().
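
ft_links() can be given either the output of ft_search() or a DOI with a named plugin. Both forms appear in the sketch below. The query and the example PLOS DOI are arbitrary, and network access is required.

```r
library(fulltext)

# Links for records found by a search
res <- ft_search(query = "ecology", from = "plos", limit = 3)
lks <- ft_links(res)
lks

# Links for a single DOI, naming the plugin explicitly
lks2 <- ft_links("10.1371/journal.pone.0086169", from = "plos")
lks2
```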

3.4 Getting full text

When using ft_get() to fetch the full text of articles, you have access to a specific set of data sources (in this case, publishers) for which we have written plugins (i.e., functions):

aaas
aip
amersocclinoncol
amersocmicrobiol
arxiv
biorxiv
bmc
cambridge
cob
copernicus
crossref
elife
elsevier
entrez
frontiersin
ieee
informa
instinvestfil
jama
microbiology
peerj
pensoft
plos
pnas
royalsocchem
roysoc
sciencedirect
scientificsocieties
transtech
wiley

You can list the available full-text plugins with ft_get_ls().
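
In the sketch below, no `from` argument is passed, so fulltext works out which publisher to use from the DOI itself. The DOI is an example PLOS DOI, and network access is required. ft_collect() then reads the downloaded files into the returned object.

```r
library(fulltext)

# Fetch full text; the publisher is inferred from the DOI
x <- ft_get("10.1371/journal.pone.0086169")
x    # summary of what was retrieved, per publisher

# Read the downloaded text into memory
xc <- ft_collect(x)
```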

But ft_get() also has other options that we take advantage of. Every DOI (Digital Object Identifier) you pass to ft_get() has a prefix associated with a specific publisher, so we can decide whether to use one of the plugins listed by ft_get_ls() or to fall back to something else. If we don’t have a plugin, we first check whether Crossref has a full-text link to either XML or PDF for the DOI. If it does not, we then try an API that rOpenSci maintains. This API has a set of rules for each publisher: some are simple, such as combining a URL with the DOI, while others require an HTTP request followed by some string manipulation.
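
The fallback order described above can be sketched in base R. This is a hypothetical illustration, not fulltext's actual code. The names `doi_prefix()`, `route_fulltext()`, `prefix_plugins`, `has_crossref_link` and `simple_rule()` are all invented here. The Crossref check is stubbed out as a callback so the sketch runs offline, and the prefix table is only a small illustrative subset.

```r
# Hypothetical sketch of ft_get()'s fallback order; not fulltext's own code.

# The DOI prefix is everything before the first "/", e.g. "10.1371"
doi_prefix <- function(doi) sub("/.*$", "", doi)

# Illustrative prefix -> plugin table (a small subset, for demonstration only)
prefix_plugins <- c(
  "10.1371" = "plos",
  "10.7717" = "peerj",
  "10.7554" = "elife",
  "10.3389" = "frontiersin"
)

# Route a DOI: plugin first, then a Crossref full-text link, then the
# rOpenSci API. `has_crossref_link` stands in for a real Crossref lookup.
route_fulltext <- function(doi, has_crossref_link = function(doi) FALSE) {
  plugin <- unname(prefix_plugins[doi_prefix(doi)])
  if (!is.na(plugin)) {
    return(list(route = "plugin", plugin = plugin))
  }
  if (has_crossref_link(doi)) {
    return(list(route = "crossref", plugin = NA_character_))
  }
  list(route = "ropensci_api", plugin = NA_character_)
}

# A "simple rule" of the kind described above: a base URL plus the DOI.
# The base URL is a placeholder, not a real publisher endpoint.
simple_rule <- function(doi, base_url) paste0(base_url, doi)

route_fulltext("10.1371/journal.pone.0086169")$plugin  # "plos"
route_fulltext("10.9999/unknown")$route                # "ropensci_api"
```

Keeping the Crossref lookup as an argument makes the routing logic testable without network access. In a real implementation, that callback would query Crossref for the DOI's full-text links.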