Chapter 3 Data sources
Data sources in fulltext include:
- Crossref - via the rcrossref package
- Public Library of Science (PLOS) - via the rplos package
- BioMed Central
- arXiv - via the aRxiv package
- bioRxiv - via the biorxivr package
- PMC/PubMed via Entrez - via the rentrez package
- Many more are supported via the above sources (e.g., Royal Society Open Science is available via PubMed)
- We will add more, as publishers open up, and as we have time… See the master list here
The available data sources differ by the task you are doing in fulltext.
3.1 Search
When searching with ft_search(), you have access to a fixed set of sources and no others:
- arxiv
- biorxivr
- bmc
- crossref
- entrez
- europe_pmc
- ma
- plos
- scopus
You can list the available plugins with ft_search_ls().
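As a minimal sketch of what "a fixed set of sources and no others" means in practice, the base R snippet below checks a requested source against the list in this section before any search is attempted. The helper check_search_source() is hypothetical and is not part of fulltext; the source names are copied from the list above.

```r
# Sources for ft_search(), copied from the list in this section.
search_sources <- c("arxiv", "biorxivr", "bmc", "crossref", "entrez",
                    "europe_pmc", "ma", "plos", "scopus")

# Hypothetical helper (not part of fulltext): fail early when a requested
# source is not one of the search plugins.
check_search_source <- function(from) {
  bad <- setdiff(from, search_sources)
  if (length(bad) > 0) {
    stop("unsupported search source(s): ", paste(bad, collapse = ", "))
  }
  invisible(from)
}

check_search_source(c("plos", "crossref"))  # passes silently

# With fulltext installed, a search is restricted to a source through the
# `from` argument (needs network access, so it is left commented out here):
# library(fulltext)
# ft_search(query = "ecology", from = "plos")
```

A source such as elife, which appears in the ft_links() and ft_get() lists but not here, would make the helper stop with an error.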
3.2 Abstracts
When using ft_abstract(), you have access to:
- crossref
- microsoft
- plos
- scopus
- semanticscholar
You can list the available plugins with ft_abstract_ls().
3.3 Links
When using ft_links() to get links to full text, you have access to:
- bmc
- cdc
- cogent
- copernicus
- crossref
- elife
- entrez
- frontiersin
- peerj
- plos
- rsoc
You can list the available plugins with ft_links_ls().
3.4 Getting full text
When using ft_get() to fetch the full text of articles, you have access to a specific set of data sources (in this case publishers) for which we have coded plugins (i.e., functions):
- aaas
- aip
- amersocclinoncol
- amersocmicrobiol
- arxiv
- biorxiv
- bmc
- cambridge
- cob
- copernicus
- crossref
- elife
- elsevier
- entrez
- frontiersin
- ieee
- informa
- instinvestfil
- jama
- microbiology
- peerj
- pensoft
- plos
- pnas
- royalsocchem
- roysoc
- sciencedirect
- scientificsocieties
- transtech
- wiley
You can list the available plugins with ft_get_ls().
But ft_get() is not limited to these plugins. The DOIs (Digital Object Identifiers) that you feed into ft_get() have a prefix that is affiliated with a specific publisher, so we can decide whether to use one of the plugins listed in ft_get_ls() or something else. If we don’t have a plugin, we first check whether Crossref has a full text link to either XML or PDF for the DOI. If not, we then go to an API that rOpenSci maintains. This API has a set of rules for each publisher. Some are simple, such as combining a base URL with the DOI, while others require an HTTP request followed by some string manipulation.
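The decision order described above (plugin first, then a Crossref link, then the rOpenSci API) can be sketched in base R. This is an illustration under stated assumptions, not the code fulltext actually runs: the prefix-to-plugin table is a small illustrative subset, and crossref_link(), ropensci_api_link(), and resolve_route() are hypothetical stand-ins that make no network requests.

```r
# Plugin names come from the ft_get() list above; the prefix pairs are an
# illustrative subset, not fulltext's internal lookup table.
prefix_to_plugin <- c(
  "10.1371" = "plos",
  "10.7717" = "peerj",
  "10.7554" = "elife"
)

# The DOI prefix is everything before the first slash.
doi_prefix <- function(doi) sub("/.*$", "", doi)

# Hypothetical stand-ins for the two fallbacks. Each returns a URL string,
# or NULL when no full text link is known.
crossref_link <- function(doi) NULL
ropensci_api_link <- function(doi) NULL

# Try each route in the order the text describes and report which one applied.
resolve_route <- function(doi, crossref = crossref_link,
                          fallback = ropensci_api_link) {
  plugin <- prefix_to_plugin[doi_prefix(doi)]
  if (!is.na(plugin)) {
    return(list(via = "plugin", target = unname(plugin)))
  }
  url <- crossref(doi)
  if (!is.null(url)) {
    return(list(via = "crossref", target = url))
  }
  url <- fallback(doi)
  if (!is.null(url)) {
    return(list(via = "api", target = url))
  }
  list(via = "none", target = NULL)
}

route <- resolve_route("10.7717/peerj.228")
route$via  # "plugin", because 10.7717 is in the table above
```

Passing the fallbacks in as arguments keeps the ordering logic separate from the lookups themselves, so each step can be swapped for a real request without changing the control flow.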