Chapter 8 Links
The ft_links function makes it easy to get URLs for full text versions of articles. For instance, you can use fulltext only to pass DOIs directly to ft_links and get URLs to use elsewhere in your research workflow. Or you may want to search first with ft_search, then pass that output directly to ft_links.
8.1 Usage
library(fulltext)
List backends available
ft_links_ls()
#> [1] "bmc" "cdc" "cogent" "copernicus" "crossref"
#> [6] "elife" "entrez" "frontiersin" "peerj" "plos"
#> [11] "rsoc"
You can pass DOIs directly to ft_links
res <- ft_links('10.3389/fphar.2014.00109')
res
#> <fulltext links>
#> [Found] 1
#> [IDs] 10.3389/fphar.2014.00109 ...
The output is an S3 object, essentially a list. If you don’t specify a from value, we try to guess the publisher, and the name of the list element in the output will match the publisher of the DOI. Here, that’s Frontiers in, lowercased and as one word:
res$frontiersin
#> $found
#> [1] 1
#>
#> $ids
#> [1] "10.3389/fphar.2014.00109"
#>
#> $data
#> $data$`10.3389/fphar.2014.00109`
#> $data$`10.3389/fphar.2014.00109`$xml
#> [1] "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/xml/nlm"
#>
#> $data$`10.3389/fphar.2014.00109`$pdf
#> [1] "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/pdf"
The output is a named list with the number of links found, the IDs (here, DOIs), and, in the $data slot, the links themselves. These can include links for pdf, xml, plain (for plain text), unspecified, and possibly others (publishers do lots of weird things).
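To make that structure concrete, here is a minimal sketch that inspects which link types are available for each ID. It uses a hand-built list, `mock`, that only mimics the shape of one publisher element as described above; `mock` is an assumption for illustration and is not produced by calling ft_links.

```r
# Hand-built stand-in for one publisher element of ft_links() output.
# This is an illustrative assumption, not the result of a real request.
mock <- list(
  found = 1,
  ids = "10.3389/fphar.2014.00109",
  data = list(
    `10.3389/fphar.2014.00109` = list(
      xml = "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/xml/nlm",
      pdf = "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/pdf"
    )
  )
)

# For each ID in the data slot, list the link types that are present
link_types <- lapply(mock$data, names)
link_types
```

Checking the names first is a cheap way to see what a publisher returned before you try to pull out a specific link type.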
Instead of passing DOIs directly, you can use ft_search() to search first:
(res1 <- ft_search(query='ecology', from='entrez'))
#> Query:
#> [ecology]
#> Found:
#> [PLoS: 0; BMC: 0; Crossref: 0; Entrez: 248883; arxiv: 0; biorxiv: 0; Europe PMC: 0; Scopus: 0; Microsoft: 0]
#> Returned:
#> [PLoS: 0; BMC: 0; Crossref: 0; Entrez: 10; arxiv: 0; biorxiv: 0; Europe PMC: 0; Scopus: 0; Microsoft: 0]
Then pass the output of that directly to ft_links
(out <- ft_links(res1))
#> <fulltext links>
#> [Found] 5
#> [IDs] ID_34597326 ID_34597314 ID_34597310 ID_34529679 ID_32360945 ...
Here, the name of the list element in the output matches the source passed in to ft_links from ft_search.
names(out)
#> [1] "entrez"
Alternatively, you can pass in DOIs directly and specify the data source. Options include “plos”, “bmc”, “crossref”, and “entrez”; ft_links_ls() lists all available backends.
x <- c("10.1371/journal.pone.0017342", "10.1371/journal.pone.0091497")
z <- ft_links(x, from = "plos")
z$plos$data
#> $`10.1371/journal.pone.0017342`
#> $`10.1371/journal.pone.0017342`$xml
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=manuscript"
#>
#> $`10.1371/journal.pone.0017342`$pdf
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=printable"
#>
#>
#> $`10.1371/journal.pone.0091497`
#> $`10.1371/journal.pone.0091497`$xml
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=manuscript"
#>
#> $`10.1371/journal.pone.0091497`$pdf
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=printable"
Fetch just the pdf links
unname(vapply(z$plos$data, "[[", "", "pdf"))
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=printable"
#> [2] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=printable"
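The vapply call above assumes every article has a pdf entry; if one is missing, `[[` returns NULL and vapply stops with an error. The sketch below shows one way to tolerate gaps. The helper name `get_links`, the `mock_data` list, and its DOIs and URLs are all hypothetical, made up to mirror the shape of a $data slot; none of them are part of fulltext.

```r
# Hypothetical helper (not part of fulltext): extract one link type per
# article, returning NA where that type is absent
get_links <- function(data, type = "pdf") {
  vapply(data, function(links) {
    url <- links[[type]]
    if (is.null(url)) NA_character_ else url
  }, character(1))
}

# Made-up data in the same shape as a $data slot; the second article
# deliberately has no pdf link
mock_data <- list(
  `10.1234/example.1` = list(
    xml = "https://example.org/a.xml",
    pdf = "https://example.org/a.pdf"
  ),
  `10.1234/example.2` = list(
    xml = "https://example.org/b.xml"
  )
)

pdf_links <- get_links(mock_data, "pdf")
pdf_links

# Keep only the articles that actually have a pdf link
pdf_links[!is.na(pdf_links)]
```

Returning NA instead of dropping the element keeps the result aligned with the input IDs, which makes it easier to see which articles lack a given format.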
8.2 Links options
All data sources for ft_links() should accept configuration options, but that does not work right now. A fix is planned; see https://github.com/ropensci/fulltext/issues/223
As with all functions in fulltext, you can pass curl options to each function call or set them globally for the session; see the curl options chapter.