  • (c) Scott Chamberlain, 2020

fulltext manual

Chapter 8 Links

The ft_links function makes it easy to get URLs for full text versions of articles. For instance, you might use fulltext only to pass DOIs directly to ft_links, then use the URLs elsewhere in your research workflow. Or you might search first with ft_search and pass that output directly to ft_links.

8.1 Usage

library(fulltext)

List the available backends:

ft_links_ls()
#>  [1] "bmc"         "cdc"         "cogent"      "copernicus"  "crossref"   
#>  [6] "elife"       "entrez"      "frontiersin" "peerj"       "plos"       
#> [11] "rsoc"
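If you build the from value programmatically, a small guard can catch a typo before any request is made. The sketch below hard-codes the backend names printed above in a vector called backends. In a live session you would use ft_links_ls() instead. check_from() is a hypothetical helper and is not part of fulltext.

```r
# Backend names as printed by ft_links_ls() above; in a live session
# you would use backends <- fulltext::ft_links_ls() instead
backends <- c("bmc", "cdc", "cogent", "copernicus", "crossref",
              "elife", "entrez", "frontiersin", "peerj", "plos", "rsoc")

# Hypothetical helper: stop early if `from` is not a supported backend
check_from <- function(from, available = backends) {
  bad <- setdiff(from, available)
  if (length(bad) > 0) {
    stop("unsupported source(s): ", paste(bad, collapse = ", "),
         call. = FALSE)
  }
  invisible(from)
}

check_from("plos")            # passes silently
try(check_from("frontiers"))  # reports an error naming "frontiers"
```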

You can pass DOIs directly to ft_links:

res <- ft_links('10.3389/fphar.2014.00109')
res
#> <fulltext links>
#> [Found] 1 
#> [IDs] 10.3389/fphar.2014.00109 ...

The output is an S3 object, essentially a list. If you don't specify a from value, ft_links tries to guess the publisher, and the name of the element in the output list will match the publisher of the DOI. Here, that's Frontiers In, lowercased and written as one word (frontiersin):

res$frontiersin
#> $found
#> [1] 1
#> 
#> $ids
#> [1] "10.3389/fphar.2014.00109"
#> 
#> $data
#> $data$`10.3389/fphar.2014.00109`
#> $data$`10.3389/fphar.2014.00109`$xml
#> [1] "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/xml/nlm"
#> 
#> $data$`10.3389/fphar.2014.00109`$pdf
#> [1] "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/pdf"

The output is a named list with the number of links found ($found), the IDs (here, DOIs; $ids), and the links themselves in the $data slot. Links can be for pdf, xml, plain (plain text), unspecified, and possibly other types (publishers do lots of weird things).
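Because the result is a set of nested lists, you reach a single URL by indexing in this order: source, then $data, then the DOI, then the link type. The sketch below rebuilds the frontiersin element printed above by hand. res_mock stands in for a live ft_links() result so the code runs offline. The found/ids/data layout is taken from that printed output.

```r
doi <- "10.3389/fphar.2014.00109"

# Hand-built stand-in for res, copying the structure printed above
res_mock <- list(
  frontiersin = list(
    found = 1,
    ids = doi,
    data = setNames(list(list(
      xml = "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/xml/nlm",
      pdf = "http://journal.frontiersin.org/article/10.3389/fphar.2014.00109/pdf"
    )), doi)
  )
)

# source -> $data -> DOI -> link type; [[ ]] avoids backticks around the DOI
xml_url <- res_mock$frontiersin$data[[doi]]$xml
xml_url

# Which link types are available for this DOI? Here: "xml" "pdf"
names(res_mock$frontiersin$data[[doi]])
```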

Instead of passing DOIs directly, you can use ft_search() to search first:

(res1 <- ft_search(query='ecology', from='entrez'))
#> Query:
#>   [ecology] 
#> Found:
#>   [PLoS: 0; BMC: 0; Crossref: 0; Entrez: 248883; arxiv: 0; biorxiv: 0; Europe PMC: 0; Scopus: 0; Microsoft: 0] 
#> Returned:
#>   [PLoS: 0; BMC: 0; Crossref: 0; Entrez: 10; arxiv: 0; biorxiv: 0; Europe PMC: 0; Scopus: 0; Microsoft: 0]

Then pass that output directly to ft_links:

(out <- ft_links(res1))
#> <fulltext links>
#> [Found] 5 
#> [IDs] ID_34597326 ID_34597314 ID_34597310 ID_34529679 ID_32360945 ...

Here, the name of the element in the output matches the data source used in the ft_search() call whose result was passed to ft_links:

names(out)
#> [1] "entrez"
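When a result holds more than one source, you can loop over its names instead of typing each one. The sketch below uses a hand-built stand-in called out_mock. Its counts and IDs are made up, and it assumes that every element has the found and ids fields shown earlier.

```r
# Hypothetical stand-in for a multi-source ft_links() result; the
# counts and IDs are made up, only the found/ids layout matters here
out_mock <- list(
  plos   = list(found = 2, ids = c("doi_a", "doi_b"), data = list()),
  entrez = list(found = 1, ids = "id_c", data = list())
)

# Number of links found per source (a named numeric vector)
found <- vapply(out_mock, function(src) as.numeric(src$found), numeric(1))
found

# All IDs in one data frame, labelled by source
id_lists <- lapply(out_mock, `[[`, "ids")
ids <- data.frame(
  source = rep(names(id_lists), lengths(id_lists)),
  id = unlist(id_lists, use.names = FALSE),
  stringsAsFactors = FALSE
)
ids
```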

You can also pass DOIs directly and specify the data source with from. Options include “plos”, “bmc”, “crossref”, and “entrez”.

x <- c("10.1371/journal.pone.0017342", "10.1371/journal.pone.0091497")
z <- ft_links(x, from = "plos")
z$plos$data
#> $`10.1371/journal.pone.0017342`
#> $`10.1371/journal.pone.0017342`$xml
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=manuscript"
#> 
#> $`10.1371/journal.pone.0017342`$pdf
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=printable"
#> 
#> 
#> $`10.1371/journal.pone.0091497`
#> $`10.1371/journal.pone.0091497`$xml
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=manuscript"
#> 
#> $`10.1371/journal.pone.0091497`$pdf
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=printable"
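For downstream work it can help to flatten $data into one row per (ID, link type) pair. The sketch below copies the z$plos$data output above into plos_data so it runs offline. links_to_df() is a hypothetical helper, not a fulltext function. IDs with no links are skipped.

```r
# Stand-in for z$plos$data, copied from the output above
plos_data <- list(
  "10.1371/journal.pone.0017342" = list(
    xml = "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=manuscript",
    pdf = "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=printable"
  ),
  "10.1371/journal.pone.0091497" = list(
    xml = "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=manuscript",
    pdf = "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=printable"
  )
)

# Hypothetical helper: one row per (id, type, url)
links_to_df <- function(data) {
  rows <- lapply(names(data), function(id) {
    links <- data[[id]]
    if (length(links) == 0) return(NULL)  # skip IDs without links
    data.frame(id = id, type = names(links),
               url = unlist(links, use.names = FALSE),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

links_df <- links_to_df(plos_data)
links_df
```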

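Extract just the PDF links (this only reads the URLs; nothing is downloaded):

# pull the "pdf" element out of each DOI's list of links
unname(vapply(z$plos$data, "[[", "", "pdf"))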
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=printable"
#> [2] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0091497&type=printable"
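The vapply() call above assumes every ID has a pdf entry. If one is missing, "[[" returns NULL and vapply() stops with an error. The sketch below returns NA for missing link types instead. get_links() is a hypothetical helper. data_mock is made-up stand-in data, and its second entry deliberately lacks a pdf link.

```r
# Hypothetical helper: one link of the given type per ID, NA if missing
get_links <- function(data, type = "pdf") {
  vapply(data, function(links) {
    url <- links[[type]]  # [[ ]] matches names exactly; NULL if absent
    if (is.null(url)) NA_character_ else as.character(url)[1]
  }, character(1))
}

# Made-up stand-in data: the second ID has no pdf link
data_mock <- list(
  "10.1371/journal.pone.0017342" = list(
    pdf = "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017342&type=printable"
  ),
  "id_without_pdf" = list(xml = "http://example.org/article.xml")
)

get_links(data_mock, "pdf")  # named vector; NA for "id_without_pdf"
```

Keeping the names makes it easy to see which IDs lack a given link type, for example with names(which(is.na(get_links(data_mock)))).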

8.2 Links options

All data sources for ft_links() should accept configuration options, but this does not work yet. A fix is coming; see https://github.com/ropensci/fulltext/issues/223

As with all functions in fulltext, you can pass curl options to each function call or set them globally for the session; see the curl options chapter.