Chapter 6 Abstract

You will most likely fetch abstracts after searching for articles with ft_search(). In a few scenarios, abstracts alone may be enough in place of full text.

For example, if many of the articles you want to mine are closed access and you don’t have access to them, you may still be able to get their abstracts, depending on the publisher.

In other cases, you may only need abstracts, whether or not full text is available.

ft_abstract() gives you access to the following data sources:

crossref
microsoft
plos
scopus
semanticscholar

6.1 Usage

library(fulltext)

List the available data sources:

ft_abstract_ls()

#> [1] "crossref"        "microsoft"       "plos"            "scopus"         
#> [5] "semanticscholar"

Search for articles. By default, ft_search() searches PLOS (Public Library of Science):

res <- ft_search(query = "ecology")
(dois <- res$plos$data$id)

#>  [1] "10.1371/journal.pone.0001248" "10.1371/journal.pone.0059813"
#>  [3] "10.1371/journal.pone.0080763" "10.1371/journal.pone.0220747"
#>  [5] "10.1371/journal.pone.0155019" "10.1371/journal.pone.0175014"
#>  [7] "10.1371/journal.pone.0150648" "10.1371/journal.pone.0208370"
#>  [9] "10.1371/journal.pcbi.1003594" "10.1371/journal.pone.0102437"

Pass the DOIs from the ft_search() output to ft_abstract():

out <- ft_abstract(dois)
out

#> <fulltext abstracts>
#> Found:
#>   [PLOS: 10; Scopus: 0; Microsoft: 0; Crossref: 0; Semantic Scholar: 0]

The output has one slot per data source (Microsoft is stored as ma):

names(out)

#> [1] "plos"            "scopus"          "ma"              "crossref"       
#> [5] "semanticscholar"

Index into the slot for the data source you want; here we select the first PLOS result:

out$plos[[1]]

#> $doi
#> [1] "10.1371/journal.pone.0001248"
#> 
#> $abstract
#> [1] "Background: Soil ecology has produced a huge corpus of results on relations between soil organisms, ecosystem processes controlled by these organisms and links between belowground and aboveground processes. However, some soil scientists think that soil ecology is short of modelling and evolutionary approaches and has developed too independently from general ecology. We have tested quantitatively these hypotheses through a bibliographic study (about 23000 articles) comparing soil ecology journals, generalist ecology journals, evolutionary ecology journals and theoretical ecology journals. Findings: We have shown that soil ecology is not well represented in generalist ecology journals and that soil ecologists poorly use modelling and evolutionary approaches. Moreover, the articles published by a typical soil ecology journal (Soil Biology and Biochemistry) are cited by and cite low percentages of articles published in generalist ecology journals, evolutionary ecology journals and theoretical ecology journals. Conclusion: This confirms our hypotheses and suggests that soil ecology would benefit from an effort towards modelling and evolutionary approaches. This effort should promote the building of a general conceptual framework for soil ecology and bridges between soil ecology and general ecology. We give some historical reasons for the parsimonious use of modelling and evolutionary approaches by soil ecologists. We finally suggest that a publication system that classifies journals according to their Impact Factors and their level of generality is probably inadequate to integrate “particularity” (empirical observations) and “generality” (general theories), which is the goal of all natural sciences. Such a system might also be particularly detrimental to the development of a science such as ecology that is intrinsically multidisciplinary. "

This gives a named list with two elements: doi, the article's DOI, and abstract, the full abstract as a single character string.
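Because each result is a small list with doi and abstract elements, it can be convenient to collapse a whole slot into a data frame with one row per article. The sketch below uses base R only and assumes every element of the slot has the doi/abstract shape printed above. plos_abstracts is a hypothetical stand-in for out$plos (the second abstract is placeholder text), and abstracts_to_df() is a helper written for this example, not part of fulltext.

```r
# Hypothetical stand-in for out$plos: a list of lists, each with $doi and
# $abstract, matching the structure printed above. Text is truncated/placeholder.
plos_abstracts <- list(
  list(doi = "10.1371/journal.pone.0001248",
       abstract = "Background: Soil ecology has produced a huge corpus of results ..."),
  list(doi = "10.1371/journal.pone.0059813",
       abstract = "Placeholder abstract text for illustration.")
)

# Hypothetical helper: collapse the list into a data frame, one row per article.
# A missing abstract element becomes NA instead of causing an error.
abstracts_to_df <- function(x) {
  get_abstract <- function(el) {
    if (is.null(el$abstract)) NA_character_ else el$abstract
  }
  data.frame(
    doi = vapply(x, function(el) el$doi, character(1), USE.NAMES = FALSE),
    abstract = vapply(x, get_abstract, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

df <- abstracts_to_df(plos_abstracts)
nrow(df)
```

With real results you would call abstracts_to_df(out$plos) instead of using the stand-in list.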

You can then pass these abstracts to any number of R packages for text mining.
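As a minimal, package-free illustration of a first text-mining step, the sketch below lowercases abstracts, splits them into words, and counts word frequencies with base R. The abstracts vector is a toy stand-in for strings pulled out of ft_abstract() results, and tokenize() is a hypothetical helper; dedicated text-mining packages offer far more complete tokenization and stop-word handling.

```r
# Toy abstracts standing in for character strings returned by ft_abstract()
abstracts <- c(
  "Soil ecology has produced a huge corpus of results.",
  "Soil ecologists poorly use modelling and evolutionary approaches."
)

# Hypothetical helper: lowercase, split on runs of non-letters, drop empty tokens
tokenize <- function(txt) {
  words <- unlist(strsplit(tolower(txt), "[^a-z]+"))
  words[nzchar(words)]
}

tokens <- tokenize(abstracts)

# Count each word and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)
head(freq, 3)
```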