2 HTTP in R 101

2.1 What is HTTP?

HTTP stands for HyperText Transfer Protocol, but you were probably not just looking for the expansion of the abbreviation. HTTP is a way for you to exchange information with a remote server. In your package, if information is going back and forth between the R session and the internet, you are using some sort of HTTP tooling. Your package makes requests and receives responses.

2.1.1 HTTP requests

The HTTP request is what your package makes. It has a method (are you fetching information via GET? are you sending information via POST?), different parts of a URL (domain, endpoint, query string), and headers (containing e.g. your secret identifiers). It can contain a body; for instance, you might be sending data as JSON. In that case one of the headers will describe the content.
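To make these parts concrete, here is a minimal sketch that assembles the pieces of a request without sending it. It assumes the httr and jsonlite packages are installed. The host api.example.com, the v1/items endpoint, the page parameter, the token and the body fields are all invented for illustration and do not refer to a real service.

```r
# Hypothetical API: host, endpoint, query parameter, token and body fields
# are placeholders, not a real service.

# URL = domain + endpoint (path) + query string
url <- httr::modify_url(
  "https://api.example.com",
  path = "v1/items",
  query = list(page = 2)
)

# Headers: credentials, plus a description of the body format
headers <- httr::add_headers(
  Authorization = "Bearer <your-token>",
  `Content-Type` = "application/json"
)

# Body: data serialized as JSON
body <- jsonlite::toJSON(list(name = "widget", quantity = 3), auto_unbox = TRUE)

# Inspect the pieces
httr::parse_url(url)[c("hostname", "path", "query")]
body

# Sending it would be a single call (not run, since the API does not exist):
# response <- httr::POST(url, headers, body = body)
```

The method only appears in the last, commented-out line: with httr it is the name of the function you call (GET, POST, and so on).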

How do you know what request to make from your package? Hopefully you are interacting with a well-documented web resource that explains which methods are associated with which endpoints.

2.1.2 HTTP responses

The HTTP response is what the remote server provides, and what your package parses. A response has a status code indicating whether the request succeeded, response headers, and (optionally) a response body.
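The three parts of a response can be explored offline. The sketch below assumes httr and jsonlite are installed. The headers and the JSON body are hard-coded stand-ins for what a server might return, not a real response.

```r
# Status code: httr can describe a bare status code without any request.
# The first digit gives the category (2xx success, 4xx client error, ...).
httr::http_status(200L)$message
httr::http_status(404L)$message

# Headers: name/value pairs; this list is invented for the sketch
headers <- list(`content-type` = "application/json; charset=utf-8")
headers[["content-type"]]

# Body: here a JSON string, also invented, parsed into an R list
body_text <- '{"status": {"indicator": "none", "description": "All Systems Operational"}}'
parsed <- jsonlite::fromJSON(body_text)
parsed$status$description
```

In a package, the content-type header is typically what tells you (or your HTTP client) which parser to apply to the body.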

Hopefully the documentation of the web API or web resource you are working with shows good examples of responses. In any case you’ll find yourself experimenting with different requests to see what the response “looks like”.

2.1.3 More resources about HTTP

How do you get started with interacting with HTTP in R?

2.1.3.1 General HTTP resources

2.1.4 HTTP with R

  • The docs of the R package you end up choosing!
  • Digging into the source code of another package that does similar things.

2.2 HTTP requests in R: what package?

In R, to interact with web resources, it is recommended to use curl, or one of its higher-level interfaces: httr (pronounced hitter or h-t-t-r) or crul.

Do not use RCurl, because it is not actively maintained!

When writing a package interacting with web resources, you will probably use either httr or crul.

Below we will try to programmatically access the status of GitHub, the open-source platform provided by the company of the same name. We will access the same information with httr and crul. If you opt for the low-level curl, feel free to contribute an example.

github_url <- "https://kctbh9vrtdwd.statuspage.io/api/v2/status.json"

The URL above leaves no doubt as to what format the data is provided in: JSON!

Let’s first use httr.

response <- httr::GET(github_url)

# Check the response status
httr::http_status(response)
## $category
## [1] "Success"
## 
## $reason
## [1] "OK"
## 
## $message
## [1] "Success: (200) OK"
# Or in a package you'd just write
httr::stop_for_status(response)

# Parse the content
httr::content(response)
## $page
## $page$id
## [1] "kctbh9vrtdwd"
## 
## $page$name
## [1] "GitHub"
## 
## $page$url
## [1] "https://www.githubstatus.com"
## 
## $page$time_zone
## [1] "Etc/UTC"
## 
## $page$updated_at
## [1] "2021-04-01T23:00:14.985Z"
## 
## 
## $status
## $status$indicator
## [1] "none"
## 
## $status$description
## [1] "All Systems Operational"
# In case you wonder, the format was obtained from a header
httr::headers(response)$`content-type`
## [1] "application/json; charset=utf-8"

Now, the same with crul.

# Create a client and get a response
client <- crul::HttpClient$new(github_url)
response <- client$get()


# Check the response status
response$status_http()
## <Status code: 200>
##   Message: OK
##   Explanation: Request fulfilled, document follows
# Or in a package you'd just write
response$raise_for_status()

# Parse the content
response$parse()
## No encoding supplied: defaulting to UTF-8.
## [1] "{\"page\":{\"id\":\"kctbh9vrtdwd\",\"name\":\"GitHub\",\"url\":\"https://www.githubstatus.com\",\"time_zone\":\"Etc/UTC\",\"updated_at\":\"2021-04-01T23:00:14.985Z\"},\"status\":{\"indicator\":\"none\",\"description\":\"All Systems Operational\"}}"
jsonlite::fromJSON(response$parse())
## No encoding supplied: defaulting to UTF-8.
## $page
## $page$id
## [1] "kctbh9vrtdwd"
## 
## $page$name
## [1] "GitHub"
## 
## $page$url
## [1] "https://www.githubstatus.com"
## 
## $page$time_zone
## [1] "Etc/UTC"
## 
## $page$updated_at
## [1] "2021-04-01T23:00:14.985Z"
## 
## 
## $status
## $status$indicator
## [1] "none"
## 
## $status$description
## [1] "All Systems Operational"
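
For comparison, here is a rough sketch of the same request with the low-level curl package. It is one possible approach rather than a recommended pattern, and it assumes curl and jsonlite are installed and the status page is reachable. Output is omitted because it depends on when you run the code.

```r
github_url <- "https://kctbh9vrtdwd.statuspage.io/api/v2/status.json"

# Perform the request and keep the whole response in memory
response <- curl::curl_fetch_memory(github_url)

# Check the response status.
# curl_fetch_memory() does not raise an error for 4xx/5xx codes,
# so in a package you would check the code yourself.
response$status_code
if (response$status_code >= 400) {
  stop("Request failed with status ", response$status_code)
}

# Headers come back as a raw vector; parse them into a named list
headers <- curl::parse_headers_list(response$headers)
headers[["content-type"]]

# The body is a raw vector too: convert it to text, then parse the JSON
status <- jsonlite::fromJSON(rawToChar(response$content))
status$status$description
```

Compared with httr and crul, more of the work (status checks, decoding the body) is left to you.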

Hopefully these very short snippets give you an idea of what syntax to expect when choosing one of those packages.

Note that the choice of a package will constrain the HTTP testing tools you can use. However, the general ideas will remain the same. You could switch your package backend from, say, crul to httr without changing your tests, as long as your tests do not depend too heavily on the internals.
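
One way to picture this is to confine the backend to a single small function. This is only a sketch under stated assumptions: the function names fetch_with_httr, fetch_with_crul and github_status_description are hypothetical, as is the fetch argument used to plug in a backend. The fake response text is invented for the example.

```r
github_url <- "https://kctbh9vrtdwd.statuspage.io/api/v2/status.json"

# Hypothetical backend helper: URL in, JSON text out (httr version)
fetch_with_httr <- function(url) {
  response <- httr::GET(url)
  httr::stop_for_status(response)
  httr::content(response, as = "text", encoding = "UTF-8")
}

# Hypothetical drop-in replacement with the same contract (crul version)
fetch_with_crul <- function(url) {
  response <- crul::HttpClient$new(url)$get()
  response$raise_for_status()
  response$parse("UTF-8")
}

# Hypothetical user-facing function: it only relies on the contract above,
# not on which package made the request.
github_status_description <- function(url = github_url, fetch = fetch_with_httr) {
  jsonlite::fromJSON(fetch(url))$status$description
}

# A check on the user-facing behaviour, here with a fake backend so that
# no request is made; it reads the same whichever real backend is used.
fake_fetch <- function(url) {
  '{"status": {"indicator": "none", "description": "All Systems Operational"}}'
}
result <- github_status_description(fetch = fake_fetch)
stopifnot(identical(result, "All Systems Operational"))
```

A test written against github_status_description() does not mention httr or crul, so swapping fetch_with_httr for fetch_with_crul leaves it untouched.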