2 HTTP in R 101
2.1 What is HTTP?
HTTP means HyperText Transport Protocol, but you were probably not just looking for a translation of the abbreviation. HTTP is a way for you to exchange information with a remote server. In your package, if information is going back and forth between the R session and the internet, you are using some sort of HTTP tooling. Your package is making requests and receives responses.
2.1.1 HTTP requests
The HTTP request is what your package makes.
It has a method (are you fetching information via GET
? are you sending information via POST
?), different parts of a URL (domain, endpoint, query string), and headers (containing for instance your secret identifiers).
It can contain a body. For instance, you might be sending data as JSON.
In that case one of the headers will describe the content.
How do you know what request to make from your package? Hopefully you are interacting with a well documented web resource that will explain to you what methods are associated with what endpoints.
2.1.2 HTTP responses
The HTTP response is what the remote server provides, and what your package parses. A response has a status code indicating whether the request succeeded, response headers, and (optionally) a response body.
Hopefully the documentation of the web API or web resource you are working with shows good examples of responses. In any case you’ll find yourself experimenting with different requests to see what the response “looks like”.
2.1.3 More resources about HTTP
How do you get started with interacting with HTTP in R?
2.1.3.1 General HTTP resources
- Mozilla Developer Network docs about HTTP (recommended in the zine mentioned hereafter)
- (not free) Julia Evans’ Zine “HTTP: Learn your browser’s language!”
- The docs of the web API you are aiming to work with, and a search engine to understand the words that are new.
2.2 HTTP requests in R: what package?
In R, to interact with web resources, it is recommended to use curl; or its higher-level interfaces httr (pronounced hitter or h-t-t-r), httr2 or crul.
Do not use RCurl, because it is not actively maintained!
When writing a package interacting with web resources, you will probably use httr2, httr or crul.
httr is the most popular and oldest of the three packages, and supports OAuth. httr docs feature a vignette called Best practices for API packages
httr2 “is a ground-up rewrite of httr that provides a pipeable API with an explicit request object that solves more problems felt by packages that wrap APIs (e.g. built-in rate-limiting, retries, OAuth, secure secrets, and more)” so it might be a good idea to adopt it rather than httr for a new package. It has a vignette about Wrapping APIs.
crul does not support OAuth but it uses an object-oriented interface, which you might like. crul has a set of clients, or ways to perform requests, that might be handy. crul also has a vignette about API package best practices.
Below we will try to programmatically access the status of GitHub, the open-source platform provided by the company of the same name. We will access the same information with httr2 and crul If you decide to try the low-level curl, feel free to contribute an example. The internet has enough examples for httr.
github_url <- "https://kctbh9vrtdwd.statuspage.io/api/v2/status.json"
The URL above leaves no doubt as to what format the data is provided in, JSON!
Let’s first use httr2.
library("magrittr")
response <- httr2::request(github_url) %>%
httr2::req_perform()
# Check the response status
httr2::resp_status(response)
## [1] 200
# Or in a package you'd write
httr2::resp_check_status(response)
# Parse the content
httr2::resp_body_json(response)
## $page
## $page$id
## [1] "kctbh9vrtdwd"
##
## $page$name
## [1] "GitHub"
##
## $page$url
## [1] "https://www.githubstatus.com"
##
## $page$time_zone
## [1] "Etc/UTC"
##
## $page$updated_at
## [1] "2024-09-06T07:43:58.589Z"
##
##
## $status
## $status$indicator
## [1] "none"
##
## $status$description
## [1] "All Systems Operational"
# In case you wonder, the format was obtained from a header
httr2::resp_header(response, "content-type")
## [1] "application/json; charset=utf-8"
Now, the same with crul.
# Create a client and get a response
client <- crul::HttpClient$new(github_url)
response <- client$get()
# Check the response status
response$status_http()
## <Status code: 200>
## Message: OK
## Explanation: Request fulfilled, document follows
# Or in a package you'd write
response$raise_for_status()
# Parse the content
response$parse()
## No encoding supplied: defaulting to UTF-8.
## [1] "{\"page\":{\"id\":\"kctbh9vrtdwd\",\"name\":\"GitHub\",\"url\":\"https://www.githubstatus.com\",\"time_zone\":\"Etc/UTC\",\"updated_at\":\"2024-09-06T07:43:58.589Z\"},\"status\":{\"indicator\":\"none\",\"description\":\"All Systems Operational\"}}"
jsonlite::fromJSON(response$parse())
## No encoding supplied: defaulting to UTF-8.
## $page
## $page$id
## [1] "kctbh9vrtdwd"
##
## $page$name
## [1] "GitHub"
##
## $page$url
## [1] "https://www.githubstatus.com"
##
## $page$time_zone
## [1] "Etc/UTC"
##
## $page$updated_at
## [1] "2024-09-06T07:43:58.589Z"
##
##
## $status
## $status$indicator
## [1] "none"
##
## $status$description
## [1] "All Systems Operational"
Hopefully these very short snippets give you an idea of what syntax to expect when choosing one of these packages.
Note that the choice of a package will constrain the HTTP testing tools you can use. However, the general ideas will remain the same. You could switch your package backend from, say, crul to httr without changing your tests, if your tests do not test too many specificities of internals.