22 Caching HTTP requests

Record HTTP calls and replay them

22.1 Package documentation

Check out https://docs.ropensci.org/vcr/ for documentation on vcr functions.

22.2 Terminology

  • vcr: the name comes from the idea of recording something and playing it back later, like a videocassette recorder (VCR)
  • cassette: A thing to record HTTP interactions to. Right now the only option is the file system (writing to files), but in the future could be other things, e.g. a key-value store like Redis
  • fixture: A fixture is something used to consistently test a piece of software. In this case, a cassette (just defined above) is a fixture - used in unit tests. If you use our setup function vcr_setup() the default directory created to hold cassettes is called fixtures/ as a signal as to what the folder contains.
  • Persisters: how to save requests - currently the only option is the file system
  • serialize: translating data into a format that can be stored; here, translate HTTP request and response data into a representation on disk to read back later
  • Serializers: how to serialize HTTP interactions - YAML is the default and JSON is also available (see the Serializers section under Design)
  • insert cassette: create a cassette (all HTTP interactions will be recorded to this cassette)
  • eject cassette: eject the cassette (no longer recording to that cassette)
  • replay: using a cached result of an HTTP request that was recorded earlier

22.3 Design

This section explains vcr’s internal design and architecture.

22.3.1 Where vcr comes from and why R6

vcr was “ported” from the Ruby gem (aka, library) of the same name. Because it was ported from Ruby, an object-oriented programming language, I thought it would be easier to use the object system in R that most closely resembles the one used in Ruby (at least in my opinion). This thinking led to choosing R6. The exported functions users interact with are not R6 classes, but normal R functions. However, most of the internal code in the package uses R6. Thus, familiarity with R6 is important for people who may want to contribute to vcr, but is not required at all for vcr users.

22.3.2 Principles

22.3.2.1 An easy to use interface hides complexity

As described above, vcr uses R6 internally, but users interact with normal R functions. The complicated internal code is largely R6, while the simpler exported functions are normal R functions.
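As a rough illustration of this split, here is a minimal sketch, not vcr's actual code. ToyCassette and toy_use_cassette() are hypothetical names invented for this example; the only real dependency is the R6 package.

```r
library(R6)

# Hypothetical internal class: holds the state and the more involved logic.
ToyCassette <- R6Class("ToyCassette",
  public = list(
    name = NULL,
    interactions = NULL,
    initialize = function(name) {
      self$name <- name
      self$interactions <- list()
    },
    # Append one request/response pair to the cassette.
    record = function(request, response) {
      n <- length(self$interactions)
      self$interactions[[n + 1]] <- list(request = request, response = response)
      invisible(self)
    },
    size = function() length(self$interactions)
  )
)

# Hypothetical exported function: a plain R function that hides the R6 object.
toy_use_cassette <- function(name, requests) {
  cassette <- ToyCassette$new(name)
  for (req in requests) {
    cassette$record(request = req, response = paste("response to", req))
  }
  cassette$size()
}

n_recorded <- toy_use_cassette("demo", c("GET /a", "GET /b"))
print(n_recorded)
```

The caller only ever sees toy_use_cassette(); the R6 object is created and discarded inside it.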

22.3.2.2 Class/function names are inherited from Ruby vcr

Since R vcr was ported from Ruby, we kept most of the names of functions, classes, and variables. So if you’re wondering why a function, class, or variable has a particular name, the explanation will, for the most part, be found in Ruby vcr rather than in this package.

22.3.2.3 Hooks into HTTP clients

Perhaps the most fundamental question about how this package works is how it knows what HTTP requests are being made. This stumped me for quite a long time. When looking at Ruby vcr, at first I thought it must be “listening” for HTTP requests somehow. Then I found out about monkey patching; that’s how it’s achieved in Ruby. That is, the Ruby vcr package literally overrides certain methods in Ruby HTTP clients, hijacking internals of the HTTP clients.

However, monkey patching is not allowed in R. Thus, in R we need “hooks” into the HTTP clients. Fortunately, Scott is the maintainer of one of the HTTP clients, crul, so he was able to quickly create a hook. Very fortunately, the httr package already had a hook mechanism.

The actual hooks are not in vcr, but in webmockr. vcr depends on webmockr for hooking into HTTP clients httr and crul.
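The idea of a hook can be sketched in a few lines of base R. This is a toy model only: hook_registry and toy_http_get() are hypothetical names, no network is involved, and the real hook mechanisms in crul, httr, and webmockr are more elaborate.

```r
# Toy registry of hooks; in the real packages this role is played by webmockr.
hook_registry <- new.env(parent = emptyenv())
hook_registry$on_request <- NULL

# Hypothetical HTTP client: it asks the hook first, and only performs the
# "real" request (faked here) if no hook is set or the hook declines.
toy_http_get <- function(url) {
  hook <- hook_registry$on_request
  if (is.function(hook)) {
    intercepted <- hook(url)
    if (!is.null(intercepted)) return(intercepted)
  }
  list(url = url, body = "live response", from_hook = FALSE)
}

# A recording tool registers a hook instead of overriding client internals.
hook_registry$on_request <- function(url) {
  if (identical(url, "https://example.com/cached")) {
    list(url = url, body = "replayed response", from_hook = TRUE)
  } else {
    NULL  # decline: let the client carry on as usual
  }
}

cached <- toy_http_get("https://example.com/cached")
live <- toy_http_get("https://example.com/other")
```

The difference from monkey patching is that the client itself exposes the extension point; nothing inside the client is overwritten from outside.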

22.3.3 Internal classes

An overview of some of the more important aspects of vcr.

22.3.3.1 Configuration

An internal object (vcr_c) is created when vcr is loaded; it holds the default vcr configuration options inside an R6 class, VCRConfig - see https://github.com/ropensci/vcr/blob/main/R/onLoad.R. This class keeps track of default and user-specified configuration options. You can access vcr_c using the triple-colon operator (vcr:::vcr_c), though it is not intended for general use. Whenever you make calls to vcr_configure() or other configuration functions, vcr_c is affected.
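In practice you would go through the exported functions rather than the internal object. A small sketch, assuming vcr is installed; the directory and option values below are illustrative choices, not recommendations:

```r
library(vcr)

# Set a few options explicitly; the values are only examples.
vcr_configure(
  dir = file.path(tempdir(), "fixtures"),   # where cassettes are written
  record = "once",                          # recording mode
  match_requests_on = c("method", "uri")    # how requests are matched
)

# Print the configuration currently in effect.
vcr_configuration()
```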

22.3.3.2 Cassette class

Cassette is an R6 class that handles internals/state for each cassette. Each time you run use_cassette() this class is used. The class has quite a few methods in it, so there’s a lot going on in the class. Ideally the class would be separated into subclasses to handle similar sets of logic, but there’s not an easy way to do that with R6.

Of note: when a Cassette object is created, its initialize() method uses webmockr to create webmockr stubs.

22.3.3.3 How HTTP requests are handled

Within webmockr, there are calls to the vcr class RequestHandler, which has child classes RequestHandlerCrul and RequestHandlerHttr for crul and httr, respectively. These classes determine what to do with each HTTP request. The options for each HTTP request include:

  • Ignored: You can ignore HTTP requests under certain rules using the configuration options ignore_hosts and ignore_localhost.
  • Stubbed by vcr: This is an HTTP request for which a match is found in the cassette defined in the use_cassette()/insert_cassette() call. In this case the matching request/response from the cassette is returned with no real HTTP request allowed.
  • Recordable: This is an HTTP request for which no match is found in the cassette defined in the use_cassette()/insert_cassette() call. In this case a real HTTP request is allowed, and the request/response is recorded to the cassette.
  • Unhandled: This is a group of cases, all of which cause an error to be thrown with a message trying to help the user figure out how to fix the problem.

If you use vcr logging you’ll see these categories in your logs.
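The decision between these four categories can be sketched as a small base R function. This is a simplified toy, not the logic of RequestHandler: classify_request(), its arguments, and the allow_recording flag are all hypothetical, and matching is reduced to comparing method and URI.

```r
# Toy classifier; every name here is hypothetical.
classify_request <- function(request, cassette, ignore_hosts = character(),
                             allow_recording = TRUE) {
  # Extract the host part of the URI, dropping scheme, port, and path.
  host <- sub("^https?://([^/:]+).*$", "\\1", request$uri)
  if (host %in% ignore_hosts) return("ignored")

  # In this toy, a match means same method and same URI.
  matched <- any(vapply(cassette, function(rec) {
    identical(rec$method, request$method) && identical(rec$uri, request$uri)
  }, logical(1)))
  if (matched) return("stubbed")

  # No match: record if that is allowed, otherwise it is an error case.
  if (allow_recording) "recordable" else "unhandled"
}

cassette <- list(list(method = "get", uri = "https://example.com/a"))

r1 <- classify_request(list(method = "get", uri = "https://example.com/a"), cassette)
r2 <- classify_request(list(method = "get", uri = "https://example.com/b"), cassette)
r3 <- classify_request(list(method = "get", uri = "http://localhost:8000/x"),
                       cassette, ignore_hosts = "localhost")
r4 <- classify_request(list(method = "post", uri = "https://example.com/a"),
                       cassette, allow_recording = FALSE)
```

Here r1 to r4 come out as "stubbed", "recordable", "ignored", and "unhandled" respectively.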

22.3.3.4 Serializers

Serializers determine the format in which cassettes are written to files on disk. The current options are YAML (default) and JSON. YAML was implemented first in vcr because that’s the default option in Ruby vcr.

An R6 class Serializer is the parent class for all serializer types; YAML and JSON are both R6 classes that inherit from Serializer. Both YAML and JSON define just two methods: serialize() and deserialize() for converting R structures to yaml or json, and converting yaml or json back to R structures, respectively.
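The parent/child arrangement can be illustrated with a toy pair of classes. ToySerializer and ToyJSON are hypothetical stand-ins, not vcr's classes; the sketch assumes the R6 and jsonlite packages are installed.

```r
library(R6)

# Hypothetical parent class: defines the interface only.
ToySerializer <- R6Class("ToySerializer",
  public = list(
    serialize = function(x) stop("not implemented"),
    deserialize = function(txt) stop("not implemented")
  )
)

# Hypothetical child class: implements both methods for JSON.
ToyJSON <- R6Class("ToyJSON",
  inherit = ToySerializer,
  public = list(
    # R structure -> JSON text
    serialize = function(x) {
      as.character(jsonlite::toJSON(x, auto_unbox = TRUE))
    },
    # JSON text -> R structure (kept as nested lists)
    deserialize = function(txt) {
      jsonlite::fromJSON(txt, simplifyVector = FALSE)
    }
  )
)

interaction <- list(method = "get", uri = "https://example.com/get", status = 200)
json <- ToyJSON$new()
on_disk <- json$serialize(interaction)
restored <- json$deserialize(on_disk)
```

A YAML variant would follow the same pattern with a YAML library in place of jsonlite.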

22.3.4 Environments

22.3.4.1 Logging

An internal environment (vcr_log_env) is used when you use logging. At this point it only keeps track of one variable - file - to be able to refer to what file is used for logging across many classes/functions that need to write to the log file.

22.3.4.2 A bit of housekeeping

Another internal environment (vcr__env) is used to keep track of a few items, including the current cassette in use, and the last vcr error.

22.3.4.3 Lightswitch

Another internal environment (light_switch) is used to keep track of users turning vcr on and off. See ?lightswitch.
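The pattern behind all three environments is the same: an environment gives shared, mutable state that any function in the package can read or update. A base R toy version, with hypothetical names that do not correspond to vcr's internals:

```r
# A package-level environment acts as shared, mutable state (toy example).
toy_env <- new.env(parent = emptyenv())
toy_env$current_cassette <- NULL
toy_env$turned_on <- TRUE

# Helpers that read or modify the shared state.
set_current_cassette <- function(name) {
  toy_env$current_cassette <- name
  invisible(name)
}
current_cassette_name <- function() toy_env$current_cassette

toy_turn_off <- function() {
  toy_env$turned_on <- FALSE
  invisible(FALSE)
}
toy_is_on <- function() isTRUE(toy_env$turned_on)

set_current_cassette("helloworld")
toy_turn_off()
```

Because environments have reference semantics, the changes made inside the helper functions are visible to every later caller.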

22.4 Basic usage

22.4.1 In tests

In your tests, for whichever tests you want to use vcr, wrap them in a vcr::use_cassette() call like:

library(testthat)
vcr::use_cassette("rl_citation", {
  test_that("my test", {
    aa <- rl_citation()

    expect_is(aa, "character")
    expect_match(aa, "IUCN")
    expect_match(aa, "www.iucnredlist.org")
  })
})

OR put the vcr::use_cassette() block on the inside, but put testthat expectations outside of the vcr::use_cassette() block:

library(testthat)
test_that("my test", {
  vcr::use_cassette("rl_citation", {
    aa <- rl_citation()
  })

  expect_is(aa, "character")
  expect_match(aa, "IUCN")
  expect_match(aa, "www.iucnredlist.org")
})

Don’t wrap the use_cassette() block inside your test_that() block with testthat expectations inside the use_cassette() block, as on failures you’ll only get the line number that the use_cassette() block starts on.

The first time you run the tests, a “cassette”, i.e. a file with recorded HTTP interactions, is created at tests/fixtures/rl_citation.yml. On subsequent runs, the cassette is used instead of making real HTTP requests. If you change your code so that the code wrapped by vcr::use_cassette("rl_citation", ...) needs more HTTP interactions, delete tests/fixtures/rl_citation.yml and run the tests again to re-record the cassette.
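Deleting a cassette is plain file handling. A minimal sketch, assuming cassettes use the .yml extension and live in tests/fixtures relative to the working directory; delete_cassette() is a hypothetical helper, not a vcr function:

```r
# Hypothetical helper: remove one cassette so the next test run re-records it.
delete_cassette <- function(name, dir = file.path("tests", "fixtures")) {
  path <- file.path(dir, paste0(name, ".yml"))
  if (!file.exists(path)) {
    message("No cassette at ", path)
    return(invisible(FALSE))
  }
  invisible(file.remove(path))
}

# Demonstration against a throwaway directory rather than a real package.
demo_dir <- file.path(tempdir(), "fixtures-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "rl_citation.yml")))
removed <- delete_cassette("rl_citation", dir = demo_dir)
```

In a real package you would call delete_cassette("rl_citation") from the package root and then rerun the tests.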

22.4.2 Outside of tests

If you want to get a feel for how vcr works, you can also try it outside of tests, although you don’t need to.

library(vcr)
library(crul)

cli <- crul::HttpClient$new(url = "https://eu.httpbin.org")
system.time(
  use_cassette(name = "helloworld", {
    cli$get("get")
  })
)

The request gets recorded, and all subsequent requests of the same form use the cached HTTP response, and so are much faster.

system.time(
  use_cassette(name = "helloworld", {
    cli$get("get")
  })
)

Importantly, your unit test deals with the same inputs and the same outputs - but behind the scenes you use a cached HTTP response - thus, your tests run faster.

The cached response looks something like (condensed for brevity):

http_interactions:
- request:
    method: get
    uri: https://eu.httpbin.org/get
    body:
      encoding: ''
      string: ''
    headers:
      User-Agent: libcurl/7.54.0 r-curl/3.2 crul/0.5.2
  response:
    status:
      status_code: '200'
      message: OK
      explanation: Request fulfilled, document follows
    headers:
      status: HTTP/1.1 200 OK
      connection: keep-alive
    body:
      encoding: UTF-8
      string: "{\n  \"args\": {}, \n  \"headers\": {\n    \"Accept\": \"application/json,
        text/xml, application/xml, */*\", \n    \"Accept-Encoding\": \"gzip, deflate\",
        \n    \"Connection\": \"close\", \n    \"Host\": \"httpbin.org\", \n    \"User-Agent\":
        \"libcurl/7.54.0 r-curl/3.2 crul/0.5.2\"\n  }, \n  \"origin\": \"111.222.333.444\",
        \n  \"url\": \"https://eu.httpbin.org/get\"\n}\n"
  recorded_at: 2018-04-03 22:55:02 GMT
  recorded_with: vcr/0.1.0, webmockr/0.2.4, crul/0.5.2

All components of both the request and response are preserved, so that the HTTP client (in this case crul) can reconstruct its own response just as it would if it wasn’t using vcr.

22.4.3 Less basic usage

For tweaking things to your needs, make sure to read the docs about configuration (e.g., where are the fixtures saved? can they be re-recorded automatically at regular intervals?) and request matching (how does vcr match a request to a recorded interaction?).

22.5 vcr enabled testing

22.5.1 check vs. test

TLDR: Run devtools::test() before running devtools::check() for recording your cassettes.

When running tests or checks of your whole package, note that you’ll get different results with devtools::check() (check button of RStudio build pane) vs. devtools::test() (test button of RStudio build pane). This arises because devtools::check() runs in a temporary directory and files created (vcr cassettes) are only in that temporary directory and thus don’t persist after devtools::check() exits.

However, devtools::test() does not run in a temporary directory, so files created (vcr cassettes) are in whatever directory you’re running it in.

Alternatively, you can run devtools::test_file() (or the “Run test” button in RStudio) to create your vcr cassettes one test file at a time.

22.5.2 CI sites: GitHub Actions, Appveyor, etc.

Refer to the security chapter.