22 Caching HTTP requests
Record HTTP calls and replay them
22.1 Package documentation
Check out https://docs.ropensci.org/vcr/ for documentation on vcr
functions.
22.2 Terminology
- vcr: the name comes from the idea that we want to record something and play it back later, like a vcr
- cassette: A thing to record HTTP interactions to. Right now the only option is the file system (writing to files), but in the future could be other things, e.g. a key-value store like Redis
- fixture: A fixture is something used to consistently test a piece of software. In this case, a cassette (just defined above) is a fixture used in unit tests. If you use our setup function vcr_setup(), the default directory created to hold cassettes is called fixtures/, as a signal as to what the folder contains.
- Persisters: how to save requests - currently the only option is the file system
- serialize: translating data into a format that can be stored; here, translate HTTP request and response data into a representation on disk to read back later
- Serializers: how to serialize the HTTP response - the current options are YAML (the default) and JSON
- insert cassette: create a cassette (all HTTP interactions will be recorded to this cassette)
- eject cassette: eject the cassette (no longer recording to that cassette)
- replay: refers to using a cached result of an HTTP request that was recorded earlier
22.3 Design
This section explains vcr’s internal design and architecture.
22.3.1 Where vcr comes from and why R6
vcr was “ported” from the Ruby gem (aka, library) of the same name. Because it was ported from Ruby, an object-oriented programming language, I thought it would be easier to use an object system in R that most closely resembles that used in Ruby (at least in my opinion). This thinking led to choosing R6. The exported functions users interact with are not R6 classes, but are rather normal R functions. However, most of the internal code in the package uses R6. Thus, familiarity with R6 is important for people that may want to contribute to vcr, but not required at all for vcr users.
22.3.2 Principles
22.3.2.1 An easy to use interface hides complexity
As described above, vcr uses R6 internally, but users interact with normal R functions. Internal functions that are quite complicated are largely R6, while the exported, simpler functions users interact with are normal R functions.
22.3.2.2 Class/function names are inherited from Ruby vcr
Since R vcr was ported from Ruby, we kept most of the names of functions/classes and variables. So if you’re wondering why a function, class, or variable has a particular name, its derivation for the most part cannot be found in this package; it comes from Ruby vcr.
22.3.2.3 Hooks into HTTP clients
Perhaps the most fundamental thing about how this package works is how it knows what HTTP requests are being made. This stumped me for quite a long time. When looking at Ruby vcr, at first I thought it must be “listening” for HTTP requests somehow. Then I found out about monkey patching; that’s how it’s achieved in Ruby. That is, the Ruby vcr package literally overrides certain methods in Ruby HTTP clients, hijacking internals of the HTTP clients.
However, monkey patching is not allowed in R. Thus, in R we need to somehow have “hooks” into HTTP clients. Fortunately, Scott is the maintainer of one of the HTTP clients, crul, so he was able to quickly create a hook. In addition, there was already a hook mechanism in the httr and httr2 packages.
The actual hooks are not in vcr, but in webmockr. vcr depends on webmockr for hooking into the HTTP clients httr, httr2 and crul.
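To make the idea of a hook concrete, here is a minimal sketch in base R. It is not how crul, httr, httr2 or webmockr implement their hooks; the names hooks, register_hook(), remove_hook() and toy_get() are hypothetical and exist only for this illustration. The point is simply that the client itself checks for an interceptor, so nothing has to be overridden from the outside.

```r
# Toy illustration of a "hook": the client looks for a registered
# interceptor before doing any real work. All names are hypothetical.
hooks <- new.env(parent = emptyenv())

register_hook <- function(fun) {
  assign("request", fun, envir = hooks)
  invisible(NULL)
}

remove_hook <- function() {
  if (exists("request", envir = hooks, inherits = FALSE)) {
    rm("request", envir = hooks)
  }
  invisible(NULL)
}

# A stand-in for an HTTP client's request function.
toy_get <- function(url) {
  if (exists("request", envir = hooks, inherits = FALSE)) {
    # A hook is registered: hand the request over instead of using the network.
    hook <- get("request", envir = hooks)
    return(hook(list(method = "get", url = url)))
  }
  # No hook: this is where a real client would perform the request.
  list(url = url, source = "network")
}

register_hook(function(req) list(url = req$url, source = "hook"))
hooked <- toy_get("https://example.org")

remove_hook()
unhooked <- toy_get("https://example.org")
```

In this sketch the first call is answered by the registered function and the second falls through to the (pretend) network path, which is roughly the role webmockr plays between the HTTP clients and vcr.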
22.3.3 Internal classes
An overview of some of the more important aspects of vcr.
22.3.3.1 Configuration
An internal object (vcr_c) is created when vcr is loaded, holding the default vcr configuration options inside of an R6 class VCRConfig; see https://github.com/ropensci/vcr/blob/main/R/onLoad.R. This class keeps track of default and user specified configuration options. You can access vcr_c using the triple colon operator :::, though it is not intended for general use. Whenever you make calls to vcr_configure() or other configuration functions, vcr_c is affected.
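As a small illustration of a call that changes the stored configuration, the sketch below sets two options with vcr_configure(). The option values are assumptions chosen for the example, and vcr_test_path() may not be available in older vcr versions; check the package documentation for the version you have installed.

```r
library(vcr)

# Usually placed in a test helper or setup file so it runs before the tests.
invisible(vcr_configure(
  dir = vcr_test_path("fixtures"),  # directory where cassettes are written
  ignore_localhost = TRUE           # do not record requests made to localhost
))
```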
22.3.3.2 Cassette class
Cassette is an R6 class that handles internals/state for each cassette. Each time you run use_cassette() this class is used. The class has quite a few methods in it, so there’s a lot going on in the class. Ideally the class would be separated into subclasses to handle similar sets of logic, but there’s not an easy way to do that with R6.
Of note in Cassette is that, within its initialize() method, webmockr is used to create webmockr stubs.
22.3.3.3 How HTTP requests are handled
Within webmockr, there are calls to the vcr class RequestHandler, which has child classes RequestHandlerCrul, RequestHandlerHttr and RequestHandlerHttr2 for crul, httr and httr2, respectively. These classes determine what to do with each HTTP request. The options for each HTTP request include:
- Ignored: You can ignore HTTP requests under certain rules using the configuration options ignore_hosts and ignore_localhost.
- Stubbed by vcr: This is an HTTP request for which a match is found in the cassette defined in the use_cassette()/insert_cassette() call. In this case the matching request/response from the cassette is returned with no real HTTP request allowed.
- Recordable: This is an HTTP request for which no match is found in the cassette defined in the use_cassette()/insert_cassette() call. In this case a real HTTP request is allowed, and the request/response is recorded to the cassette.
- Unhandled: This is a group of cases, all of which cause an error to be thrown with a message trying to help the user figure out how to fix the problem.
If you use vcr logging you’ll see these categories in your logs.
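The four outcomes can be sketched as a small decision function in base R. This is a simplified illustration, not the logic of RequestHandler: classify_request() and its arguments are hypothetical, matching is assumed to be on method and URI only, and a single allow_recording flag stands in for vcr's record modes.

```r
# Decide what happens to one request, given the interactions already
# recorded in a cassette. Hypothetical helper for illustration only.
classify_request <- function(request, cassette,
                             ignore_hosts = character(),
                             allow_recording = TRUE) {
  # Extract the host name (without port) from the URI.
  host <- sub("^https?://([^/:]+).*$", "\\1", request$uri)
  if (host %in% ignore_hosts) {
    return("ignored")
  }
  # Compare the request with every recorded interaction.
  matched <- vapply(
    cassette,
    function(rec) identical(rec$method, request$method) &&
      identical(rec$uri, request$uri),
    logical(1)
  )
  if (any(matched)) return("stubbed")
  if (allow_recording) return("recordable")
  "unhandled"
}

cassette <- list(list(method = "get", uri = "https://example.org/a"))
req_a <- list(method = "get", uri = "https://example.org/a")
req_b <- list(method = "get", uri = "https://example.org/b")
req_local <- list(method = "get", uri = "http://localhost:8000/ping")

results <- c(
  classify_request(req_local, cassette, ignore_hosts = "localhost"),
  classify_request(req_a, cassette),
  classify_request(req_b, cassette),
  classify_request(req_b, cassette, allow_recording = FALSE)
)
```

In vcr itself the "unhandled" case raises an error with guidance rather than returning a label.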
22.3.3.4 Serializers
Serializers determine the format in which cassettes are written to files on disk. The current options are YAML (default) and JSON. YAML was implemented first in vcr because that’s the default option in Ruby vcr.
An R6 class Serializer is the parent class for all serializer types; YAML and JSON are both R6 classes that inherit from Serializer. Both YAML and JSON define just two methods: serialize() and deserialize(), for converting R structures to yaml or json, and converting yaml or json back to R structures, respectively.
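From the user's side, choosing a serializer is a configuration matter. A minimal sketch, assuming the serialize_with option accepts "yaml" and "json" in your installed vcr version:

```r
library(vcr)

# Write cassettes as JSON instead of the default YAML.
invisible(vcr_configure(serialize_with = "json"))
```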
22.3.4 Environments
22.3.4.1 Logging
An internal environment (vcr_log_env) is used when you use logging. At this point it only keeps track of one variable - file - to be able to refer to what file is used for logging across the many classes/functions that need to write to the log file.
22.3.4.2 A bit of housekeeping
Another internal environment (vcr__env) is used to keep track of a few items, including the current cassette in use, and the last vcr error.
22.3.4.3 Lightswitch
Another internal environment (light_switch) is used to keep track of users turning vcr on and off. See ?lightswitch.
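The pattern behind these environments is ordinary base R: an environment acts as mutable state shared by several functions. The sketch below is an illustration only; toy_state and the toy_*() functions are hypothetical names and do not correspond to vcr's internal objects.

```r
# A private environment holding state shared across functions.
toy_state <- new.env(parent = emptyenv())
toy_state$on <- TRUE
toy_state$current_cassette <- NULL

# Switch the toy recorder off and on.
toy_turn_off <- function() {
  toy_state$on <- FALSE
  invisible(toy_state$on)
}
toy_turn_on <- function() {
  toy_state$on <- TRUE
  invisible(toy_state$on)
}

# Remember which cassette is in use; refuse while switched off.
toy_insert <- function(name) {
  if (!toy_state$on) stop("the toy recorder is turned off")
  toy_state$current_cassette <- name
  invisible(name)
}
toy_current <- function() toy_state$current_cassette

toy_insert("helloworld")
current_after_insert <- toy_current()

toy_turn_off()
is_on_after_off <- toy_state$on
toy_turn_on()
```

Because environments are modified in place, every function sees the latest value without it being passed around as an argument. In vcr the user-facing counterparts of the switch are the functions documented at ?lightswitch, such as turn_off() and turn_on().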
22.4 Basic usage
22.4.1 In tests
In your tests, for whichever tests you want to use vcr, wrap them in a vcr::use_cassette() call like:
library(testthat)
vcr::use_cassette("rl_citation", {
  test_that("my test", {
    aa <- rl_citation()
    expect_is(aa, "character")
    expect_match(aa, "IUCN")
    expect_match(aa, "www.iucnredlist.org")
  })
})
OR put the vcr::use_cassette() block on the inside, but put testthat expectations outside of the vcr::use_cassette() block:
library(testthat)
test_that("my test", {
  vcr::use_cassette("rl_citation", {
    aa <- rl_citation()
  })
  expect_is(aa, "character")
  expect_match(aa, "IUCN")
  expect_match(aa, "www.iucnredlist.org")
})
Don’t put testthat expectations inside a use_cassette() block that is itself inside your test_that() block: on failures you’ll only get the line number that the use_cassette() block starts on.
The first time you run the tests, a “cassette”, i.e. a file with recorded HTTP interactions, is created at tests/fixtures/rl_citation.yml. The times after that, the cassette will be used. If you change your code and more HTTP interactions are needed in the code wrapped by vcr::use_cassette("rl_citation"), delete tests/fixtures/rl_citation.yml and run the tests again to re-record the cassette.
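Deleting a cassette can be done by hand or with a few lines of base R. The helper below is a hypothetical convenience, not part of vcr; it assumes cassettes live in tests/fixtures and use the .yml extension, as in the example above.

```r
# Delete a cassette file so that the next test run records it afresh.
# Returns TRUE if a file was removed and FALSE if there was nothing to remove.
delete_cassette <- function(name, dir = file.path("tests", "fixtures")) {
  path <- file.path(dir, paste0(name, ".yml"))
  if (!file.exists(path)) {
    return(invisible(FALSE))
  }
  invisible(file.remove(path))
}

# From the package root, before re-running the tests:
# delete_cassette("rl_citation")
```

After removing the file, run the tests again (for example with devtools::test()) so that the real requests are made and recorded once more.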
22.4.2 Outside of tests
If you want to get a feel for how vcr works, you can also use it outside of tests, although you don’t need to.
library(vcr)
library(crul)
cli <- crul::HttpClient$new(url = "https://eu.httpbin.org")
system.time(
  use_cassette(name = "helloworld", {
    cli$get("get")
  })
)
The request gets recorded, and all subsequent requests of the same form use the cached HTTP response, and so are much faster:
system.time(
  use_cassette(name = "helloworld", {
    cli$get("get")
  })
)
Importantly, your unit test deals with the same inputs and the same outputs - but behind the scenes you use a cached HTTP response - thus, your tests run faster.
The cached response looks something like (condensed for brevity):
http_interactions:
- request:
    method: get
    uri: https://eu.httpbin.org/get
    body:
      encoding: ''
      string: ''
    headers:
      User-Agent: libcurl/7.54.0 r-curl/3.2 crul/0.5.2
  response:
    status:
      status_code: '200'
      message: OK
      explanation: Request fulfilled, document follows
    headers:
      status: HTTP/1.1 200 OK
      connection: keep-alive
    body:
      encoding: UTF-8
      string: "{\n \"args\": {}, \n \"headers\": {\n \"Accept\": \"application/json,
        text/xml, application/xml, */*\", \n \"Accept-Encoding\": \"gzip, deflate\",
        \n \"Connection\": \"close\", \n \"Host\": \"httpbin.org\", \n \"User-Agent\":
        \"libcurl/7.54.0 r-curl/3.2 crul/0.5.2\"\n }, \n \"origin\": \"111.222.333.444\",
        \n \"url\": \"https://eu.httpbin.org/get\"\n}\n"
  recorded_at: 2018-04-03 22:55:02 GMT
  recorded_with: vcr/0.1.0, webmockr/0.2.4, crul/0.5.2
All components of both the request and response are preserved, so that the HTTP client (in this case crul) can reconstruct its own response just as it would if it wasn’t using vcr.
22.4.3 Less basic usage
For tweaking things to your needs, make sure to read the docs about configuration (e.g., where are the fixtures saved? can they be re-recorded automatically and regularly?) and request matching (how does vcr match a request to a recorded interaction?).
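As a starting point, the sketch below sets a record mode and the request parts used for matching. The values are assumptions picked for illustration; the configuration and request matching docs list the options your vcr version supports.

```r
library(vcr)

invisible(vcr_configure(
  # Replay recorded interactions and record any requests not yet in the cassette.
  record = "new_episodes",
  # Treat a request as recorded only if method, URI and body all match.
  match_requests_on = c("method", "uri", "body")
))
```

The same options can also be passed to an individual use_cassette() call when only one cassette needs different behavior.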
22.5 vcr enabled testing
22.5.1 check vs. test
TLDR: Run devtools::test() before running devtools::check() to record your cassettes.
When running tests or checks of your whole package, note that you’ll get different results with devtools::check() (check button of RStudio build pane) vs. devtools::test() (test button of RStudio build pane). This arises because devtools::check() runs in a temporary directory, so files created (vcr cassettes) are only in that temporary directory and thus don’t persist after devtools::check() exits.
However, devtools::test() does not run in a temporary directory, so files created (vcr cassettes) are in whatever directory you’re running it in.
Alternatively, you can run devtools::test_file() (or the “Run test” button in RStudio) to create your vcr cassettes one test file at a time.
22.5.2 CI sites: GitHub Actions, Appveyor, etc.
Refer to the security chapter.