# 11 Cloud storage

Amazon S3 and Google Cloud Storage are paid services. Amazon and Google not only charge for data, but also for operations that query or modify that data. Read https://aws.amazon.com/s3/pricing/ and https://cloud.google.com/storage/pricing for details.

This chapter requires `targets` version 1.3.0 or higher. Please visit the installation instructions.

`targets` can store data and metadata on the cloud, either with Amazon Web Services (AWS) Simple Storage Service (S3) or Google Cloud Platform (GCP) Google Cloud Storage (GCS).
## 11.1 Benefits

### 11.1.1 Store less data locally

- Use `tar_option_set()` and `tar_target()` to opt into cloud storage and configure options (see the sketch after this list).
- `tar_make()` uploads target data to a cloud bucket instead of the local `_targets/objects/` folder. Likewise for file targets.[^1]
- Every `seconds_meta` seconds, `tar_make()` uploads metadata and still keeps local copies in the `_targets/meta/` folder.[^2]
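To illustrate the per-target form of the opt-in, here is a minimal sketch. The bucket, prefix, and target names are placeholders, not part of this chapter's running example.

```r
# Sketch: opt in per target instead of pipeline-wide. Only `remote` is
# stored in the bucket; `local` stays in _targets/objects/.
library(targets)
tar_option_set(
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "YOUR_BUCKET", prefix = "YOUR/PREFIX")
  )
)
list(
  tar_target(local, rnorm(5)),                        # repository = "local" (default)
  tar_target(remote, mean(local), repository = "aws") # stored in the bucket
)
```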
### 11.1.2 Inspect the results on a different computer

- `tar_meta_download()` downloads the latest metadata from the bucket to the local `_targets/meta/` folder.[^3]
- Helpers like `tar_read()` read local metadata and access target data in the bucket (see the sketch after this list).
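As a sketch of that workflow, assuming the second computer has the same `_targets.R` file and cloud credentials:

```r
# Sketch: inspect results away from the machine that ran the pipeline.
library(targets)
tar_meta_download() # refresh the local _targets/meta/ folder from the bucket
tar_read(data)      # read the local metadata, then fetch the target from the bucket
```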
### 11.1.3 Track history

- Turn on versioning in your bucket (see the sketch after this list).
- `tar_make()` records the versions of the target data in `_targets/meta/meta`.
- Commit `_targets/meta/meta` to the same version-controlled repository as your R code.
- Roll back to a prior commit to roll back the local metadata and give `targets` access to prior versions of the target data.
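For AWS, one way to turn on versioning from R is the `paws.storage` interface to the S3 `PutBucketVersioning` API. A sketch, with the bucket name as a placeholder:

```r
# Sketch: enable versioning on an existing S3 bucket.
s3 <- paws.storage::s3()
s3$put_bucket_versioning(
  Bucket = "YOUR_BUCKET",
  VersioningConfiguration = list(Status = "Enabled")
)
```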
## 11.2 Setup

### 11.2.1 AWS setup

Skip these steps if you already have an AWS account and bucket.

- Sign up for a free tier account at https://aws.amazon.com/free.
- Read the Simple Storage Service (S3) instructions and practice in the web console.
- Install the `paws.storage` R package: `install.packages("paws.storage")`.
- Follow the `paws` documentation to set your AWS security credentials.
- Create an S3 bucket, either in the web console or with `paws.storage::s3()$create_bucket()` (see the sketch after this list).
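For the last step, a sketch of creating the bucket from R. The bucket name and region are placeholders, and buckets in `us-east-1` omit the `CreateBucketConfiguration` argument:

```r
# Sketch: create an S3 bucket with paws.storage.
s3 <- paws.storage::s3()
s3$create_bucket(
  Bucket = "YOUR_BUCKET",
  CreateBucketConfiguration = list(LocationConstraint = "us-east-2")
)
```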
### 11.2.2 GCP setup

Skip these steps if you already have a GCP account and bucket.

- Activate a Google Cloud Platform account at https://cloud.google.com.
- Install the `googleCloudStorageR` R package: `install.packages("googleCloudStorageR")`.
- Follow the `googleCloudStorageR` setup instructions to authenticate into Google Cloud and enable the required APIs.
- Create a Google Cloud Storage (GCS) bucket, either in the web console or with `googleCloudStorageR::gcs_create_bucket()` (see the sketch after this list).
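Likewise, a sketch of the last step from R, with the bucket name and project ID as placeholders:

```r
# Sketch: create a GCS bucket with googleCloudStorageR.
library(googleCloudStorageR)
gcs_create_bucket(
  name = "YOUR_BUCKET",
  projectId = "YOUR_PROJECT_ID",
  location = "US"
)
```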
### 11.2.3 Pipeline setup

Use `tar_option_set()` to opt into cloud storage and declare options. For AWS:[^4]

- `repository = "aws"`
- `resources = tar_resources(aws = tar_resources_aws(bucket = "YOUR_BUCKET", prefix = "YOUR/PREFIX"))`

Details:

- The process is analogous for GCP (see the sketch after this list).
- The `prefix` is just like `tar_config_get("store")`, but for the cloud. It controls where the data objects live in the bucket, and it should not conflict with other projects.
- Arguments `repository`, `resources`, and `cue` of `tar_target()` override their counterparts in `tar_option_set()`.
- In `tar_option_set()`, `repository` controls the target data, and `repository_meta` controls the metadata. However, `repository_meta` just defaults to `repository`. To continuously upload the metadata, it usually suffices to set e.g. `repository = "aws"` in `tar_option_set()`.
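For GCP, a sketch of the analogous configuration, with the bucket and prefix as placeholders:

```r
# Sketch: the GCP counterpart of the AWS options above.
library(targets)
tar_option_set(
  repository = "gcp",
  resources = tar_resources(
    gcp = tar_resources_gcp(
      bucket = "YOUR_BUCKET",
      prefix = "YOUR/PREFIX"
    )
  )
)
```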
## 11.3 Example

Consider a pipeline with two simple targets.

```r
# Example _targets.R file:
library(targets)
library(tarchetypes)
tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-test-bucket-25edb4956460647d",
      prefix = "my_project_name"
    )
  )
)
write_file <- function(data) {
  saveRDS(data, "file.rds")
  "file.rds"
}
list(
  tar_target(data, rnorm(5), format = "qs"),
  tar_target(file, write_file(data), format = "file")
)
```
As usual, `tar_make()` runs the correct targets in the correct order. Both data files now live in bucket `my-test-bucket-25edb4956460647d` at S3 key paths which begin with prefix `my_project_name`. Neither `_targets/objects/data` nor `file.rds` exists locally because `repository` is `"aws"`.
```r
tar_make()
#> ▶ start target data
#> ● built target data [0 seconds]
#> ▶ start target file
#> ● built target file [0.002 seconds]
#> ▶ end pipeline [1.713 seconds]
```
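To see what `tar_make()` uploaded, one could list the objects under the prefix with `paws.storage`. A sketch, assuming the same AWS credentials as the pipeline:

```r
# Sketch: list the S3 keys the pipeline just uploaded.
s3 <- paws.storage::s3()
objects <- s3$list_objects_v2(
  Bucket = "my-test-bucket-25edb4956460647d",
  Prefix = "my_project_name"
)
vapply(objects$Contents, function(x) x$Key, character(1))
```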
At this point, if you switch to a different computer, download your metadata with `tar_meta_download()`. Then, your results will be up to date.
```r
tar_make()
#> ✔ skip target data
#> ✔ skip target file
#> ✔ skip pipeline [1.653 seconds]
```
`tar_read()` reads local metadata and cloud target data.
```r
tar_read(data)
#> [1] -0.74654607 -0.59593497 -1.57229983 0.40915323 0.02579023
```
For a file target, `tar_read()` downloads the file to its original location and returns the path.
```r
path <- tar_read(file)
path
#> [1] "file.rds"
readRDS(path)
#> [1] -0.74654607 -0.59593497 -1.57229983 0.40915323 0.02579023
```
[^1]: For cloud targets, `format = "file_fast"` has no purpose, and it automatically switches to `format = "file"`.

[^2]: Metadata snapshots are synchronous, so a long target with `deployment = "main"` may block the main R process and delay uploads.

[^3]: Functions `tar_meta_upload()`, `tar_meta_sync()`, and `tar_meta_delete()` also manage cloud metadata.

[^4]: `cue = tar_cue(file = FALSE)` is no longer recommended for cloud storage. This unwise shortcut is no longer necessary, as of https://github.com/ropensci/targets/pull/1181 (`targets` version >= 1.3.2.9003).