# s3fs

`s3fs` provides a file-system-like interface into Amazon Web Services (AWS) for R. It utilizes the `paws` SDK and `R6` for its core design. This repo has been inspired by Python's `s3fs`; however, its API and implementation have been developed to follow R's `fs`.

## Installation
You can install the released version of s3fs from CRAN with:
``` r
install.packages('s3fs')
```
r-universe installation:

``` r
# Enable repository from dyfanjones
options(repos = c(
  dyfanjones = 'https://dyfanjones.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'
))

# Download and install s3fs in R
install.packages('s3fs')
```
Github installation:

``` r
remotes::install_github("dyfanjones/s3fs")
```
## Key dependencies

- `paws`: connection with AWS S3
- `R6`: set up core class
- `data.table`: wrangle lists into data.frames
- `fs`: file system on local files
- `lgr`: set up logging
- `future`: set up async functionality
- `future.apply`: set up parallel looping
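Since `lgr` handles logging (per the list above), you can raise s3fs's log level when debugging transfers. A minimal sketch, assuming s3fs follows the usual `lgr` convention of registering a logger under the package name:

``` r
library(lgr)

# assumption: s3fs registers an lgr logger named after the package
lgr::get_logger("s3fs")$set_threshold("debug")
```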
## fs comparison

`s3fs` attempts to give the same interface as `fs` when handling files on AWS S3 from R.

- `s3fs` functions are vectorized, accepting multiple path inputs similar to `fs` (see the sketch after the copy example below).
- Asynchronous functions return a `future` object of their non-async counterpart, with the exception of `s3_stream_in`, which returns a list of raw objects.
- `s3fs` follows `fs` naming conventions with `dir_*`, `file_*` and `path_*`, however with the prefix `s3_` in front, i.e. `s3_dir_*`, `s3_file_*` and `s3_path_*` etc.
- Similar to `fs`, if a failure happens it will be raised and not masked with a warning.
- `s3fs` functions are designed to have the option to run in parallel through the use of `future` and `future.apply`.

For example: copy a large file from one location to the next.
``` r
library(s3fs)
library(future)

plan("multisession")

s3_file_copy("s3://mybucket/multipart/large_file.csv", "s3://mybucket/new_location/large_file.csv")
```
When `s3fs` copies a large file (> 5GB) it uses multipart upload, and `future` allows each part to run in parallel to speed up the process.
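Because the functions are vectorized, a single call can also act on several paths at once. A minimal sketch, assuming the source and destination buckets and files below exist:

``` r
library(s3fs)

# hypothetical paths: copy three files in one vectorized call
src <- s3_path("mybucket", "raw", c("a.csv", "b.csv", "c.csv"))
dst <- s3_path("mybucket", "clean", c("a.csv", "b.csv", "c.csv"))
s3_file_copy(src, dst)
```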
## Async

`s3fs` uses `future` to create a few key async functions. These are focused on functions that might be moving large files to and from R and AWS S3.

For example: copying a large file from AWS S3 to R.
``` r
library(s3fs)
library(future)

plan("multisession")

s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")
```
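Since the async variants return a `future` (as noted in the comparison above), the R session stays free until the result is needed; `future::value()` then blocks and returns it. A minimal sketch:

``` r
library(s3fs)
library(future)

plan("multisession")

# start the copy without blocking the session
fut <- s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")

# ... carry on with other work ...

# block only when the result is actually required
value(fut)
```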
## Usage

`fs` has a straightforward API with 4 core themes:

- `path_` for manipulating and constructing paths
- `file_` for files
- `dir_` for directories
- `link_` for links

`s3fs` follows these themes with the following:

- `s3_path_` for manipulating and constructing S3 URI paths
- `s3_file_` for S3 files
- `s3_dir_` for S3 directories

NOTE: `link_` is currently not supported.
``` r
library(s3fs)

# Construct a path to a file with `s3_path()`
s3_path("foo", "bar", letters[1:3], ext = "txt")
#> [1] "s3://foo/bar/a.txt" "s3://foo/bar/b.txt" "s3://foo/bar/c.txt"

# list buckets
s3_dir_ls()
#> [1] "s3://MyBucket1"
#> [2] "s3://MyBucket2"
#> [3] "s3://MyBucket3"
#> [4] "s3://MyBucket4"
#> [5] "s3://MyBucket5"

# list files in bucket
s3_dir_ls("s3://MyBucket5")
#> [1] "s3://MyBucket5/iris.json"     "s3://MyBucket5/athena-query/"
#> [3] "s3://MyBucket5/data/"         "s3://MyBucket5/default/"
#> [5] "s3://MyBucket5/iris/"         "s3://MyBucket5/made-up/"
#> [7] "s3://MyBucket5/test_df/"

# create a new directory
tmp <- s3_dir_create(s3_file_temp(tmp_dir = "MyBucket5"))
tmp
#> [1] "s3://MyBucket5/filezwkcxx9q5562"

# create new files in that directory
s3_file_create(s3_path(tmp, "my-file.txt"))
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
s3_dir_ls(tmp)
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"

# remove files from the directory
s3_file_delete(s3_path(tmp, "my-file.txt"))
s3_dir_ls(tmp)
#> character(0)

# remove the directory
s3_dir_delete(tmp)
```
Created on 2022-06-21 by the reprex package (v2.0.1)
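Directory listings with metadata come back as tabular data (`data.table` handles the wrangling, per the dependency list above). A minimal sketch, assuming `s3_dir_info()` mirrors `fs::dir_info()`:

``` r
library(s3fs)

# one row per object, with metadata columns
info <- s3_dir_info("s3://MyBucket5")
head(info)
```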
Similar to `fs`, `s3fs` is designed to work well with the pipe.
``` r
library(s3fs)

paths <- s3_file_temp(tmp_dir = "MyBucket") |>
  s3_dir_create() |>
  s3_path(letters[1:5]) |>
  s3_file_create()
paths
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"

paths |> s3_file_delete()
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"
```
Created on 2022-06-22 by the reprex package (v2.0.1)
NOTE: all examples have been developed from `fs`.
## S3-compatible storage

`s3fs` allows you to connect to file systems that provide an S3-compatible interface. For example, MinIO offers high-performance, S3-compatible object storage. You can connect to your MinIO server using `s3fs::s3_file_system()`:
``` r
library(s3fs)

s3_file_system(
  aws_access_key_id = "minioadmin",
  aws_secret_access_key = "minioadmin",
  endpoint = "http://localhost:9000"
)

s3_dir_ls()
#> [1] ""

s3_bucket_create("s3://testbucket")
#> [1] "s3://testbucket"

# refresh cache
s3_dir_ls(refresh = TRUE)
#> [1] "s3://testbucket"

s3_bucket_delete("s3://testbucket")
#> [1] "s3://testbucket"

# refresh cache
s3_dir_ls(refresh = TRUE)
#> [1] ""
```
Created on 2022-12-14 with reprex v2.0.2
NOTE: if you want to change from AWS S3 to MinIO in the same R session, you will need to set the parameter `refresh = TRUE` when calling `s3_file_system()` again. You can use multiple connections in the same session by using the R6 class `S3FileSystem` directly.
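A hedged sketch of that last point, assuming the `S3FileSystem` constructor takes the same arguments as `s3_file_system()` and exposes methods named after the exported functions without the `s3_` prefix (both are assumptions here):

``` r
library(s3fs)

# one client per backend, created directly from the R6 class
aws <- S3FileSystem$new()
minio <- S3FileSystem$new(
  aws_access_key_id = "minioadmin",      # arguments assumed to match s3_file_system()
  aws_secret_access_key = "minioadmin",
  endpoint = "http://localhost:9000"
)

aws$dir_ls()    # assumed method name
minio$dir_ls()  # assumed method name
```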
## Issues

Please open a GitHub ticket to raise any issues or feature requests.