nanoparquet

R-CMD-check CRAN status

nanoparquet is a reader and writer for a common subset of Parquet files.

Features:

Limitations:

Installation

Install the R package from CRAN:

install.packages("nanoparquet")

Usage

Read

Call read_parquet() to read a Parquet file:

df <- nanoparquet::read_parquet("example.parquet")

To see the columns of a Parquet file and how their types are mapped to R types by read_parquet(), call parquet_column_types() first:

nanoparquet::parquet_column_types("example.parquet")

Folders of similar-structured Parquet files (e.g. produced by Spark) can be read like this:

df <- data.table::rbindlist(lapply(
  Sys.glob("some-folder/part-*.parquet"),
  nanoparquet::read_parquet
))

Write

Call write_parquet() to write a data frame to a Parquet file:

nanoparquet::write_parquet(mtcars, "mtcars.parquet")

To see how the columns of the data frame will be mapped to Parquet types by write_parquet(), call parquet_column_types() first:

nanoparquet::parquet_column_types(mtcars)

Inspect

Call parquet_info(), parquet_column_types(), parquet_schema() or parquet_metadata() to see various kinds of metadata from a Parquet file:

nanoparquet::parquet_info("mtcars.parquet")
nanoparquet::parquet_column_types("mtcars.parquet")
nanoparquet::parquet_schema("mtcars.parquet")
nanoparquet::parquet_metadata("mtcars.parquet")

If you find a file that should be supported but isn’t, please open an issue here with a link to the file.

Options

See also ?parquet_options().

License

MIT