Note: To access the BigDataPE APIs, you must be connected to the “PE Integrado” network or use a VPN.
BigDataPE is an R package that provides a secure and intuitive way to access datasets from the BigDataPE platform. The package allows users to fetch data from the API using token-based authentication, manage multiple tokens for different datasets, and retrieve data efficiently using chunking.
You can install the BigDataPE package directly from GitHub:
# Install the devtools package if you haven't already
install.packages("devtools")
# Install BigDataPE from GitHub
devtools::install_github("StrategicProjects/bigdatape")
After installation, load the package:
library(BigDataPE)
Authentication tokens are stored securely using the keyring package.
bdpe_store_token
This function securely stores an authentication token for a specific dataset.
bdpe_store_token(base_name, token)
Parameters:
base_name: The name of the dataset.
token: The authentication token for the dataset.
Example:
bdpe_store_token("education_dataset", "your-token-here")
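To keep a real token out of scripts and version control, one option is to read it from an environment variable before storing it. The following is a minimal sketch in base R; the helper `read_token_from_env` and the variable name `BIGDATAPE_TOKEN` are illustrative assumptions, not part of the BigDataPE package.

```r
# Hypothetical helper (not part of BigDataPE): read a token from an
# environment variable and fail early if it is missing.
read_token_from_env <- function(var = "BIGDATAPE_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  token
}

# Possible usage (requires BigDataPE and a working system keyring):
# bdpe_store_token("education_dataset", read_token_from_env())
```

The variable could be set in `~/.Renviron` so that the token never appears in the script itself.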
bdpe_get_token
This function retrieves the securely stored token for a specific dataset.
bdpe_get_token(base_name)
Parameters:
base_name: The name of the dataset.
Example:
token <- bdpe_get_token("education_dataset")
bdpe_remove_token
This function removes the token associated with a specific dataset.
bdpe_remove_token(base_name)
Parameters:
base_name: The name of the dataset.
Example:
bdpe_remove_token("education_dataset")
bdpe_list_tokens
This function lists all datasets with stored tokens.
bdpe_list_tokens()
Example:
datasets <- bdpe_list_tokens()
print(datasets)
bdpe_fetch_data
This function retrieves data from the BigDataPE API using securely stored tokens.
bdpe_fetch_data(
  base_name,
  limit = 100,
  offset = 0,
  query = list(),
  endpoint = "https://www.bigdata.pe.gov.br/api/buscar"
)
Parameters:
base_name: The name of the dataset.
limit: Number of records per page. Default is Inf.
offset: Starting record for the query. Default is 0.
query: Additional query parameters.
endpoint: The API endpoint URL.
Example:
data <- bdpe_fetch_data("education_dataset", limit = 50)
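When paging manually with limit and offset, each request needs the offset of its first record. The sketch below computes those offsets in base R, assuming offsets are zero-based (an offset of 0 starts at the first record). The helper `page_offsets` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of BigDataPE): zero-based start offsets
# needed to page through `total` records, `limit` records at a time.
page_offsets <- function(total, limit) {
  stopifnot(total >= 0, limit > 0)
  if (total == 0) {
    return(numeric(0))
  }
  seq(from = 0, to = total - 1, by = limit)
}

offsets <- page_offsets(total = 250, limit = 100)
print(offsets)  # 0 100 200

# Each offset could then be passed to the package, for example:
# bdpe_fetch_data("education_dataset", limit = 100, offset = offsets[2])
```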
bdpe_fetch_chunks
This function retrieves data from the API iteratively in chunks.
bdpe_fetch_chunks(
  base_name,
  total_limit = Inf,
  chunk_size = 100,
  query = list(),
  endpoint = "https://www.bigdata.pe.gov.br/api/buscar"
)
Parameters:
base_name: The name of the dataset.
total_limit: Maximum number of records to fetch. Default is Inf (fetch all available data).
chunk_size: Number of records per chunk. Default is 50,000.
query: Additional query parameters.
endpoint: The API endpoint URL.
Example:
# Fetch up to 500 records in chunks of 100
data <- bdpe_fetch_chunks(
  "education_dataset",
  total_limit = 500,
  chunk_size = 100
)
# Fetch all available data in chunks of 200
all_data <- bdpe_fetch_chunks(
  "education_dataset",
  chunk_size = 200
)
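To illustrate the general idea of chunked retrieval, the sketch below pages through a local data frame instead of the real API. It is not the package's implementation: `mock_fetch` and `fetch_in_chunks` are hypothetical names, and the stopping rules (an empty or short page means no more data) are assumptions.

```r
# Hypothetical stand-in for one API request: returns rows
# offset + 1 .. offset + limit of a local data frame.
mock_fetch <- function(offset, limit, source = data.frame(id = 1:230)) {
  first <- offset + 1
  if (first > nrow(source)) {
    return(source[0, , drop = FALSE])
  }
  last <- min(offset + limit, nrow(source))
  source[first:last, , drop = FALSE]
}

# Request pages until total_limit is reached or the data runs out.
fetch_in_chunks <- function(fetch_page, total_limit = Inf, chunk_size = 100) {
  chunks <- list()
  fetched <- 0
  repeat {
    remaining <- total_limit - fetched
    if (remaining <= 0) break
    requested <- min(chunk_size, remaining)
    page <- fetch_page(offset = fetched, limit = requested)
    if (nrow(page) == 0) break
    chunks[[length(chunks) + 1]] <- page
    fetched <- fetched + nrow(page)
    if (nrow(page) < requested) break  # short page: nothing left to fetch
  }
  if (length(chunks) == 0) {
    return(data.frame())
  }
  do.call(rbind, chunks)
}

result <- fetch_in_chunks(mock_fetch, total_limit = 500, chunk_size = 100)
print(nrow(result))  # 230: the mock source holds fewer rows than total_limit
```

Passing the page function as an argument keeps the loop testable without network access.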
parse_queries
This internal function constructs a URL with query parameters.
parse_queries(url, query_list)
Parameters:
url: The base URL.
query_list: A list of query parameters.
Example:
url <- parse_queries(
  "https://www.example.com",
  list(param1 = "value1", param2 = "value2")
)
print(url)
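For readers curious how a query string can be assembled from a named list, here is a minimal base-R sketch. It is not the package's internal code: `build_query_url` is a hypothetical name, and the sketch assumes every list element is a single scalar value.

```r
# Hypothetical helper (not the package's parse_queries): append a named
# list of scalar values to a base URL as a percent-encoded query string.
build_query_url <- function(url, query_list) {
  if (length(query_list) == 0) {
    return(url)
  }
  encode <- function(x) {
    vapply(x, utils::URLencode, character(1), reserved = TRUE, USE.NAMES = FALSE)
  }
  keys <- encode(names(query_list))
  # vapply() fails if any element is not a single value
  values <- encode(vapply(query_list, as.character, character(1)))
  paste0(url, "?", paste(keys, values, sep = "=", collapse = "&"))
}

example_url <- build_query_url(
  "https://www.example.com",
  list(param1 = "value1", param2 = "value2")
)
print(example_url)  # "https://www.example.com?param1=value1&param2=value2"
```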
Here’s a complete example workflow:
# Store a token for a dataset
bdpe_store_token("education_dataset", "your-token-here")
# Fetch 100 records starting from the first record
data <- bdpe_fetch_data("education_dataset", limit = 100, offset = 0)
# Fetch data in chunks
all_data <- bdpe_fetch_chunks(
  "education_dataset",
  total_limit = 500,
  chunk_size = 100
)
# List all datasets with stored tokens
datasets <- bdpe_list_tokens()
# Remove a token
bdpe_remove_token("education_dataset")
If you find any issues or have feature requests, feel free to create an issue or a pull request on GitHub.
This package is licensed under the MIT License. See the LICENSE file for more details.