From CRAN:
install.packages("danstat")
From GitHub:
# install.packages("devtools")
devtools::install_github("ValeriVoev/danstat")
The danstat package provides an R interface to the Danmarks Statistik
Statistikbank API, making it easier for researchers and the general
community to access the wealth of data in the data bank. The
documentation of the API can be found here: Databank API.
The API has four endpoints, which are mirrored by the four main functions of the package:
get_subjects()
(SUBJECTS endpoint) retrieves
information about subjects around which the data tables in the data bank
are organized. The subjects are arranged hierarchically, with top-level
subjects such as “Labour and income”, “Transport”, etc. get_subjects()
retrieves the highest level of the hierarchy. See the function
documentation for more details.
get_tables()
(TABLES endpoint) retrieves a list of tables associated with a given
subject code. For example, get_tables(subjects = "2") retrieves all
tables related to the subject “Labour and income”, with table id,
description, variables in the table, etc.
get_table_metadata()
(TABLEINFO endpoint) returns information about a particular table:
description, time of last update, whether or not it is actively updated,
and, most importantly for practical purposes, the variable names and ids,
which are needed whenever you request actual data from the table. Set
variables_only = TRUE if you only want information on the table
variables.
get_data()
(DATA endpoint) returns data from a selected table. The variables
argument is required and must be a list. Each element of the list should
itself be a named list (with elements code and values), where code is
the variable id for which data is requested and values is a vector of
values for this variable. If all values are requested, specify
values = NA. For example:
library(danstat)
# Each element names a variable (code) and the values to request for it
user_input = list(list(code = "ieland", values = c(5100, 5128)),
                  list(code = "køn", values = c(1,2)),
                  list(code = "tid", values = NA))
get_data(table_id = "folk1c", variables = user_input)
#> # A tibble: 192 x 4
#> IELAND KØN TID INDHOLD
#> <chr> <chr> <chr> <dbl>
#> 1 Denmark Men 2008Q1 2465810
#> 2 Denmark Men 2008Q2 2466036
#> 3 Denmark Men 2008Q3 2467712
#> 4 Denmark Men 2008Q4 2469977
#> 5 Denmark Men 2009Q1 2470457
#> # … with 187 more rows
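Writing the nested list by hand gets repetitive when many variables are requested. As a minimal sketch, a small helper can build each element. The name make_variable() is hypothetical and not part of danstat; the codes and values are the ones from the example above.

```r
# Hypothetical helper (NOT part of danstat): builds one element of the
# `variables` list in the shape get_data() expects.
make_variable <- function(code, values = NA) {
  stopifnot(is.character(code), length(code) == 1)
  list(code = code, values = values)
}

user_input <- list(
  make_variable("ieland", c(5100, 5128)),
  make_variable("køn", c(1, 2)),
  make_variable("tid")  # default values = NA requests all values
)

# The request itself needs network access, so it is left commented out:
# danstat::get_data(table_id = "folk1c", variables = user_input)
```

The resulting user_input has the same structure as the hand-written list, so it can be passed to get_data() unchanged.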
Note that while the default language is set to English and variable
values are indeed returned in English, e.g. “Men”, column names are
returned in Danish, e.g. “KØN”, “INDHOLD”, etc. Unfortunately, the API
doesn’t currently provide an option to return column names (variable
names) in English. However, you can get the English translation using
get_table_metadata(). For example, for the table above:
library(dplyr)
get_table_metadata(table_id = "folk1c", variables_only = TRUE) %>%
select(id, text)
#> id text
#> 1 OMRÅDE region
#> 2 KØN sex
#> 3 ALDER age
#> 4 HERKOMST ancestry
#> 5 IELAND country of origin
#> 6 Tid time
We can see that “Område” translates to “region”, “Køn” to “sex”,
“Alder” to “age”, etc. “Indhold” is always the “value” column whenever
data is returned with the get_data() function.
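If English column names are preferred, the metadata can be used as a lookup table. The sketch below is one possible approach, not a feature of the package: translate_columns() is a hypothetical helper, and the two data frames are small stand-ins shaped like the outputs shown above so that the example runs without network access. It assumes ids can be matched case-insensitively, since the metadata lists “Tid” while the data column is “TID”.

```r
# Stand-in for get_table_metadata(..., variables_only = TRUE) output
# (only the rows relevant to the example above).
metadata <- data.frame(
  id   = c("IELAND", "KØN", "Tid"),
  text = c("country of origin", "sex", "time"),
  stringsAsFactors = FALSE
)

# Stand-in for the first two rows returned by get_data() above.
result <- data.frame(
  c("Denmark", "Denmark"),
  c("Men", "Men"),
  c("2008Q1", "2008Q2"),
  c(2465810, 2466036),
  stringsAsFactors = FALSE
)
names(result) <- c("IELAND", "KØN", "TID", "INDHOLD")

# Hypothetical helper (NOT part of danstat): renames Danish column names
# using the id/text pairs from the metadata. Ids are upper-cased before
# matching, and INDHOLD is mapped to `value_name`.
translate_columns <- function(data, metadata, value_name = "value") {
  lookup <- c(setNames(metadata$text, toupper(metadata$id)),
              INDHOLD = value_name)
  matched <- names(data) %in% names(lookup)
  names(data)[matched] <- unname(lookup[names(data)[matched]])
  data
}

translated <- translate_columns(result, metadata)
names(translated)
#> [1] "country of origin" "sex"               "time"              "value"
```

Columns without a match in the metadata keep their original names, so the helper is safe to apply even if the lookup is incomplete.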
There are (as far as I know) two other packages with similar functionality:
In the packages above, the API is called with GET requests, whereas POST
is the option preferred by the API developers and is what this package
uses. I also think that POST requests make the package code more
readable than the long URL-encoded queries needed for GET requests. In
addition, at the time of writing, the rOpenGov package appears not to
have been maintained for the past 3 years. In any case, users can
consider the two packages above as alternatives to this one.