px_micro() exists to support a specific use case at Statistics Greenland.
They use it to create small PX-files that showcase and present metadata from a larger data set which cannot be made publicly available. See an example on Statistics Greenland’s microdata for Research and Analysis.
px_micro()
Apart from px_save(), px_micro() is the only function that can save px objects as PX-files.
px_micro() turns a px object into many smaller PX-files, each containing a subset of the variables in the original px object.
The basis of micro files is usually a data set without a count variable, unlike the data behind most PX-files. px_micro() instead creates a count for each individual variable.
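To illustrate what a count of an individual variable means, here is a minimal base R sketch. It is not how px_micro() is implemented; the data frame `people` and the helper `count_variable()` are hypothetical names made up for this illustration.

```r
# Hypothetical toy micro data: one row per person, no count column
people <- data.frame(
  cohort = c("A", "A", "B", "B", "B"),
  gender = c("female", "male", "male", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the rows for each value of a single variable
count_variable <- function(data, variable) {
  counts <- as.data.frame(table(data[[variable]]), stringsAsFactors = FALSE)
  names(counts) <- c(variable, "n")
  counts
}

gender_counts <- count_variable(people, "gender")
cohort_counts <- count_variable(people, "cohort")
```

Each result has one row per value and a column ‘n’, which is similar in spirit to the count that ends up in a micro file.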
In this example we will use the built-in data set greenlanders.
library(pxmake)
greenlanders |> dplyr::sample_n(10) |> dplyr::arrange_all()
#> # A tibble: 10 × 4
#> cohort gender age municipality
#> <chr> <chr> <int> <chr>
#> 1 A female 21 Sermersooq
#> 2 A female 59 Qeqertalik
#> 3 A male 24 Kujalleq
#> 4 A male 48 Qeqqata
#> 5 A male 63 Avannaata
#> 6 B female 32 Kujalleq
#> 7 B female 73 Avannaata
#> 8 B male 44 Qeqertalik
#> 9 B male 62 Sermersooq
#> 10 B male 98 Qeqertalik
Create a px object with px(), and pass it to px_micro().
# Create px object
x <- px(greenlanders)
# Create folder for micro files
micro_dir <- file.path("micro_files")
dir.create(micro_dir)
# Write micro files to folder
px_micro(x, out_dir = micro_dir)
The folder now contains three PX-files, one for each variable except ‘age’.
‘age’ didn’t get a PX-file because it is the HEADING variable in x, and px_micro() creates a file for each non-HEADING variable. Instead, the HEADING variable is used in all the created PX-files.
# Print HEADING variables
px_heading(x)
#> [1] "age"
# Print non-HEADING variables
c(px_stub(x), px_figures(x))
#> [1] "cohort" "gender" "municipality"
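In this case, we want ‘cohort’ to be the HEADING variable, and to create a PX-file for each of ‘gender’, ‘age’ and ‘municipality’. To get that, change the HEADING variable of the px object and run px_micro() again; the later examples refer to the modified px object as x2.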
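The code for this step is not shown here. A minimal sketch of one way to do it follows, assuming that px_heading() and px_stub() accept a variable name and move that variable to HEADING or STUB. The name x2 matches the object used in the later examples, but the exact calls are an assumption.

```r
library(pxmake)

x <- px(greenlanders)

# Assumption: px_heading() and px_stub() move the named variable to
# HEADING and STUB, so 'cohort' becomes HEADING and 'age' becomes STUB
x2 <-
  x |>
  px_heading("cohort") |>
  px_stub("age")

micro_dir <- file.path("micro_files")
dir.create(micro_dir, showWarnings = FALSE)

# Remove the files from the previous run, then write the new ones
unlink(file.path(micro_dir, "*.px"))
px_micro(x2, out_dir = micro_dir)
```

Calling px_heading(x2) afterwards is a quick way to check that ‘cohort’ is now the only HEADING variable.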
The folder now contains the files we wanted.
Each file contains one of the three variables as STUB, ‘cohort’ as HEADING, and a variable ‘n’ which is the count of each combination of the variables.
px(file.path(micro_dir, 'age.px'))$data
#> # A tibble: 120 × 3
#> age cohort n
#> <chr> <chr> <dbl>
#> 1 18 A NA
#> 2 18 B 1
#> 3 19 A NA
#> 4 19 B 1
#> 5 20 A NA
#> 6 20 B 1
#> 7 21 A 3
#> 8 21 B NA
#> 9 22 A 1
#> 10 22 B NA
#> # ℹ 110 more rows
px(file.path(micro_dir, 'gender.px'))$data
#> # A tibble: 4 × 3
#> gender cohort n
#> <chr> <chr> <dbl>
#> 1 female A 31
#> 2 female B 18
#> 3 male A 23
#> 4 male B 28
px(file.path(micro_dir, 'municipality.px'))$data
#> # A tibble: 10 × 3
#> municipality cohort n
#> <chr> <chr> <dbl>
#> 1 Avannaata A 11
#> 2 Avannaata B 11
#> 3 Kujalleq A 12
#> 4 Kujalleq B 7
#> 5 Qeqertalik A 13
#> 6 Qeqertalik B 5
#> 7 Qeqqata A 9
#> 8 Qeqqata B 11
#> 9 Sermersooq A 9
#> 10 Sermersooq B 12
In general, the keyword values from the px object are carried over to the micro files. This is the case for keywords like ‘MATRIX’, ‘SUBJECT-CODE’, ‘CONTACT’, ‘LANGUAGE’, ‘CODEPAGE’, etc.
To change a keyword across all the micro files, the easiest approach is to change it in the px object before calling px_micro().
# Change CONTACT in all micro files
x2 |>
px_contact("Johan Ejstrud") |>
px_micro(out_dir = micro_dir)
However, some keywords need to be changed individually for each micro file. To do so, create a data frame with the column ‘variable’ and a column for each px keyword to change.
individual_keywords <- tibble::tribble(~variable , ~px_description,
"age" , "Age count 18-99",
"gender" , "Gender count",
"municipality", "Municipality 2024"
)
Supply this data frame to the keyword_values argument of px_micro().
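The call itself is not shown here. Based on the keyword_values calls further down, it presumably looks like the sketch below. The setup lines repeat earlier steps so that the snippet runs on its own; the px_heading() and px_stub() calls used to build x2 are an assumption.

```r
library(pxmake)

# Setup repeated from earlier steps so the snippet runs on its own.
# How x2 is built is an assumption: 'cohort' as HEADING, 'age' as STUB.
x2 <-
  px(greenlanders) |>
  px_heading("cohort") |>
  px_stub("age")

micro_dir <- file.path("micro_files")
dir.create(micro_dir, showWarnings = FALSE)

individual_keywords <- tibble::tribble(
  ~variable,      ~px_description,
  "age",          "Age count 18-99",
  "gender",       "Gender count",
  "municipality", "Municipality 2024"
)

# Write the micro files with a different DESCRIPTION in each file
px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords)
```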
DESCRIPTION is changed in the micro files:
px(file.path(micro_dir, 'age.px')) |> px_description()
#> [1] "Age count 18-99"
px(file.path(micro_dir, 'gender.px')) |> px_description()
#> [1] "Gender count"
px(file.path(micro_dir, 'municipality.px')) |> px_description()
#> [1] "Municipality 2024"
For multilingual files, add a ‘language’ column to keyword_values.
x3 <-
x2 |>
px_language("en") |>
px_languages(c("en", "kl"))
individual_keywords_ml <-
  tibble::tribble(
    ~variable,      ~language, ~px_description,     ~px_matrix,
    "age",          "en",      "Age count 18-99",   "AGE",
    "age",          "kl",      "Ukiut 18-99",       NA,
    "gender",       "en",      "Gender count",      "GEN",
    "gender",       "kl",      "Suiaassuseq",       NA,
    "municipality", "en",      "Municipality 2024", "MUN",
    "municipality", "kl",      "Kommuni 2024",      NA
  )
px_micro(x3, out_dir = micro_dir, keyword_values = individual_keywords_ml)
Here ‘px_description’ varies for each language, and ‘px_matrix’ is only set for one of the languages, since it is not a language-dependent keyword. For language-independent keywords it doesn’t matter which language the value is set for.
By default, the filenames of the micro files are the names of the variables. These can be changed by adding a ‘filename’ column to keyword_values.
individual_keywords2 <-
individual_keywords |>
dplyr::mutate(filename = paste0(variable, "_2024", ".px"))
# Clear folder
unlink(file.path(micro_dir, "*.px"))
px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords2)
list.files(micro_dir)
#> [1] "age_2024.px" "gender_2024.px" "municipality_2024.px"