Micro files

Use case

px_micro() exists to support a specific use case for Statistics Greenland.

They use it to create small PX-files to showcase and present metadata from a lager data set which cannot be made publicly available. See an example on Statistic Greenland’s microdata for Research and Analysis.

px_micro()

Apart from px_save(), px_micro() is the only other function that can save px objects as PX-files.

px_micro() turns a px object into many smaller PX-files, each containing a subset of the variables in the original px object.

Input data

The basis of micro files are usually a data set which doesn’t have a count variable (like most PX-files). px_micro() will instead create a count of each individual variable.

In this example we will use the built-in data data set greenlanders.

library(pxmake)

greenlanders |> dplyr::sample_n(10) |> dplyr::arrange_all()
#> # A tibble: 10 × 4
#>    cohort gender   age municipality
#>    <chr>  <chr>  <int> <chr>       
#>  1 A      female    21 Sermersooq  
#>  2 A      female    59 Qeqertalik  
#>  3 A      male      24 Kujalleq    
#>  4 A      male      48 Qeqqata     
#>  5 A      male      63 Avannaata   
#>  6 B      female    32 Kujalleq    
#>  7 B      female    73 Avannaata   
#>  8 B      male      44 Qeqertalik  
#>  9 B      male      62 Sermersooq  
#> 10 B      male      98 Qeqertalik

How to create micro files

Create a px object with px(), and pass it to px_micro().

# Create px object
x <- px(greenlanders)

# Create folder for micro files
micro_dir <- file.path("micro_files")
dir.create(micro_dir)

# Write micro files to folder
px_micro(x, out_dir = micro_dir)

The folder now contains three PX-files, one for each variable except ‘age’.

list.files(micro_dir)
#> [1] "cohort.px"       "gender.px"       "municipality.px"

The reason ‘age’ didn’t get a PX-file is because it is the HEADING variable in x, and px_micro() creates a file for each non-HEADING variable. Instead the HEADNING variable is used in all the created PX-files.

# Print HEADING variables
px_heading(x)
#> [1] "age"

# Print non-HEADING variables
c(px_stub(x), px_figures(x))
#> [1] "cohort"       "gender"       "municipality"

In this case, we want ‘cohort’ to be heading, and to create a PX-file for ‘gender’, ‘age’ and ‘municipality’.

x2 <-
  x |>
  px_stub('age') |>    # Change age to STUB
  px_heading('cohort') # Change cohort to HEADING
# Clear folder
unlink(file.path(micro_dir, "*.px"))

px_micro(x2, out_dir = micro_dir)

The folder now contains the files we wanted.

list.files(micro_dir)
#> [1] "age.px"          "gender.px"       "municipality.px"

Each file contains one of the three variables as STUB, ‘cohort’ as HEADING, and a variable ‘n’ which is the count of each combination of the variables.

px(file.path(micro_dir, 'age.px'))$data
#> # A tibble: 120 × 3
#>    age   cohort     n
#>    <chr> <chr>  <dbl>
#>  1 18    A         NA
#>  2 18    B          1
#>  3 19    A         NA
#>  4 19    B          1
#>  5 20    A         NA
#>  6 20    B          1
#>  7 21    A          3
#>  8 21    B         NA
#>  9 22    A          1
#> 10 22    B         NA
#> # ℹ 110 more rows

px(file.path(micro_dir, 'gender.px'))$data
#> # A tibble: 4 × 3
#>   gender cohort     n
#>   <chr>  <chr>  <dbl>
#> 1 female A         31
#> 2 female B         18
#> 3 male   A         23
#> 4 male   B         28

px(file.path(micro_dir, 'municipality.px'))$data
#> # A tibble: 10 × 3
#>    municipality cohort     n
#>    <chr>        <chr>  <dbl>
#>  1 Avannaata    A         11
#>  2 Avannaata    B         11
#>  3 Kujalleq     A         12
#>  4 Kujalleq     B          7
#>  5 Qeqertalik   A         13
#>  6 Qeqertalik   B          5
#>  7 Qeqqata      A          9
#>  8 Qeqqata      B         11
#>  9 Sermersooq   A          9
#> 10 Sermersooq   B         12

Keyword values

In general the keyword values from the px object are carried over to the micro files. This is the case for keywords like ‘MATRIX’, ‘SUBJECT-CODE’, ‘CONTACT’, ‘LANGUAGE’, ‘CODEPAGE’, etc.

To change keywords across all the micro files, the easiest is to change them in the px object before calling px_micro().

# Change CONTACT in all micro files
x2 |>
  px_contact("Johan Ejstrud") |>
  px_micro(out_dir = micro_dir)

However, some keywords need to be changed individually for each micro file. To do so, create a data frame with the column ‘variable’ and a column for each px keyword to change.

individual_keywords <- tibble::tribble(~variable     ,      ~px_description,
                                       "age"         ,    "Age count 18-99",
                                       "gender"      ,       "Gender count",
                                       "municipality",  "Municipality 2024"
                                       )

Supply this dataframe to the keyword_values argument of px_micro().

px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords)

DESCRIPTION is changed in the micro files:

px(file.path(micro_dir, 'age.px')) %>% px_description()
#> [1] "Age count 18-99"
px(file.path(micro_dir, 'gender.px')) %>% px_description()
#> [1] "Gender count"
px(file.path(micro_dir, 'municipality.px')) %>% px_description()
#> [1] "Municipality 2024"

Multilingual files

For multilingual files add a ‘language’ column to keyword_values.

x3 <-
  x2 |>
  px_language("en") |>
  px_languages(c("en", "kl"))


individual_keywords_ml <- 
  tibble::tribble(
       ~variable, ~language,     ~px_description, ~px_matrix,
           "age",      "en",   "Age count 18-99",      "AGE",
           "age",      "kl",       "Ukiut 18-99",         NA,
        "gender",      "en",      "Gender count",      "GEN",
        "gender",      "kl",      " Suiaassuseq",         NA,
  "municipality",      "en", "Municipality 2024",      "MUN",
  "municipality",      "kl",      "Kommuni 2024",         NA
  )

px_micro(x3, out_dir = micro_dir, keyword_values = individual_keywords_ml)

Here ‘px_description’ varies for each language, and ‘px_matrix’ is only set for one of the languages, since it is not a language dependent keywords. For language independant keywords it doesn’t matter which language the value is set for.

Filenames

The filenames of the micro files are by default the name of the variable, however these can also be changed by passing a ‘filename’ column to ‘keyword_values’

individual_keywords2 <- 
  individual_keywords |>
  dplyr::mutate(filename = paste0(variable, "_2024", ".px"))

# Clear folder
unlink(file.path(micro_dir, "*.px"))

px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords2)

list.files(micro_dir)
#> [1] "age_2024.px"          "gender_2024.px"       "municipality_2024.px"