Content
How do I format a data frame?
Q: I have a data frame with different types of variables: numbers, dates, character values. How can I format them without messing up the original data?
A: With the fmtr package you can
assign the formats to the “format” attribute of the dataframe, and the
apply the formats using the fdata()
function, returning the
result to a new data frame. Here is a simple example:
# Create sample data
dat <- data.frame(SUBJ = c(1, 2, 3),
BDATE = c(as.Date("1945-10-17"),
as.Date("1967-09-04"),
as.Date("1998-04-28")),
SEX = c("M", "F", "M"),
WEIGHT = c(77.1107, 64.2848, 85.9842))
# View data
dat
# SUBJ BDATE SEX WEIGHT
# 1 1 1945-10-17 M 77.1107
# 2 2 1967-09-04 F 64.2848
# 3 3 1998-04-28 M 85.9842
# Assign formats
formats(dat) <- list(BDATE = "%Y/%m/%d",
SEX = c("M" = "Male", "F" = "Female"),
WEIGHT = "%1.1f kg")
# Apply formats to new data frame
dat2 <- fdata(dat)
# View new data frame
dat2
# SUBJ BDATE SEX WEIGHT
# 1 1 1945/10/17 Male 77.1 kg
# 2 2 1967/09/04 Female 64.3 kg
# 3 3 1998/04/28 Male 86.0 kg
How can I add a new, formatted column to my dataframe?
Q: I want to add a new column to my data frame using a format. This column will be a categorization of an existing continuous column. I want to add it directly to my existing data frame, not create a new one. How can I do that?
A: First create a categorization format:
Then we’ll create some sample data:
Then you can apply the format to your data frame using either Base R or tidyverse.
# Base R method
dat$CAT <- fapply(dat$NUM, fmt)
# View result
dat
# ID NUM CAT
# 1 1 2 A
# 2 2 3 A
# 3 3 7 B
# tidyverse method
dat <- dat %>%
mutate(CAT = fapply(NUM, fmt))
dat
# ID NUM CAT
# 1 1 2 A
# 2 2 3 A
# 3 3 7 B
How do I use a format as a lookup?
Q: I have a decode lookup I want to use on a variable in my dataframe. How can I do this with the fmtr package?
A: There are two common ways to apply a lookup decode with the fmtr package. One is to create a named vector from the decode. The other is to create a user-defined format.
# Create sample data frame
dat <- data.frame(ID = c(1, 2, 3, 4),
CODE = c("A", "C", "B", NA))
# Create decode vector
v1 <- c(A = "Value A", B = "Value B", C = "Value C")
# Create user-defined format
fmt1 <- value(condition(x == "A", "Value A"),
condition(x == "B", "Value B"),
condition(x == "C", "Value C"),
condition(TRUE, "Other"))
# Apply decode vector
fapply(dat$CODE, v1)
# [1] "Value A" "Value C" "Value B" NA
# Apply user-defined format
fapply(dat$CODE, fmt1)
# [1] "Value A" "Value C" "Value B" "Other"
As you can see, both the named vector and the user-defined format can decode the data. The advantage of the user-defined format is that it allows you to handle NA values and assign defaults in a controlled way.
The named vector is easy to create. But there is no way to control what happens to any data value that is not in the lookup. Which method to use depends on your data and the context in which you are applying the decode.
Note that you may also write a vectorized function to perform the
lookup. See “Example 4” in the documentation on fapply()
for a vectorized function example.
Where is the documentation for the formatting codes?
Q: I’m trying to create some formats for dates and numbers, but am not sure what codes are available. Where is the documentation for the possible codes?
A: Some commonly used codes are documented as part of the fmtr documentation here. Additional documentation on the possible codes for dates is here, and for numbers here.
How do I assign formats from metadata to my data frame?
Q: I have metadata for my datasets that includes the desired format for each variable. How can I apply these formats to my data?
A: The metadata must ultimately map the variable name to the desired format for that variable. So we can imagine there are at least two columns in the metadata: the variable name and the format.
# Sample metadata
mdat <- data.frame(var = c("col1", "col2", "col3"),
fmt = c("%1.1f", "%m-%d-%Y", "%1.2f%%"))
# View metadata
mdat
# var fmt
# 1 col1 %1.1f
# 2 col2 %m-%d-%Y
# 3 col3 %1.2f%%
Then imagine another data frame that needs formatting:
# Sample data
dat <- data.frame(col1 = c(1.235, 3.3947, 7.2842),
col2 = c(as.Date("2021-11-01"),
as.Date("2021-11-02"),
as.Date("2021-11-03")),
col3 = c(23.325, 87.2746, 64.2184))
# View sample data
dat
# col1 col2 col3
# 1 1.2350 2021-11-01 23.3250
# 2 3.3947 2021-11-02 87.2746
# 3 7.2842 2021-11-03 64.2184
Now we can put the format metadata into a list:
# Create list out of metadata vectors
lst <- as.list(mdat$fmt)
names(lst) <- mdat$var
# View List
lst
# $col1
# [1] "%1.1f"
#
# $col2
# [1] "%m-%d-%Y"
#
# $col3
# [1] "%1.2f%%"
Now we can assign the list of formats to the dataframe format
attributes using the formats()
function:
# Assign formats to data
formats(dat) <- lst
# Data not formatted yet
dat
# col1 col2 col3
# 1 1.2350 2021-11-01 23.3250
# 2 3.3947 2021-11-02 87.2746
# 3 7.2842 2021-11-03 64.2184
Then apply the formats using fdata()
:
# Apply the formats to entire data frame
fdata(dat)
# col1 col2 col3
# 1 1.2 11-01-2021 23.32%
# 2 3.4 11-02-2021 87.27%
# 3 7.3 11-03-2021 64.22%
How do I assign labels from metadata to my data frame?
Q: I have metadata for my datasets that includes the desired label for each variable. How can I apply these labels to my data?
A: This question is similar to the above question
concerning formats in metadata. The process for applying labels from
metadata will be nearly identical to the process for formats. You will
create a named list of labels from the metadata, then assign it to the
dataframe, and apply it with fdata()
. The difference is
that in the case of labels, you will assign them with the
labels()
function instead of the formats()
function. Like this:
# Create sample list of labels
lst <- list(col1 = "My First Column",
col2 = "My Second Column",
col3 = "My Third Column")
# Create sample data frame
dat <- data.frame(col1 = c(1.235, 3.3947, 7.2842),
col2 = c(as.Date("2021-11-01"),
as.Date("2021-11-02"),
as.Date("2021-11-03")),
col3 = c(23.325, 87.2746, 64.2184))
# Assign labels to data frame
labels(dat) <- lst
# View label attributes
labels(dat)
# $col1
# [1] "My First Column"
#
# $col2
# [1] "My Second Column"
#
# $col3
# [1] "My Third Column"
Note that starting in fmtr 1.5.8, the
labels()
function has been moved to the
common package. To use the labels()
function, please reference that package.
How do I assign descriptions from metadata to my data frame?
Q: I have metadata for my datasets that includes the desired descriptions for each variable. How can I apply these descriptions to my data?
A: This question is similar to the questions on
applying formats from metadata, and applying labels from metadata. The
only difference is that you will use the descriptions()
function.
# Assign descriptions
descriptions(dat) <- list(col1 = "Here is my description for col1.",
col2 = "Here is my description for col2.",
col3 = "Here is my description for col3.")
# View descriptions
descriptions(dat)
# $col1
# [1] "Here is my description for col1."
#
# $col2
# [1] "Here is my description for col2."
#
# $col3
# [1] "Here is my description for col3."
How do I create a format catalog?
Q: I want to create a format catalog, save that catalog, and reuse it later. How can I do that?
A: The fmtr package provides all the functions necessary to do what you want. Here is an example:
library(fmtr)
# Create format catalog
fmts <- fcat(AGECAT = value(condition(x >= 18 & x <= 24, "18 to 24"),
condition(x >= 25 & x <= 44, "25 to 44"),
condition(x >= 45 & x <= 64, "45 to 64"),
condition(x >= 65, ">= 65"),
condition(TRUE, "Other")),
SEX = value(condition(is.na(x), "Missing"),
condition(x == "M", "Male"),
condition(x == "F", "Female"),
condition(TRUE, "Other")),
VAR = c("AGE" = "Age",
"AGECAT" = "Age Group",
"SEX" = "Sex"))
# Save format catalog
write.fcat(fmts, "c:/mypath")
# Read format catalog back in
fmts <- read.fcat("c:/mypath/fmts.fcat")
# View format catalog
fmts
# # A format catalog: 3 formats
# - $AGECAT: type U, 5 conditions
# - $SEX: type U, 4 conditions
# - $VAR: type V, 3 elements
# Use restored formats
fapply(c(55, 27, 19), fmts$AGECAT)
# [1] "45 to 64" "25 to 44" "18 to 24"
How do I use formats from a format catalog?
Q: My colleague gave me a format catalog. How can I use it?
A: First read the format catalog into R using
read.fcat()
. Then you can begin using the formats in the
catalog using dollar sign ($) notation. Here is an example:
# Read format catalog back in
fmts <- read.fcat("c:/mypath/fmts.fcat")
# View format catalog
fmts
# # A format catalog: 3 formats
# - $AGECAT: type U, 5 conditions
# - $SEX: type U, 4 conditions
# - $VAR: type V, 3 elements
# Use restored formats
fapply(c(55, 27, 19), fmts$AGECAT)
# [1] "45 to 64" "25 to 44" "18 to 24"
How do I assign formats from a catalog to a data frame?
Q: I have a format catalog that I use to store formats. The formats are shared between several datasets. How can I assign formats from the catalog to one of my datasets?
A: Read the format catalog in using
read.fcat()
, then assign the formats using the
formats()
function. Like this:
# Read format catalog in
fmts <- read.fcat("c:/mypath/fmts.fcat")
# View format catalog
fmts
# # A format catalog: 3 formats
# - $AGECAT: type U, 5 conditions
# - $SEX: type U, 4 conditions
# - $VAR: type V, 3 elements
# Create sample data frame
dat <- read.table(header = TRUE, text = '
SUBJECT AGECAT SEX
101 35 F
102 19 F
103 57 M
')
# Assign formats from catalog to data frame
formats(dat) <- fmts
# View formatted data
fdata(dat)
# SUBJECT AGECAT SEX
# 1 101 25 to 44 Female
# 2 102 18 to 24 Female
# 3 103 45 to 64 Male
Note that this only works when the format names in the catalog
correspond to the column names in the dataframe. If the names in the
catalog do not correspond to the column names, it is best to manipulate
the names of the format catalog using the names()
function
so that they match the column names in the dataframe. Then proceed as
above. Like this:
# Read format catalog in
fmts <- read.fcat("c:/packages/fmts.fcat")
# View format catalog
fmts
# # A format catalog: 3 formats
# - $AGECAT: type U, 5 conditions
# - $SEX: type U, 4 conditions
# - $VAR: type V, 3 elements
# Create sample data frame
dat <- read.table(header = TRUE, text = '
SUBJ AGE GENDER
101 35 F
102 19 F
103 57 M
')
# Reassign format names in catalog
names(fmts) <- c("AGE", "GENDER", "VAR")
# Assign formats from catalog to data frame
formats(dat) <- fmts
# View formatted data
fdata(dat)
# SUBJECT AGECAT SEX
# 1 101 25 to 44 Female
# 2 102 18 to 24 Female
# 3 103 45 to 64 Male
How do I create a format catalog from an Excel spreadsheet?
Q: I have format information stored in an Excel spreadsheet. Can I use that to create a format catalog and format my data?
A: Yes, provided the data is either in the correct
structure or can be put in the correct structure to create a format
catalog. The correct structure includes the following columns: Name,
Type, Expression, Label and Order. See the documentation on
as.fcat.data.frame()
for further description of the needed
column values.
Here is an example showing Excel data that is already in the correct structure:
library(fmtr)
library(readxl)
# Read data from Excel
xldat <- read_excel("c:\\packages\\myxlfile.xlsx")
# View data frame
xldat
# # A tibble: 10 x 5
# Name Type Expression Label Order
# <chr> <chr> <chr> <chr> <lgl>
# 1 AGECAT U "x >= 18 & x <= 24" 18 to 24 NA
# 2 AGECAT U "x >= 25 & x <= 44" 25 to 44 NA
# 3 AGECAT U "x >= 45 & x <= 64" 45 to 64 NA
# 4 AGECAT U "x >= 65" >= 65 NA
# 5 AGECAT U "TRUE" Other NA
# 6 SEX U "is.na(x)" Missing NA
# 7 SEX U "x == \"M\"" Male NA
# 8 SEX U "x == \"F\"" Female NA
# 9 SEX U "TRUE" Other NA
# 10 VAR V "c(AGE = \"Age\", AGECAT = \"Age Group\", SEX = \"Sex\")" NA NA
# Convert dataframe to format catalog
fmts <- as.fcat(xldat)
# View format catalog
fmts
# # A format catalog: 3 formats
# - $AGECAT: type U, 5 conditions
# - $SEX: type U, 4 conditions
# - $VAR: type V, 3 element
# Create sample data frame
dat <- read.table(header = TRUE, text = '
SUBJECT AGECAT SEX
101 35 F
102 19 F
103 57 M
')
# Assign formats from catalog
formats(dat) <- fmts
# Apply formats
fdata(dat)
# SUBJECT AGECAT SEX
# 1 101 25 to 44 Female
# 2 102 18 to 24 Female
# 3 103 45 to 64 Male
How do I export my format catalog to a spreadsheet?
Q: I have a format catalog I created in R with the fmtr package. I want to store that catalog in a spreadsheet for documentation purposes. How can I do that?
A: There is a Base R function
as.data.frame()
that can be used to convert a
fmtr user-defined format or a format catalog to a data
frame. From there, it is easy to export to Excel or any other file
format you like. Here is an example:
library(fmtr)
library(openxlsx)
# Create sample format catlog
fmts <- fcat(AGECAT = value(condition(x >= 18 & x <= 24, "18 to 24"),
condition(x >= 25 & x <= 44, "25 to 44"),
condition(x >= 45 & x <= 64, "45 to 64"),
condition(x >= 65, ">= 65"),
condition(TRUE, "Other")),
SEX = value(condition(is.na(x), "Missing"),
condition(x == "M", "Male"),
condition(x == "F", "Female"),
condition(TRUE, "Other")),
VAR = c("AGE" = "Age",
"AGECAT" = "Age Group",
"SEX" = "Sex"))
# View format catalog
fmts
# # A format catalog: 3 formats
# - $AGECAT: type U, 5 conditions
# - $SEX: type U, 4 conditions
# - $VAR: type V, 3 element
# Convert format catalog to data frame
dat <- as.data.frame(fmts)
# Write data frame to Excel using openxlsx
write.xlsx(dat, "c:\\mypath\\myxlfile.xlsx")
Can I create a user-defined format from data?
Q: I have a dataset with a code list that I want to create a user-defined format from. Is there a way to do that?
A: Yes. There is a function as.fmt()
that allows you to convert a data frame into a user-defined format. But
the input dataframe needs a specific structure. Here is an example:
library(fmtr)
# Create sample input data
dat <- read.table(header = TRUE, text ='
Col1 Col2
A "Label A"
B "Label B"
C "Label C"')
# Create main conditions
df1 <- data.frame(Name = "myfmt",
Type = "U",
Expression = paste0("x == '", dat$Col1, "'"),
Label = dat$Col2,
Order = NA)
# Create default condition
df2 <- data.frame(Name = "myfmt",
Type = "U",
Expression = "TRUE",
Label = "Other",
Order = NA)
# Append default condition
df <- rbind(df1, df2)
# View input data
df
# Name Type Expression Label Order
# 1 myfmt U x == 'A' Label A NA
# 2 myfmt U x == 'B' Label B NA
# 3 myfmt U x == 'C' Label C NA
# 4 myfmt U TRUE Other NA
# Convert data frame to user-defined format
fmt <- as.fmt(df)
# Apply the format
fapply(c("A", "B", "C", NA), fmt)
# [1] "Label A" "Label B" "Label C" "Other"
Can I create an input format?
Q: SAS® distinguishes between an input format and an output format. Is there a similar distinction in the fmtr package?
A: No. All fmtr formats are output formats.
Can I create a numeric user-defined format?
Q: I have a column of data with some character values that I want to convert to a number. I’d like to create a format to do that. Can I create a user-defined format that returns a number instead of a text string?
A: Yes. The second parameter of the
condition()
function accepts a character, numeric, or
logical value. That means a fmtr user-defined format
can be used to translate incoming values, whether character or numeric,
to a number. Here is an example:
library(fmtr)
nfmt <- value(condition(x == "A", 1),
condition(x == "B", 2),
condition(TRUE, 3))
fapply(c("A", "B", "C"), nfmt)
# [1] 1 2 3
Is there a way to set up a search path for a format catalog?
Q: In SAS® you can set up a search path for a format catalog, so you don’t need to read it. You can just reference the format names and it will work. Is there a similar functionality in the fmtr package?
A: Not exactly. What you can do is use the
file.find()
function from the common package to
search for the format catalog file, and then read the catalog into your
program using read.fcat()
. From there you just use the
format catalog as normal.
Is there a way to read in a format catalog from SAS®?
Q: I have an existing format catalog from SAS® that I want to convert to R. Does the fmtr package provide a way to read in a SAS® format catalog?
A: Not directly. What you can do is export the SAS
format catalog to a dataset, read the dataset into R, and rearrange the
data to correspond to the requirements of
as.fcat.data.frame()
. Then use
as.fcat.data.frame()
to convert the data frame to a
fmtr format catalog. Note that SAS® provides a lot of
functionality in their formats that cannot be reproduced in
fmtr. So there is no guarantee that all your SAS
formats will convert as desired.
What happened to the labels() function?
Q: I had been using the labels()
function in a few programs, and now they are all broken. It appears the
labels()
function is no longer part of the
fmtr package. What happened?
A: The labels()
function has been moved
to the common package. The reason it was moved is
because this function is so generally useful, that the
common package was deemed a more appropriate home. You
can fix your code by simply adding a reference to the
common package.
Can I change a User-Defined Format after it has been created?
Q: I created a User_Define Format, and now I want to add a new item to it. Is there a way to do that?
A: Yes. Reminder that, in the end, a User-Defined Format is a list, and can be manipulated as such. You can change a User-Defined Format the same way you change a list. For example:
library(fmtr)
# Create format
fmt <- value(condition(x == "A", "Group A"),
condition(x == "B", "Group B"))
# Create sample data
dat <- c("A", "B", "C")
# Apply format
fapply(dat, fmt)
# [1] "Group A" "Group B" "C"
# Add "C" condition to format
fmt[[length(fmt) + 1]] <- condition(x == "C", "Group C")
# Apply revised format
fapply(dat, fmt)
# [1] "Group A" "Group B" "Group C"