The purpose of the first table in a medical paper is most often to describe your population. In an RCT the table frequently compares the baseline characteristics between the randomized groups, while an observational study will often compare the exposed with the unexposed. In this vignette I will show how I use the Gmisc functions to quickly generate a descriptive table.
We will use the mtcars dataset and compare the cars with automatic transmission to those with manual transmission. The units and labels build upon the logic in the Hmisc package, which allows us to specify attributes on columns. Note that this labeling is not required; it just makes the output nicer.
library(dplyr) # provides %>% and mutate()
library(Gmisc)
data("mtcars")
mtcars <- mtcars %>%
  mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")),
         gear = factor(gear),
         # Make up some data to make it slightly more interesting
         col = factor(sample(c("red", "black", "silver"),
                             size = NROW(mtcars),
                             replace = TRUE))) %>%
  set_column_labels(mpg = "Gas",
                    wt = "Weight",
                    am = "Transmission",
                    gear = "Gears",
                    col = "Car color") %>%
  set_column_units(mpg = "Miles/(US) gallon",
                   wt = "10<sup>3</sup> lbs")
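Since the labels and units follow the Hmisc logic of storing attributes on columns, the same kind of information can presumably also be attached with Hmisc directly. The following is a minimal sketch, assuming the Hmisc package is installed; the `cars` copy is a name introduced only for this illustration.

```r
library(Hmisc)

# Work on a fresh copy so the prepared mtcars object is left untouched
cars <- datasets::mtcars

# Hmisc stores these values as "label" and "units" attributes on the column
label(cars$mpg) <- "Gas"
units(cars$mpg) <- "Miles/(US) gallon"

# Inspect the attributes that were set
attr(cars$mpg, "label")
attr(cars$mpg, "units")
```

Setting the attributes by hand like this is more verbose than the piped helpers, which is probably why the helpers are convenient when several columns need labels at once.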
getDescriptionStatsBy
The function getDescriptionStatsBy is a simple way to generate basic descriptive statistics. The only mandatory named argument is by:

| | Automatic | Manual |
|---|---|---|
| Gas | 17.1 (±3.8) | 24.4 (±6.2) |
| Weight | 3.8 (±0.8) | 2.4 (±0.6) |
| Transmission | 19 (100.0%) | 0 (0.0%) |
| Gears | | |
| 3 | 15 (78.9%) | 0 (0.0%) |
| 4 | 4 (21.1%) | 8 (61.5%) |
| 5 | 0 (0.0%) | 5 (38.5%) |
| Car color | | |
| black | 8 (42.1%) | 3 (23.1%) |
| red | 5 (26.3%) | 4 (30.8%) |
| silver | 6 (31.6%) | 6 (46.2%) |

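A table like the one above could come from a call along these lines. This is a sketch under assumptions rather than the exact call used here: it prepares its own small copy of the data (named `cars` for illustration), leaves out the random color column and the labels, and therefore shows the raw column names.

```r
library(dplyr)
library(Gmisc)

# Minimal data preparation: only the grouping variable and gears as factors
cars <- datasets::mtcars %>%
  mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")),
         gear = factor(gear))

# Columns to describe are listed unnamed; `by` names the grouping column
basic_table <- cars %>%
  getDescriptionStatsBy(mpg, wt, gear, by = am)

basic_table
```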
If we prefer the median, we can simply specify the statistic used for continuous variables:

| | Automatic | Manual |
|---|---|---|
| Gas | 17.3 (14.9 - 19.2) | 22.8 (21.0 - 30.4) |
| Weight | 3.5 (3.4 - 3.8) | 2.3 (1.9 - 2.8) |
| Transmission | 19 (100.0%) | 0 (0.0%) |
| Gears | | |
| 3 | 15 (78.9%) | 0 (0.0%) |
| 4 | 4 (21.1%) | 8 (61.5%) |
| 5 | 0 (0.0%) | 5 (38.5%) |
| Car color | | |
| black | 8 (42.1%) | 3 (23.1%) |
| red | 5 (26.3%) | 4 (30.8%) |
| silver | 6 (31.6%) | 6 (46.2%) |

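The median version shown above can be sketched by passing describeMedian as the continuous_fn argument, the same argument used in the later examples of this vignette. The `cars` copy and the reduced set of columns are assumptions made to keep the example short and self-contained.

```r
library(dplyr)
library(Gmisc)

# Small self-contained copy with the grouping variable as a factor
cars <- datasets::mtcars %>%
  mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")))

# continuous_fn swaps the summary for continuous columns to the median
median_table <- cars %>%
  getDescriptionStatsBy(mpg, wt,
                        by = am,
                        continuous_fn = describeMedian)

median_table
```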
htmlTable
The key to good descriptive statistics is being able to output them as a table. I usually rely on htmlTable for all my table requirements, as it has a nice set of advanced options that allow us to get publication-ready tables that can simply be copy-pasted into our paper. Note that here we add an HTML dagger (†) to the weight label, which we then explain in the footer. If we name an argument like this, the name overrides the label set previously.
mtcars %>%
  getDescriptionStatsBy(mpg,
                        `Weight†` = wt,
                        am,
                        gear,
                        col,
                        by = am) %>%
  htmlTable(caption = "Basic descriptive statistics from the mtcars dataset",
            tfoot = "† The weight is in 10<sup>3</sup> lbs")

Basic descriptive statistics from the mtcars dataset

| | Automatic | Manual |
|---|---|---|
| Gas | 17.1 (±3.8) | 24.4 (±6.2) |
| Weight† | 3.8 (±0.8) | 2.4 (±0.6) |
| Transmission | 19 (100.0%) | 0 (0.0%) |
| Gears | | |
| 3 | 15 (78.9%) | 0 (0.0%) |
| 4 | 4 (21.1%) | 8 (61.5%) |
| 5 | 0 (0.0%) | 5 (38.5%) |
| Car color | | |
| black | 8 (42.1%) | 3 (23.1%) |
| red | 5 (26.3%) | 4 (30.8%) |
| silver | 6 (31.6%) | 6 (46.2%) |

† The weight is in 10<sup>3</sup> lbs

There is a large set of options for getDescriptionStatsBy. Here is an example with some of them and some extra styling.
mtcars %>%
  getDescriptionStatsBy(mpg,
                        `Weight†` = wt,
                        am,
                        gear,
                        col,
                        by = am,
                        digits = 0,
                        add_total_col = TRUE,
                        use_units = "name") %>%
  addHtmlTableStyle(pos.caption = "bottom") %>%
  htmlTable(caption = "Basic descriptive statistics from the mtcars dataset",
            tfoot = "† The weight is in 10<sup>3</sup> lbs")

| | Total | Automatic | Manual |
|---|---|---|---|
| Gas (Miles/(US) gallon) | 20 (±6) | 17 (±4) | 24 (±6) |
| Weight† (10<sup>3</sup> lbs) | 3 (±1) | 4 (±1) | 2 (±1) |
| Transmission | 19 (59%) | 19 (100%) | 0 (0%) |
| Gears | | | |
| 3 | 15 (47%) | 15 (79%) | 0 (0%) |
| 4 | 12 (38%) | 4 (21%) | 8 (62%) |
| 5 | 5 (16%) | 0 (0%) | 5 (38%) |
| Car color | | | |
| black | 11 (34%) | 8 (42%) | 3 (23%) |
| red | 9 (28%) | 5 (26%) | 4 (31%) |
| silver | 12 (38%) | 6 (32%) | 6 (46%) |

Basic descriptive statistics from the mtcars dataset

† The weight is in 10<sup>3</sup> lbs

Even though p-values are discouraged in Table 1, they are not uncommon. I have therefore added a basic statistics option that defaults to Fisher’s exact test for proportions and the Wilcoxon rank sum test for continuous values.
mtcars %>%
  getDescriptionStatsBy(mpg,
                        wt,
                        am,
                        gear,
                        col,
                        by = am,
                        continuous_fn = describeMedian,
                        digits = 0,
                        header_count = TRUE,
                        statistics = TRUE) %>%
  htmlTable(caption = "Basic descriptive statistics from the mtcars dataset")

Basic descriptive statistics from the mtcars dataset

| | Automatic No. 19 | Manual No. 13 | P-value |
|---|---|---|---|
| Gas | 17 (15 - 19) | 23 (21 - 30) | 0.002 |
| Weight | 4 (3 - 4) | 2 (2 - 3) | < 0.0001 |
| Transmission | 19 (100%) | 0 (0%) | < 0.0001 |
| Gears | | | < 0.0001 |
| 3 | 15 (79%) | 0 (0%) | |
| 4 | 4 (21%) | 8 (62%) | |
| 5 | 0 (0%) | 5 (38%) | |
| Car color | | | 0.60 |
| black | 8 (42%) | 3 (23%) | |
| red | 5 (26%) | 4 (31%) | |
| silver | 6 (32%) | 6 (46%) | |

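To get a feeling for where the default p-values above come from, the two tests can be run by hand in base R. This is only a sketch of roughly what the defaults amount to; how Gmisc calls the tests internally is an assumption, and `cars` is a name introduced for the illustration.

```r
# Fresh copy of the built-in data, independent of the labelled version above
cars <- datasets::mtcars

# Continuous variable: Wilcoxon rank sum test of mpg between the two groups
# (exact = FALSE avoids the warning about ties in the data)
p_mpg <- wilcox.test(mpg ~ am, data = cars, exact = FALSE)$p.value

# Categorical variable: Fisher's exact test on the gear-by-transmission table
p_gear <- fisher.test(table(cars$gear, cars$am))$p.value

c(mpg = p_mpg, gear = p_gear)
```

Both values should land in the same range as the Gas and Gears rows of the table, though small differences in options such as the continuity correction are possible.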
By popular demand I’ve expanded this with the option of custom p-values. All you need to do is provide a function that takes two arguments, the variable and the grouping variable, and returns a single p-value. There are several prepared functions that you can use directly or as a template for your own p-value function. They all start with getPval, e.g. getPvalKruskal. You can either provide a single function or set the defaults depending on the variable type:
mtcars %>%
  getDescriptionStatsBy(mpg,
                        wt,
                        am,
                        gear,
                        col,
                        by = am,
                        continuous_fn = describeMedian,
                        digits = 0,
                        header_count = TRUE,
                        statistics = list(continuous = getPvalChiSq,
                                          factor = getPvalChiSq,
                                          proportion = getPvalFisher)) %>%
  addHtmlTableStyle(pos.caption = "bottom") %>%
  htmlTable(caption = "P-values generated from a custom set of values")

| | Automatic No. 19 | Manual No. 13 | P-value |
|---|---|---|---|
| Gas | 17 (15 - 19) | 23 (21 - 30) | 0.27 |
| Weight | 4 (3 - 4) | 2 (2 - 3) | 0.37 |
| Transmission | 19 (100%) | 0 (0%) | < 0.0001 |
| Gears | | | < 0.0001 |
| 3 | 15 (79%) | 0 (0%) | |
| 4 | 4 (21%) | 8 (62%) | |
| 5 | 0 (0%) | 5 (38%) | |
| Car color | | | 0.52 |
| black | 8 (42%) | 3 (23%) | |
| red | 5 (26%) | 4 (31%) | |
| silver | 6 (32%) | 6 (46%) | |

P-values generated from a custom set of values

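Writing your own p-value function can be as small as the sketch below. The helper `getPvalTtest` is hypothetical and not part of Gmisc; it only follows the contract described above (two arguments in, one p-value out) and uses a Welch t-test from base R as a stand-in for whatever test you prefer.

```r
# Hypothetical helper, not part of Gmisc: Welch t-test between two groups
getPvalTtest <- function(x, by) {
  # Drop observations where either the value or the group is missing
  keep <- !is.na(x) & !is.na(by)
  x <- x[keep]
  by <- droplevels(factor(by[keep]))

  if (nlevels(by) != 2) {
    stop("getPvalTtest expects exactly two groups")
  }

  # Return a single numeric p-value, as the contract requires
  t.test(x ~ by)$p.value
}

# Try it on the built-in data
cars <- datasets::mtcars
p_ttest <- getPvalTtest(cars$mpg, cars$am)
p_ttest
```

Such a function could then be supplied in the same way as the prepared ones, for example as the continuous entry of the statistics list.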
mergeDesc
Prior to Gmisc v3.0, mergeDesc was the best way to quickly assemble a “Table 1”:
getTable1Stats <- function(x, digits = 0, ...){
  getDescriptionStatsBy(x = x,
                        by = mtcars$am,
                        digits = digits,
                        continuous_fn = describeMedian,
                        header_count = TRUE,
                        ...)
}
t1 <- list()
t1[["Gas"]] <-
  getTable1Stats(mtcars$mpg)
t1[["Weight†"]] <-
  getTable1Stats(mtcars$wt)
t1[["Color"]] <-
  getTable1Stats(mtcars$col)
# If we want to use the labels set in the beginning
# we add an element without a name
t1 <- c(t1,
        list(getTable1Stats(mtcars$gear)))
mergeDesc(t1,
          htmlTable_args = list(caption = "Basic descriptive statistics from the mtcars dataset",
                                tfoot = "† The weight is in 10<sup>3</sup> lbs"))

Basic descriptive statistics from the mtcars dataset

| | Automatic No. 19 | Manual No. 13 |
|---|---|---|
| Gas | 17 (15 - 19) | 23 (21 - 30) |
| Weight† | 4 (3 - 4) | 2 (2 - 3) |
| Color | | |
| black | 8 (42%) | 3 (23%) |
| red | 5 (26%) | 4 (31%) |
| silver | 6 (32%) | 6 (46%) |
| Gears | | |
| 3 | 15 (79%) | 0 (0%) |
| 4 | 4 (21%) | 8 (62%) |
| 5 | 0 (0%) | 5 (38%) |

† The weight is in 10<sup>3</sup> lbs