In a previous vignette, we introduced the “marginal effect” as a partial derivative. Since derivatives are only properly defined for continuous variables, we cannot use them to interpret the effects of changes in categorical variables. For this, we turn to contrasts between Adjusted predictions. In the context of this package, a “Contrast” is defined as:
The difference between two adjusted predictions, calculated for meaningfully different regressor values (e.g., College graduates vs. Others).
The marginaleffects()
function automatically calculates contrasts instead of derivatives for factor, logical, or character variables. The comparisons()
function gives users more powerful features to compute different contrasts.
Consider a simple model with a logical and a factor variable:
library(marginaleffects)
library(magrittr)
tmp <- mtcars
tmp$am <- as.logical(tmp$am)
mod <- lm(mpg ~ am + factor(cyl), tmp)
The marginaleffects
function automatically computes contrasts for each level of the categorical variables, relative to the baseline category (FALSE
for logicals, and the reference level for factors), while holding all other values at their mode or mean:
mfx <- marginaleffects(mod)
summary(mfx)
#> Average marginal effects
#> Term Contrast Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 am TRUE - FALSE 2.560 1.298 1.973 0.04851 0.01675 5.103
#> 2 cyl 6 - 4 -6.156 1.536 -4.009 6.1077e-05 -9.16608 -3.146
#> 3 cyl 8 - 4 -10.068 1.452 -6.933 4.1147e-12 -12.91359 -7.222
#>
#> Model type: lm
#> Prediction type: response
The summary printed above says that moving from the reference category 4
to the level 6
on the cyl
factor variable is associated with a change of -6.156 in the adjusted prediction. Similarly, the contrast from FALSE
to TRUE
on the am
variable is equal to 2.560.
We can obtain different contrasts by using the comparisons()
function. For example:
comparisons(mod, contrast_factor = "sequential") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 am TRUE - FALSE 2.559954 1.297579 1.972869 4.851044e-02 0.01674586
#> 2 cyl 6 - 4 -6.156118 1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3 cyl 8 - 6 -3.911442 1.470254 -2.660385 7.805144e-03 -6.79308707
#> conf.high
#> 1 5.103162
#> 2 -3.146156
#> 3 -1.029797
comparisons(mod, contrast_factor = "pairwise") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 am TRUE - FALSE 2.559954 1.297579 1.972869 4.851044e-02 0.01674586
#> 2 cyl 6 - 4 -6.156118 1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3 cyl 8 - 4 -10.067560 1.452082 -6.933187 4.114709e-12 -12.91358877
#> 4 cyl 8 - 6 -3.911442 1.470254 -2.660385 7.805144e-03 -6.79308707
#> conf.high
#> 1 5.103162
#> 2 -3.146156
#> 3 -7.221530
#> 4 -1.029797
comparisons(mod, contrast_factor = "reference") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 am TRUE - FALSE 2.559954 1.297579 1.972869 4.851044e-02 0.01674586
#> 2 cyl 6 - 4 -6.156118 1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3 cyl 8 - 4 -10.067560 1.452082 -6.933187 4.114709e-12 -12.91358877
#> conf.high
#> 1 5.103162
#> 2 -3.146156
#> 3 -7.221530
For comparison, this code produces the same results using the emmeans
package:
library(emmeans)
emm <- emmeans(mod, specs = "cyl")
contrast(emm, method = "revpairwise")
#> contrast estimate SE df t.ratio p.value
#> 6 - 4 -6.16 1.54 28 -4.009 0.0012
#> 8 - 4 -10.07 1.45 28 -6.933 <.0001
#> 8 - 6 -3.91 1.47 28 -2.660 0.0331
#>
#> Results are averaged over the levels of: am
#> P value adjustment: tukey method for comparing a family of 3 estimates
emm <- emmeans(mod, specs = "am")
contrast(emm, method = "revpairwise")
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 2.56 1.3 28 1.973 0.0585
#>
#> Results are averaged over the levels of: cyl
Note that these commands also work on for other types of models, such as GLMs, on different scales:
mod_logit <- glm(am ~ factor(gear), data = mtcars, family = binomial)
comparisons(mod_logit) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 gear 4 - 3 0.6666667 1.360805e-01 4.899061 9.629569e-07 0.3999538
#> 2 gear 5 - 3 1.0000000 1.071403e-05 93335.529606 0.000000e+00 0.9999790
#> conf.high
#> 1 0.9333795
#> 2 1.0000210
comparisons(mod_logit, type = "link") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 gear 4 - 3 21.25922 4577.962 0.004643817 0.9962948 -8951.381 8993.90
#> 2 gear 5 - 3 41.13214 9155.924 0.004492407 0.9964156 -17904.148 17986.41
We can also compute contrasts for differences in numeric variables. For example, we can see what happens to the adjusted predictions when we increment the hp
variable by 1 unit (default) or by 5 units:
mod <- lm(mpg ~ hp, data = mtcars)
comparisons(mod) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 hp +1 -0.06822828 0.0101193 -6.742389 1.558043e-11 -0.08806175
#> conf.high
#> 1 -0.04839481
comparisons(mod, contrast_numeric = 5) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 hp +5 -0.3411414 0.05059652 -6.742389 1.558043e-11 -0.4403087
#> conf.high
#> 1 -0.241974
Compare adjusted predictions for a change in the regressor between two arbitrary values:
comparisons(mod, contrast_numeric = c(90, 110)) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp 110 - 90 -1.364566 0.2023861 -6.742389 1.558043e-11 -1.761235 -0.9678961
Compare adjusted predictions when the regressor changes across the interquartile range, across one or two standard deviations about its mean, or from across its full range:
comparisons(mod, contrast_numeric = "iqr") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp IQR -5.697061 0.8449619 -6.742389 1.558043e-11 -7.353156 -4.040966
comparisons(mod, contrast_numeric = "sd") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp sd -4.677926 0.6938085 -6.742389 1.558043e-11 -6.037766 -3.318087
comparisons(mod, contrast_numeric = "2sd") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp 2sd -9.355853 1.387617 -6.742389 1.558043e-11 -12.07553 -6.636174
comparisons(mod, contrast_numeric = "minmax") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp Max - Min -19.3086 2.863763 -6.742389 1.558043e-11 -24.92147 -13.69573
In models with multiplicative interactions, the contrasts of a categorical variable will depend on the values of the interacted variable:
We can now use the newdata
argument of the marginaleffects
function to compute contrasts for different values of the other regressors. As in the marginal effects vignette, the datagrid
function can be handy. Since we only care about the logical am
contrast, we use the variables
to indicate the subset of results to report:
comparisons(mod_int, newdata = datagrid(cyl = tmp$cyl), variables = "am")
#> rowid term contrast comparison std.error am cyl
#> 1: 1 am TRUE - FALSE 1.441667 2.315925 FALSE 6
#> 2: 2 am TRUE - FALSE 5.175000 2.052848 FALSE 4
#> 3: 3 am TRUE - FALSE 0.350000 2.315925 FALSE 8
Once again, we obtain the same results with emmeans
:
emm <- emmeans(mod_int, specs = "am", by = "cyl")
contrast(emm, method = "revpairwise")
#> cyl = 4:
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 5.17 2.05 26 2.521 0.0182
#>
#> cyl = 6:
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 1.44 2.32 26 0.623 0.5390
#>
#> cyl = 8:
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 0.35 2.32 26 0.151 0.8810
As described above, the marginaleffects
package includes limited support to compute contrasts. Users who require more powerful features are encouraged to consider alternative packages such as emmeans, modelbased, or ggeffects. These packages offer useful features such as automatic back-transforms, p value correction for multiple comparisons, and more.