Introduction
In this vignette, we demonstrate how to use the bkmr package in R to analyze the relationship between metal exposures (Cadmium, Mercury, Arsenic, Lead, and Manganese) and a response variable (here, the child's IQ score, stored in the column QI). We also compute a dynamic threshold for variable inclusion based on the coefficient of variation of the response and the sample size.
1. Required Libraries
Before starting, ensure the following libraries are installed and loaded:
# Load the necessary libraries
library(bkmr)
# For guided examples, see https://jenfb.github.io/bkmr/overview.html
library(simBKMRdata)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
2. Data Preprocessing
First, we load the data, clean it, and perform the necessary transformations. Here we log-transform the metal concentrations so that they are on a comparable scale.
# Set the seed for reproducibility
set.seed(2025)
# Load the dataset and preprocess
data("metalExposChildren_df")
bkmrAnalysisData_df <-
metalExposChildren_df %>%
select(QI, Cadmium:Manganese) %>% # Selecting relevant columns
na.omit() %>% # Remove rows with missing values
mutate_at(
vars(Cadmium:Manganese), ~ log10(. + 1) %>% as.vector
) # Log-transform metal concentrations
Here, mutate_at() is used to log-transform the exposure variables (Cadmium through Manganese) by applying log10(. + 1) to each column; adding 1 before taking the log avoids evaluating log10(0) for zero concentrations. (mutate_at() is superseded by across() in current dplyr, but still works.)
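As a quick illustration of this transformation (on a made-up vector, not the study data), log10(x + 1) maps zeros to zero and compresses large values:

```r
# Hypothetical concentrations, including a zero
x <- c(0, 9, 99, 999)

# The same transformation applied to each exposure column above
log10(x + 1)
#> [1] 0 1 2 3
```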
3. Exploratory Data Analysis (EDA)
Before fitting the model, we perform some exploratory data analysis. We visualize the distribution of metal concentrations using histograms and density plots.
# Create a histogram with density plot for each metal
metalDataPlot <- bkmrAnalysisData_df %>%
select(Cadmium:Manganese) %>%
pivot_longer(cols = everything(), names_to = "Metal", values_to = "Value") %>%
ggplot(aes(x = Value, fill = Metal)) +
geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.5, position = "identity") +
geom_density(aes(color = Metal), linewidth = 1) +
facet_wrap(~Metal, scales = "free") +
theme_minimal() +
labs(title = "Histogram with Density Plot for Metals",
x = "Concentration",
y = "Density") +
theme(legend.position = "none")
# Display the plot
metalDataPlot
This plot provides a visual understanding of how the metal concentrations are distributed.
4. Model Setup
We now set up the Bayesian Kernel Machine Regression (BKMR) model by creating a response variable (QI) and the exposure matrix, which includes all the metal concentrations.
# Extract the response variable (IQ score or QI)
iq <- bkmrAnalysisData_df %>%
pull(QI) %>%
na.omit()
# Convert exposure variables (metals) to a matrix for modeling
expos <- data.matrix(bkmrAnalysisData_df %>% select(Cadmium:Manganese))
5. Generating Knot Points for the Model
To fit the Bayesian Kernel Machine Regression model, we generate knot points using the cover.design() function from the fields package. These knots are used by the Gaussian predictive process approximation, which speeds up MCMC fitting on larger datasets.
# Generate knot points using a cover design for Bayesian modeling
knots50 <- fields::cover.design(expos, nd = 50)$design
Here, we specify 50 knots for the cover design. These knot points help in fitting the non-linear kernel in BKMR.
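The call below is a small self-contained sketch (with simulated data standing in for the vignette's exposure matrix) of what cover.design() does: from a matrix of candidate points it selects nd rows that spread out, space-filling style, over the covariate space.

```r
library(fields)

set.seed(2025)
# Simulated 200 x 5 "exposure" matrix standing in for expos
fakeExpos <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)

# Select 10 space-filling design points from the 200 candidate rows
knots10 <- cover.design(fakeExpos, nd = 10)$design

# The design has one row per knot and one column per exposure
dim(knots10)
#> [1] 10  5
```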
6. Fitting the BKMR Model
We now fit the BKMR model using the kmbayes() function. This function performs Bayesian regression using MCMC (Markov Chain Monte Carlo) to estimate the relationship between the exposures and the response variable.
# Fit the BKMR model using MCMC
modelFit <- kmbayes(
y = iq, # Response variable
Z = expos, # Exposure matrix (metal concentrations)
X = NULL, # No additional covariates
iter = 1000, # Number of MCMC iterations
family = "gaussian", # Gaussian response
verbose = TRUE, # Output progress for each iteration
varsel = TRUE, # Perform variable selection
knots = knots50 # Knot points generated earlier
)
Iteration: 100 (10% completed; 0.94368 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
param rate
1 lambda 0.4545455
2 r/delta (overall) 0.5656566
3 r/delta (move 1) 0.5600000
4 r/delta (move 2) 0.5714286
(verbose progress output for iterations 200 through 900 omitted)
Iteration: 1000 (100% completed; 9.21356 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
param rate
1 lambda 0.5395395
2 r/delta (overall) 0.4184184
3 r/delta (move 1) 0.4118774
4 r/delta (move 2) 0.4255765
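One optional diagnostic, not part of the original workflow: bkmr provides the TracePlot() function for visually checking MCMC convergence of individual parameters. The calls below assume the fitted object modelFit from the chunk above.

```r
# Trace plot of the residual variance parameter; a well-mixed,
# trend-free trace suggests the chain has converged
TracePlot(fit = modelFit, par = "sigsq.eps")

# Trace of the kernel's r parameter for the first exposure
TracePlot(fit = modelFit, par = "r", comp = 1)
```

With only 1,000 iterations (as above), these traces are also a useful reminder that a longer chain may be needed for stable inference.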
7. Extracting Posterior Inclusion Probabilities
After the model is fitted, we extract the posterior inclusion probabilities (PIPs) to determine which exposures are most important in predicting the response variable.
# Extract posterior inclusion probabilities (PIPs) and sort them
pipFit <- ExtractPIPs(modelFit) %>%
arrange(desc(PIP))
# Display the PIPs
pipFit
variable PIP
1 Cadmium 0.924
2 Lead 0.694
3 Mercury 0.350
4 Arsenic 0.248
5 Manganese 0.244
The ExtractPIPs() function returns the posterior inclusion probabilities, which we sort in descending order to identify the most important exposures.
8. Dynamic Threshold Calculation
Next, we calculate the dynamic threshold for variable inclusion. This threshold is based on the coefficient of variation (CV) of the response variable (QI) and the sample size.
# Calculate the dynamic threshold for model inclusion
pipThresh_fn <- calculate_pip_threshold(
absCV = sd(bkmrAnalysisData_df$QI) / mean(bkmrAnalysisData_df$QI), # Coefficient of variation
sampSize = length(bkmrAnalysisData_df$QI) # Sample size
)
# Display the threshold
pipThresh_fn
This function computes the threshold for the PIPs using the coefficient of variation of the response variable and the sample size.
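For intuition, the coefficient of variation passed to the function above is simply sd/mean; on a toy vector (not the QI data) it is easy to verify by hand:

```r
# Toy response values with mean 100 and standard deviation 10
y <- c(90, 100, 110)

cv <- sd(y) / mean(y)
cv
#> [1] 0.1
```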
9. Interpretation of Results
Finally, we compare the PIPs against the threshold to identify the metals that the model flags as important predictors of the response variable.
# Identify exposures with PIPs greater than the threshold
significant_exposures <- pipFit %>%
filter(PIP > pipThresh_fn)
# Display significant exposures
significant_exposures
variable PIP
1 Cadmium 0.924
2 Lead 0.694
3 Mercury 0.350
4 Arsenic 0.248
5 Manganese 0.244
This step keeps the exposures whose posterior inclusion probabilities exceed the dynamic threshold, suggesting that these variables are important predictors of the outcome.
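The same comparison can be seen on a small hand-made table (made-up PIPs and a made-up threshold of 0.5, not the values computed above):

```r
# Made-up PIP table and threshold, for illustration only
toyPIPs <- data.frame(
  variable = c("A", "B", "C"),
  PIP      = c(0.92, 0.69, 0.35)
)
toyThresh <- 0.5

# Names of the variables whose PIP clears the threshold
toyPIPs$variable[toyPIPs$PIP > toyThresh]
#> [1] "A" "B"
```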
Conclusion
This vignette demonstrates the complete workflow for using Bayesian Kernel Machine Regression (BKMR) to analyze the relationship between various metal exposures and an outcome of interest (e.g., IQ score). Additionally, we computed a dynamic threshold for model inclusion, which adjusts the threshold based on the coefficient of variation and the sample size.