Vignette: Bayesian Kernel Machine Regression (BKMR) for Metal Exposure Analysis with Dynamic Threshold Function

Kazi Tanvir Hasan and Dr. Gabriel Odom

Introduction

In this vignette, we demonstrate how to use the bkmr package in R to analyze the relationship between metal exposures (e.g., Cadmium, Mercury, Arsenic, Lead, and Manganese) and a response variable (e.g., IQ score or QI). Additionally, we will compute a dynamic threshold for model inclusion based on the coefficient of variation and the sample size.

1. Required Libraries

Before starting, ensure the following libraries are installed and loaded:

# Load the necessary libraries
library(bkmr)
For guided examples, go to 'https://jenfb.github.io/bkmr/overview.html'
library(simBKMRdata)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

2. Data Preprocessing

First, we load the data, clean it, and perform any necessary transformations. In this case, we take the log-transformation of the metal concentrations to ensure they are on a comparable scale.

# Set the seed for reproducibility
set.seed(2025)

# Load the dataset and preprocess
data("metalExposChildren_df")

bkmrAnalysisData_df <- 
  metalExposChildren_df %>%
  select(QI, Cadmium:Manganese) %>%  # Selecting relevant columns
  na.omit() %>%  # Remove rows with missing values
  mutate_at(
    vars(Cadmium:Manganese), ~ log10(. + 1) %>% as.vector
  )  # Log-transform metal concentrations

Here, mutate_at is used to log-transform the exposure variables (Cadmium to Manganese) by applying log10(. + 1).

3. Exploratory Data Analysis (EDA)

Before fitting the model, we perform some exploratory data analysis. We visualize the distribution of metal concentrations using histograms and density plots.

# Create a histogram with density plot for each metal
metalDataPlot <- bkmrAnalysisData_df %>%
  select(Cadmium:Manganese) %>%
  pivot_longer(cols = everything(), names_to = "Metal", values_to = "Value") %>%
  ggplot(aes(x = Value, fill = Metal)) + 
  geom_histogram(aes(y = ..density..), bins = 30, alpha = 0.5, position = "identity") + 
  geom_density(aes(color = Metal), linewidth = 1) + 
  facet_wrap(~Metal, scales = "free") + 
  theme_minimal() + 
  labs(title = "Histogram with Density Plot for Metals", 
       x = "Concentration", 
       y = "Density") + 
  theme(legend.position = "none")

# Display the plot
metalDataPlot
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

This plot provides a visual understanding of how the metal concentrations are distributed.

4. Model Setup

We now set up the Bayesian Kernel Machine Regression (BKMR) model by creating a response variable (QI) and the exposure matrix, which includes all the metal concentrations.

# Extract the response variable (IQ score or QI)
iq <- bkmrAnalysisData_df %>%
  pull(QI) %>%
  na.omit()

# Convert exposure variables (metals) to a matrix for modeling
expos <- data.matrix(bkmrAnalysisData_df %>% select(Cadmium:Manganese))

5. Generating Knot Points for the Model

To fit the Bayesian Kernel Machine Regression model, we generate knot points using the cover.design function. These points will be used for the MCMC model.

# Generate knot points using a cover design for Bayesian modeling
knots50 <- fields::cover.design(expos, nd = 50)$design

Here, we specify 50 knots for the cover design. These knot points help in fitting the non-linear kernel in BKMR.

6. Fitting the BKMR Model

We now fit the BKMR model using the kmbayes() function. This function performs Bayesian regression using MCMC (Markov Chain Monte Carlo) to estimate the relationship between the exposures and the response variable.

# Fit the BKMR model using MCMC
modelFit <- kmbayes(
  y = iq,         # Response variable
  Z = expos,      # Exposure matrix (metal concentrations)
  X = NULL,       # No additional covariates
  iter = 1000,    # Number of MCMC iterations
  family = "gaussian",  # Gaussian response
  verbose = TRUE,       # Output progress for each iteration
  varsel = TRUE,       # Perform variable selection
  knots = knots50      # Knot points generated earlier
)
Iteration: 100 (10% completed; 0.94368 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.4545455
2 r/delta (overall) 0.5656566
3 r/delta  (move 1) 0.5600000
4 r/delta  (move 2) 0.5714286
Iteration: 200 (20% completed; 1.89533 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.4824121
2 r/delta (overall) 0.5427136
3 r/delta  (move 1) 0.5416667
4 r/delta  (move 2) 0.5436893
Iteration: 300 (30% completed; 2.86038 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.4548495
2 r/delta (overall) 0.5418060
3 r/delta  (move 1) 0.5035971
4 r/delta  (move 2) 0.5750000
Iteration: 400 (40% completed; 3.74415 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.4736842
2 r/delta (overall) 0.5137845
3 r/delta  (move 1) 0.4111675
4 r/delta  (move 2) 0.6138614
Iteration: 500 (50% completed; 4.70124 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.5110220
2 r/delta (overall) 0.4849699
3 r/delta  (move 1) 0.4143426
4 r/delta  (move 2) 0.5564516
Iteration: 600 (60% completed; 5.59558 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.5308848
2 r/delta (overall) 0.4691152
3 r/delta  (move 1) 0.4140127
4 r/delta  (move 2) 0.5298246
Iteration: 700 (70% completed; 6.54197 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.5393419
2 r/delta (overall) 0.4520744
3 r/delta  (move 1) 0.4141689
4 r/delta  (move 2) 0.4939759
Iteration: 800 (80% completed; 7.40428 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.5319149
2 r/delta (overall) 0.4505632
3 r/delta  (move 1) 0.4268868
4 r/delta  (move 2) 0.4773333
Iteration: 900 (90% completed; 8.34628 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.5283648
2 r/delta (overall) 0.4271413
3 r/delta  (move 1) 0.4078675
4 r/delta  (move 2) 0.4495192
Iteration: 1000 (100% completed; 9.21356 secs elapsed)
Acceptance rates for Metropolis-Hastings algorithm:
              param      rate
1            lambda 0.5395395
2 r/delta (overall) 0.4184184
3 r/delta  (move 1) 0.4118774
4 r/delta  (move 2) 0.4255765

7. Extracting Results

After the model is fitted, we extract the posterior inclusion probabilities (PIPs) to determine which exposures are most important in predicting the response variable.

# Extract posterior inclusion probabilities (PIPs) and sort them
pipFit <- ExtractPIPs(modelFit) %>%
  arrange(desc(PIP))

# Display the PIPs
pipFit
   variable   PIP
1   Cadmium 0.924
2      Lead 0.694
3   Mercury 0.350
4   Arsenic 0.248
5 Manganese 0.244

The ExtractPIPs() function returns the posterior inclusion probabilities, which are then sorted in descending order to identify the most important exposures.

  1. Dynamic Threshold Calculation Next, we calculate the dynamic threshold for model inclusion. This threshold is based on the coefficient of variation (CV) of the response variable (QI) and the sample size.
# Calculate the dynamic threshold for model inclusion
pipThresh_fn <- calculate_pip_threshold(
  absCV = sd(bkmrAnalysisData_df$QI) / mean(bkmrAnalysisData_df$QI),  # Coefficient of variation
  sampSize = length(bkmrAnalysisData_df$QI)  # Sample size
)

# Display the threshold
pipThresh_fn
[1] 0.1214094

This function computes the threshold for the PIPs using the coefficient of variation of the response variable and the sample size.

9. Interpretation of Results

Finally, we compare the PIPs with the threshold to identify the variables (metals) that have a significant effect on the response variable.

# Identify exposures with PIPs greater than the threshold
significant_exposures <- pipFit %>%
  filter(PIP > pipThresh_fn)

# Display significant exposures
significant_exposures
   variable   PIP
1   Cadmium 0.924
2      Lead 0.694
3   Mercury 0.350
4   Arsenic 0.248
5 Manganese 0.244

This step filters the exposures whose posterior inclusion probabilities exceed the dynamic threshold, indicating that these variables are significant predictors of the outcome.

Conclusion

This vignette demonstrates the complete workflow for using Bayesian Kernel Machine Regression (BKMR) to analyze the relationship between various metal exposures and an outcome of interest (e.g., IQ score). Additionally, we computed a dynamic threshold for model inclusion, which adjusts the threshold based on the coefficient of variation and the sample size.