This example demonstrates how to use the breakDown
package for models created with the caret package.
First we will generate some data.
library(caret)
set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class
head(training)
#> TwoFactor1 TwoFactor2 Linear1 Linear2 Nonlinear1 Nonlinear2 Nonlinear3
#> 1 -0.6561702 -1.6480450 1.0744594 0.9758906 0.2342843 0.6805653 0.6920055
#> 2 -0.9849973 1.4598834 0.2605978 -0.1694232 0.1381283 0.7460168 0.5599569
#> 3 2.3722541 1.7069944 -0.3142720 0.7221918 -0.6920591 0.4642024 0.3426912
#> 4 -2.2067173 -0.6972704 -0.7496301 -0.8444186 -0.9303336 0.1374181 0.2344975
#> 5 0.5166671 -0.7228376 -0.8621983 1.2772937 0.9959069 0.8143796 0.4296028
#> 6 1.3331262 -0.9929323 2.0480403 -1.3431105 0.6711474 0.8321613 0.7367007
#> Class
#> 1 Class1
#> 2 Class2
#> 3 Class1
#> 4 Class2
#> 5 Class1
#> 6 Class1
Now we are ready to train a model. Let’s train a glm model with caret.
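# 3-fold cross-validation; class probabilities are required to compute ROC,
# and twoClassSummary reports ROC, sensitivity and specificity
cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE,
                       summaryFunction = twoClassSummary)
# Logistic regression on centered and scaled predictors, with ROC as the metric
test_class_cv_model <- train(trainX, trainY,
                             method = "glm",
                             trControl = cctrl1,
                             metric = "ROC",
                             preProcess = c("center", "scale"))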
test_class_cv_model
#> Generalized Linear Model
#>
#> 50 samples
#> 7 predictor
#> 2 classes: 'Class1', 'Class2'
#>
#> Pre-processing: centered (7), scaled (7)
#> Resampling: Cross-Validated (3 fold)
#> Summary of sample sizes: 33, 34, 33
#> Resampling results:
#>
#> ROC Sens Spec
#> 0.7771991 0.7175926 0.8009259
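To show roughly what the resampling settings above compute, here is a minimal base-R sketch of a 3-fold cross-validated ROC estimate: fit a logistic regression on two folds, score the held-out fold, compute the area under the ROC curve, and average over the folds. It does not use caret. The simulated data, the variables x1 and x2, and the helper auc() are hypothetical, and the AUC is computed with the Mann-Whitney formulation rather than through caret's own code.

```r
# Hypothetical helper: AUC as the probability that a random positive case
# scores higher than a random negative case (ties count one half).
auc <- function(prob, truth, positive) {
  pos <- prob[truth == positive]
  neg <- prob[truth != positive]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Made-up two-class data, for illustration only
set.seed(2)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$Class <- factor(ifelse(d$x1 - d$x2 + rnorm(n) > 0, "Class1", "Class2"))

folds <- sample(rep(1:3, length.out = n))  # assign each row to one of 3 folds
cv_auc <- sapply(1:3, function(k) {
  fit <- glm(Class ~ x1 + x2, data = d[folds != k, ], family = binomial)
  # For a factor response, glm models the probability of the second level
  # ("Class2"), so 1 - p is the probability of "Class1".
  p_class1 <- 1 - predict(fit, newdata = d[folds == k, ], type = "response")
  auc(p_class1, d$Class[folds == k], positive = "Class1")
})
cv_auc        # one AUC per held-out fold
mean(cv_auc)  # averaged over folds
```

The averaged value plays the same role as the ROC column in the resampling summary printed above, although the numbers differ because the data are different.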
To use breakDown, we need a function that returns a numeric score (prediction) for each observation. By default, the predict() function returns the predicted class, so we add the type = "prob" argument to get class probabilities instead. Since this gives two scores for each observation (one per class), we need to extract one of them; here we take the first column, which holds the probability of Class1.
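# Return one score per row of x: the predicted probability of the first
# class level (Class1)
predict.fun <- function(model, x) predict(model, x, type = "prob")[,1]
# Simulate a fresh test set and score its first observation
testing <- twoClassSim(10, linearVars = 2)
predict.fun(test_class_cv_model, testing[1,])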
#> [1] 0.9807632
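To make the difference between a class prediction and a probability score concrete without depending on caret, here is a minimal base-R sketch built on a plain glm() fit. The helper prob_table() is hypothetical: it only mimics the shape of a type = "prob" result (one column per class, rows summing to 1). class_pred() and score_fun() are likewise hypothetical, with score_fun() playing the role of predict.fun above. The simulated data are made up.

```r
# Made-up two-class data with a single predictor
set.seed(2)
d <- data.frame(x = rnorm(40))
d$Class <- factor(ifelse(d$x + rnorm(40) > 0, "Class1", "Class2"))
fit <- glm(Class ~ x, data = d, family = binomial)

# Hypothetical helper: one probability column per class
prob_table <- function(fit, newdata) {
  # For a factor response, glm models the probability of the second level
  p2 <- predict(fit, newdata = newdata, type = "response")
  data.frame(Class1 = 1 - p2, Class2 = p2)
}

# Hypothetical helper: a class label, thresholding Class1 at 0.5
class_pred <- function(fit, newdata) {
  probs <- prob_table(fit, newdata)
  factor(ifelse(probs$Class1 >= 0.5, "Class1", "Class2"),
         levels = c("Class1", "Class2"))
}

# Analogue of predict.fun: a single numeric score per row
score_fun <- function(fit, newdata) prob_table(fit, newdata)[, "Class1"]

new_obs <- data.frame(x = 1.2)
class_pred(fit, new_obs)  # a factor label
prob_table(fit, new_obs)  # two scores that sum to 1
score_fun(fit, new_obs)   # one numeric score per observation
```

Selecting the column by name ([, "Class1"]) instead of by position ([, 1]) ties the score to a specific class even if the column order changes; in this sketch both select the same column.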
Now we are ready to call the broken() function.
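library("breakDown")
# Explain the prediction for the first test observation, using the training
# predictors as the reference data
explain_2 <- broken(test_class_cv_model, testing[1,], data = trainX,
                    predict.function = predict.fun)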
explain_2
#> contribution
#> (Intercept) 0.500
#> + TwoFactor2 = -2.15297519239414 0.330
#> + Linear2 = 1.21347759171666 0.103
#> + Nonlinear2 = 0.938861106755212 0.037
#> + Nonlinear3 = 0.198311409447342 0.016
#> + Linear1 = -1.59104698624311 0.006
#> + Nonlinear1 = -0.693807001691312 -0.001
#> + TwoFactor1 = -1.5957842151878 -0.009
#> final_prognosis 0.981
#> baseline: 0
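The decomposition above is additive: the intercept plus all contributions gives the final prediction. With the rounded values printed above, 0.500 + 0.330 + 0.103 + 0.037 + 0.016 + 0.006 - 0.001 - 0.009 = 0.982, which matches 0.981 up to rounding. Each contribution is a difference between successive average predictions, so the differences telescope. The following base-R sketch illustrates that idea for a fixed, user-chosen variable order. It is not the breakDown implementation (the package also decides the order of the variables itself), and toy_breakdown(), toy_predict() and the data are hypothetical.

```r
# Hypothetical, simplified sequential break-down with a fixed variable order.
# NOT the breakDown package's implementation.
toy_breakdown <- function(predict_fn, data, new_obs, order) {
  current <- data
  # Intercept: average prediction over the reference data
  means <- mean(predict_fn(current))
  for (v in order) {
    # Fix variable v at its value in new_obs for every reference row
    current[[v]] <- new_obs[[v]]
    means <- c(means, mean(predict_fn(current)))
  }
  # Contributions are successive differences of the average prediction, so
  # intercept + sum(contributions) equals the final average, which is the
  # prediction for new_obs once every column has been fixed.
  data.frame(variable = c("(Intercept)", order),
             contribution = c(means[1], diff(means)))
}

# Toy model: a logistic score on two made-up variables a and b
toy_predict <- function(d) plogis(1.5 * d$a - 2 * d$b)

set.seed(1)
ref <- data.frame(a = rnorm(100), b = rnorm(100))
obs <- data.frame(a = 1, b = -0.5)

bd <- toy_breakdown(toy_predict, ref, obs, order = c("a", "b"))
bd
sum(bd$contribution)  # equal to toy_predict(obs)
```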
Finally, we can plot the explanation.