CondiS is an R package that imputes survival times for censored observations. Once the imputed survival times are obtained, standard machine learning techniques for regression modeling can be applied to them directly. This vignette shows how to use the CondiS package and introduces what it can do. CondiS was created by Yizhuo Wang, Xuelin Huang, Ziyi Li and Christopher R. Flowers, and is now maintained by Yizhuo Wang.
Install CondiS using the code below to ensure that all the needed packages are installed.
# install.packages("CondiS", dependencies = TRUE)
library(CondiS)
CondiS has two functions that impute survival times for censored observations so that they resemble the true survival times as closely as possible. A built-in dataset in the survival package, rotterdam, is used here to demonstrate both functions.
The imputed survival times for censored observations are generated based on their conditional survival distributions derived from the Kaplan-Meier estimator. Below are the input parameters of the CondiS function:
library(kernlab)
library(purrr)
library(tidyverse)
library(survival)
data(cancer, package="survival")
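# Event indicator: 1 if recurrence or death was observed, 0 if censored
status <- pmax(rotterdam$recur, rotterdam$death)
# Recurrence-free survival time
rfstime <- with(rotterdam, ifelse(recur == 1, rtime, dtime))
# Keep only the ten covariate columns
rotterdam <- rotterdam[2:11]
rotterdam$status = status
rotterdam$rfstime = rfstime
# Kaplan-Meier fit on the original (censored) data
fit <- survfit(Surv(rfstime, status) ~ 1, data = rotterdam)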
# Obtain the imputed survival time
pred_time = CondiS(rfstime, status)
rotterdam$pred_time = pred_time
rotterdam$status2 = rep(1, length(status))
fit_2 <- survfit(Surv(pred_time, status2) ~ 1, data = rotterdam)
# Visualization
library(survminer)
combined <- list(Censored = fit,
                 CondiS = fit_2)
ggsurvplot(
  combined,
  data = rotterdam,
  combine = TRUE,
  censor = TRUE,
  risk.table = TRUE,
  palette = "jco"
)
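To make the idea of imputing from a conditional survival distribution concrete, the following is a minimal sketch and not the package's own code. It assumes that a censored time c is replaced by the conditional mean c + (integral of S(t) from c to tmax) / S(c), where S is the Kaplan-Meier curve. The helper name `km_conditional_mean` and its `tmax` argument are hypothetical and are used here for illustration only.

```r
library(survival)

# Hypothetical helper (not part of CondiS): replace each censored time c with
# the restricted conditional mean  c + integral_c^tmax S(t) dt / S(c).
km_conditional_mean <- function(time, status, tmax = max(time)) {
  km <- survfit(Surv(time, status) ~ 1)
  # Right-continuous step function for the Kaplan-Meier curve S(t)
  S <- stepfun(km$time, c(1, km$surv))
  imputed <- time
  for (i in which(status == 0 & time < tmax)) {
    c_i <- time[i]
    s_c <- S(c_i)
    if (s_c <= 0) next  # no probability mass beyond c_i; keep the censoring time
    # Integrate the step function exactly over [c_i, tmax]
    knots <- sort(unique(c(c_i, km$time[km$time > c_i & km$time < tmax], tmax)))
    area <- sum(S(head(knots, -1)) * diff(knots))
    imputed[i] <- c_i + area / s_c
  }
  imputed
}

# Toy data: the second subject is censored at time 2
km_conditional_mean(time = c(1, 2, 3, 4), status = c(1, 0, 1, 1))
#> [1] 1.0 3.5 3.0 4.0
```

In the toy data, the two subjects still at risk after time 2 have events at times 3 and 4, so the censored subject is imputed at their average, 3.5. Observed event times are left unchanged.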
The imputed survival times are further improved by incorporating the covariate information through machine learning modeling (CondiS-X). Below are the input parameters of the CondiS-X function:
covariates = rotterdam[,1:10]
# Update the imputed survival time
pred_time_2 = CondiS_X(pred_time, status, covariates)
#> Loading required package: lattice
#>
#> Attaching package: 'caret'
#> The following object is masked from 'package:survival':
#>
#> cluster
#> The following object is masked from 'package:purrr':
#>
#> lift
rotterdam$pred_time_2 = pred_time_2
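The covariate step can be pictured with a small sketch. This is an illustration of the general idea only, under the assumption that imputed times are regressed on the covariates and the predictions replace the imputed values of censored observations. It is not the CondiS_X implementation: the helper `refine_with_covariates` is hypothetical, and ordinary least squares (`lm`) stands in for whichever learner the package uses.

```r
# Hypothetical helper (not the CondiS_X implementation): refine the imputed
# times of censored observations with a model fitted on the covariates.
refine_with_covariates <- function(pred_time, status, covariates) {
  dat <- data.frame(pred_time = pred_time, covariates)
  # Stand-in learner: ordinary least squares on all covariates
  model <- lm(pred_time ~ ., data = dat)
  refined <- pred_time
  censored <- status == 0
  # Observed event times are kept; only imputed (censored) values are updated
  refined[censored] <- predict(model, newdata = dat[censored, , drop = FALSE])
  refined
}

# Toy data: the last subject is censored and carries an imputed time of 100
toy_cov <- data.frame(age = c(30, 40, 50, 60, 70, 80))
toy_time <- c(10, 20, 30, 40, 50, 100)
toy_status <- c(1, 1, 1, 1, 1, 0)
refine_with_covariates(toy_time, toy_status, toy_cov)
```

Only the censored subject's value changes; the five observed event times pass through untouched.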
# Pre-process the data
library(caret)
preproc <- preProcess(rotterdam[,1:10], method = c('center', 'scale'))
trainPreProc <- predict(preproc, rotterdam[,1:10])
train_control <- trainControl(method = "repeatedcv")
# Train-test split
set.seed(42)
smp_size <- floor(0.75 * nrow(rotterdam))
train_ind <- sample(seq_len(nrow(rotterdam)), size = smp_size)
train <- rotterdam[train_ind, ]
test <- rotterdam[-train_ind, ]
# Fit a radial-kernel SVM to the CondiS-imputed survival time
fit_svm = train(
  pred_time ~ . - status - status2 - rfstime - pred_time,
  data = train,
  method = "svmRadial",
  trControl = train_control,
  na.action = na.omit
)
pred_svm = predict(fit_svm, test)
# Mean absolute error (MAE)
calc_MAE <- function(actual, predicted)
{
  error <- actual - predicted
  mean(abs(error))
}
## In the testing set:
# The MAE of CondiS-imputed survival time and SVM-predicted survival time is:
calc_MAE(test$pred_time,pred_svm)
#> [1] 226.0307
# The MAE of the CondiS-X-imputed survival time and the SVM-predicted survival time is:
calc_MAE(test$pred_time_2,pred_svm)
#> [1] 181.4269
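As a quick sanity check of the MAE definition used above, the same function can be applied to a small hand-made example. The vectors below are made-up toy numbers, not values from the rotterdam data.

```r
# Same definition as calc_MAE in the text
calc_MAE <- function(actual, predicted) {
  error <- actual - predicted
  mean(abs(error))
}

toy_actual <- c(10, 20, 30)
toy_predicted <- c(12, 18, 33)
# Absolute errors are 2, 2 and 3, so the MAE is 7/3
calc_MAE(toy_actual, toy_predicted)
```

A lower MAE means the predictions sit closer, on average, to the values they are compared against, which is how the two results above should be read.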