lincom implements linear combination methods for biomarkers via
empirical performance optimization with respect to two performance
metrics: (1) specificity at controlled sensitivity (or sensitivity at
controlled specificity) (Huang and Sanda, 2022), and (2) a weighted
average of the false positive rate and the false negative rate. The
second method is a variant of the maximum score estimator (Manski, 1975,
1985). In both cases, the algorithm of Huang and Sanda (2022) is used to
provide a solution that balances computational efficiency and solution
quality.
lincom is available on CRAN and can be installed with install.packages("lincom").
The ‘MOSEK’ solver is used and must be installed separately; an academic license for ‘MOSEK’ is free.
library(lincom)
## simulate 3 biomarkers for 100 cases and 100 controls
n1 <- 100
n0 <- 100
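## rows 1..n1 are cases, the remaining n0 rows are controls; all markers start as N(0,1)
mk <- rbind(matrix(rnorm(3*n1),ncol=3),matrix(rnorm(3*n0),ncol=3))
## cases only: marker 1 becomes N(1, 1/2), marker 2 becomes N(1, 2); marker 3 stays N(0,1)
mk[1:n1,1] <- mk[1:n1,1]/sqrt(2)+1
mk[1:n1,2] <- mk[1:n1,2]*sqrt(2)+1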
mk[1:n1,] and mk[(n1+1):(n1+n0),] contain the case and control data, respectively.
The following code performs empirical maximization of specificity at 95% sensitivity.
## The following two lines are commented out - require installation of 'MOSEK' to run
#lcom1 <- eum(mk, n1=n1, s0=0.95, grdpt=0)
#lcom1
Above, n1 is the case size, s0 is the control level, and grdpt specifies
how the initial value of the optimization is obtained (logistic
regression if grdpt=0, and coarse grid search with grdpt grid points
otherwise).
Additional arguments include fixsens (sensitivity is fixed if TRUE, and
specificity otherwise) and lbmdis (larger biomarker values are more
associated with cases if TRUE, and with controls otherwise).
The outputs include the resulting combination coefficient (coef), the
maximum empirical value of the performance metric (hs), and the
resulting threshold (threshold), along with their initial value
counterparts (from logistic regression or coarse grid search).
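To make the performance metric concrete, the following base-R sketch evaluates the empirical specificity at 95% sensitivity for one fixed linear combination, using logistic regression coefficients as the combination. It does not require 'MOSEK' and it is not the optimization performed by eum. The helper spec_at_sens is a hypothetical name introduced here for illustration, and the threshold rule (classify as a case when the score is at or above the threshold) is an assumption.

```r
## Hypothetical helper (not part of lincom): empirical specificity at a
## controlled sensitivity level s0 for a fixed coefficient vector.
## Assumes the first n1 rows of mk are cases and larger scores indicate cases.
spec_at_sens <- function(mk, n1, coef, s0 = 0.95) {
  score <- as.vector(mk %*% coef)
  case_score <- score[1:n1]
  ctrl_score <- score[-(1:n1)]
  ## number of cases that must score at or above the threshold
  ## (small tolerance guards against floating-point error in s0 * n1)
  k <- ceiling(s0 * n1 - 1e-8)
  ## largest threshold that keeps at least k cases classified as positive
  thr <- sort(case_score, decreasing = TRUE)[k]
  list(threshold = thr,
       sensitivity = mean(case_score >= thr),
       specificity = mean(ctrl_score < thr))
}

## same simulation design as above, with a seed for reproducibility
set.seed(1)
n1 <- 100
n0 <- 100
mk <- rbind(matrix(rnorm(3*n1),ncol=3),matrix(rnorm(3*n0),ncol=3))
mk[1:n1,1] <- mk[1:n1,1]/sqrt(2)+1
mk[1:n1,2] <- mk[1:n1,2]*sqrt(2)+1

## logistic regression slopes as a starting combination
y <- c(rep(1, n1), rep(0, n0))
fit <- glm(y ~ mk, family = binomial)
init_coef <- coef(fit)[-1]   # drop the intercept

res <- spec_at_sens(mk, n1, init_coef, s0 = 0.95)
res
```

The specificity reported here is only that of the logistic regression combination; eum searches for coefficients that improve on it under the same sensitivity constraint.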
## empirical minimization of the weighted average of false positive and false negative rates
## default relative weight r=1.
## The following two lines are commented out - require installation of 'MOSEK' to run
#lcom2 <- wmse(mk, n1=n1)
#lcom2
The inputs and outputs are similar to those of eum. However, the initial
value here is obtained through logistic regression only.
With a cohort design, setting r=n0/n1 leads to Manski’s original
estimator.
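The connection to Manski's estimator can be checked numerically with a small base-R sketch. It assumes that r weights the false positive rate relative to the false negative rate, so that the criterion is proportional to FNR + r * FPR; this weighting convention is inferred from the statement above and is not taken from the package source. The helper weighted_error, the coefficient vector b, and the threshold are hypothetical and chosen only for illustration.

```r
## Hypothetical helper (not part of lincom): weighted error FNR + r * FPR
## for the rule "classify as a case when mk %*% b >= threshold".
## Assumes the first n1 rows of mk are cases.
weighted_error <- function(mk, n1, b, threshold, r = 1) {
  score <- as.vector(mk %*% b)
  fnr <- mean(score[1:n1] < threshold)       # cases classified as controls
  fpr <- mean(score[-(1:n1)] >= threshold)   # controls classified as cases
  c(fnr = fnr, fpr = fpr, weighted = fnr + r * fpr)
}

set.seed(1)
n1 <- 100
n0 <- 150
mk <- rbind(matrix(rnorm(3*n1, mean=1), ncol=3), matrix(rnorm(3*n0), ncol=3))
b <- c(1, 1, 1)        # arbitrary illustrative coefficients
threshold <- 1.5       # arbitrary illustrative threshold

err <- weighted_error(mk, n1, b, threshold, r = n0/n1)

## total number of misclassified subjects in the pooled sample
score <- as.vector(mk %*% b)
y <- c(rep(1, n1), rep(0, n0))
n_wrong <- sum((score >= threshold) != (y == 1))

## with r = n0/n1, n1 * (FNR + r * FPR) equals the misclassification count
manski_match <- isTRUE(all.equal(as.numeric(n1 * err["weighted"]), as.numeric(n_wrong)))
manski_match
```

Under this convention, minimizing the weighted error with r=n0/n1 is the same as minimizing the number of misclassified subjects in the pooled sample, which is the maximum score criterion.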
Huang, Y. and Sanda, M. G. (2022). Linear biomarker combination for constrained classification. The Annals of Statistics 50, 2793–2815.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3, 205–228.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. Journal of Econometrics 27, 313–333.