The km.support function is part of the conf package. It calculates the support values for Kaplan and Meier's product-limit estimator [1]. The Kaplan-Meier product-limit estimator (KMPLE) is used to estimate the survivor function for a data set of positive values in the presence of right censoring, and the support values are all possible values of the KMPLE for a specific sample size.
The km.support function finds the support values of the KMPLE for a particular sample size \(n\) (the number of items on test) using an induction algorithm [2]. The support values are returned as a list with two components, num (numerators) and den (denominators), which allows the user to generate exact fractions.
The km.support function is accessible following installation of the conf package:
install.packages("conf")
library(conf)
The KMPLE is a nonparametric estimate of the survival function from a data set of lifetimes that includes right-censored observations and is used in a variety of application areas. For simplicity, we will refer to the object of interest generically as the item and the event of interest as the failure.
Let \(n\) denote the number of items on test. The KMPLE of the survival function \(S(t)\) is given by \[ \hat{S}(t) = \prod\limits_{i:t_i \leq t}\left( 1 - \frac{d_i}{n_i}\right), \] for \(t \ge 0\), where \(t_1, \, t_2, \, \ldots, \, t_k\) are the distinct times at which at least one failure is observed (\(k\), the number of distinct failure times in the data set, is an integer between 1 and \(n\)), \(d_1, \, d_2, \, \ldots, \, d_k\) are the numbers of failures observed at times \(t_1, \, t_2, \, \ldots, \, t_k\), and \(n_1, \, n_2, \, \ldots, \, n_k\) are the numbers of items at risk just prior to times \(t_1, \, t_2, \, \ldots, \, t_k\). It is common practice to have the KMPLE "cut off" after the largest time recorded if it corresponds to a right-censored observation [3]. The KMPLE drops to zero after the largest time recorded if it is a failure; the KMPLE is undefined, however, after the largest time recorded if it is a right-censored observation.
The support values returned by km.support are calculated from \(\hat{S}(t)\) at any \(t \ge 0\) for all possible outcomes of an experiment with \(n\) items on test.
The function has only the sample size \(n\) as its argument.
To illustrate a simple case, consider the KMPLE for one particular experiment when there are \(n = 4\) items on test, failures occur at times \(t = 1\) and \(t = 3\), and right censorings occur at times \(t = 2\) and \(t = 4\). In this setting, the KMPLE is
\[\begin{equation*} \hat{S}(t) = \begin{cases} 1 & \qquad 0 \le t < 1 \\ \left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 3 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{2}\right) = \frac{3}{8} & \qquad 3 \leq t < 4 \\ \text{NA} & \qquad t \geq 4, \end{cases} \end{equation*}\]
where NA indicates that the KMPLE is undefined.
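The arithmetic in this example can be reproduced with a few lines of base R. The following is a minimal sketch, not code from the conf package: the helper name km_estimate and its arguments time and status (1 for a failure, 0 for a right-censored observation) are hypothetical names chosen for illustration.

```r
# Hypothetical helper (not part of conf): evaluate the KMPLE at each
# distinct failure time, following the product-limit formula.
km_estimate <- function(time, status) {
  fail_times <- sort(unique(time[status == 1]))
  # n_i: number of items at risk just prior to each failure time
  n_i <- vapply(fail_times, function(t) sum(time >= t), numeric(1))
  # d_i: number of failures observed at each failure time
  d_i <- vapply(fail_times, function(t) sum(time == t & status == 1), numeric(1))
  data.frame(time = fail_times, n.risk = n_i, n.event = d_i,
             surv = cumprod(1 - d_i / n_i))
}

# failures at t = 1 and t = 3, right censorings at t = 2 and t = 4
ex <- km_estimate(time = c(1, 2, 3, 4), status = c(1, 0, 1, 0))
ex
```

The surv column holds 3/4 at \(t = 1\) and 3/8 at \(t = 3\), matching the display above. The sketch does not encode the NA region after a right-censored largest time; that convention would have to be handled separately.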
The KMPLE in this experiment takes on 3 of the 8 support values produced by km.support (1, 3/4, and 3/8), as can be seen in the output below, along with NA. NA values are not displayed in the output.
library(conf)
# display unsorted numerators and denominators of support values for n = 4
n = 4
s = km.support(n)
s
#> $num
#> [1] 0 1 1 2 1 3 3 1
#>
#> $den
#> [1] 1 1 2 3 3 4 8 4
# display sorted support values for n = 4 as decimals
sort(s$num / s$den)
#> [1] 0.0000000 0.2500000 0.3333333 0.3750000 0.5000000 0.6666667 0.7500000
#> [8] 1.0000000
# display sorted support values for n = 4 as exact fractions
i <- order(s$num / s$den)
m <- length(s$num)
f <- ""
for (j in i[2:(m - 1)]) f <- paste(f, s$num[j], "/", s$den[j], ", ", sep = "")
cat(paste("The ", m, " support values for n = ", n, " are: 0, ", f, "1.\n", sep = ""))
#> The 8 support values for n = 4 are: 0, 1/4, 1/3, 3/8, 1/2, 2/3, 3/4, 1.
Consider another KMPLE for a different outcome of the same experiment with \(n = 4\) items on test. This time we observe 4 failures at times \(t = 1, 2, 3, 4\). In this setting, the KMPLE is
\[\begin{equation*} \hat{S}(t) = \begin{cases} 1 & \qquad 0 \le t < 1 \\ \left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 2 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{3}\right) = \frac{1}{2} & \qquad 2 \leq t < 3 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{3}\right) \left(1 - \frac{1}{2}\right) = \frac{1}{4} & \qquad 3 \leq t < 4 \\ 0 & \qquad t \geq 4. \end{cases} \end{equation*}\]
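When every item fails at a distinct time, the number at risk falls by one at each failure, so the products in this display telescope to \((n - j)/n\) after the \(j\)th failure. A quick base R check of that arithmetic (the variable names are illustrative only):

```r
n <- 4
at_risk <- n:1                      # items at risk just prior to each failure
s_hat <- cumprod(1 - 1 / at_risk)   # KMPLE just after each of the n failures
s_hat

# TRUE if the products agree with the telescoped form (n - j) / n
isTRUE(all.equal(s_hat, (n - 1:n) / n))
```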
The KMPLE in this experiment takes on 5 of the 8 support values produced by km.support (3 of them new: 1/2, 1/4, and 0), as can be seen in the output above. Enumerating all possible outcomes of the experiment yields the remaining support values, 1/3 and 2/3.
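For a small sample size, the full set of support values can be recovered by brute force. The sketch below is not the induction algorithm that km.support uses; it simply enumerates every failure/censoring pattern, under the simplifying assumption that the \(n\) observed times are distinct. All variable names are illustrative.

```r
# Brute-force sketch; NOT the induction algorithm used by km.support.
# Assumption: the n observed times are distinct (no ties).
n <- 4
at_risk <- n:1  # number at risk just prior to each ordered time

# each row is one outcome: 1 = failure, 0 = right-censored
patterns <- unname(as.matrix(expand.grid(rep(list(c(0, 1)), n))))

values <- 1  # the KMPLE equals 1 before the first observed time
for (r in seq_len(nrow(patterns))) {
  delta <- patterns[r, ]
  s_hat <- cumprod(1 - delta / at_risk)  # factor is 1 at a censoring time
  # the KMPLE is undefined (NA) after the largest time if it is censored
  if (delta[n] == 0) s_hat <- s_hat[-n]
  values <- c(values, s_hat)
}

# round before de-duplicating to guard against floating-point noise
support <- sort(unique(round(values, 10)))
support
```

For \(n = 4\) this gives the same 8 values as the sorted km.support output shown earlier. The enumeration visits \(2^n\) patterns, so it is practical only for small \(n\).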
The function km.support is also called from the functions km.pmf and km.surv, which are also part of the conf package.
[1] Kaplan, E. L., and Meier, P. (1958), "Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, 53, 457–481.
[2] Qin, Y., Sasinowska, H. D., and Leemis, L. M. (2023), "The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator," The American Statistician, 77 (1), 102–110.
[3] Kalbfleisch, J. D., and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data (2nd ed.), Hoboken, NJ: Wiley.