The ldbod package provides flexible functions for computing local density-based outlier scores. Both exact and approximate nearest neighbor search are supported through an efficient k-d tree method, and the functions accommodate multiple neighborhood sizes and four local density-based methods: LOF, LDF, RKOF, and LPDF. Outlier scores can be computed against a subsample of the input data or against a user-specified reference data set, so both unsupervised and semi-supervised outlier detection are possible.
Two functions are included: ldbod and ldbod.ref. Function
ldbod(X,k,...) computes outlier scores referencing a random subsample
of the input data, X. Function ldbod.ref(X,Y,k,...) computes outlier
scores for X
based on a reference data set, Y. Y can be a set of “normal” data points
for semi-supervised outlier detection. Note: Outlier score lpdr is only
designed for unsupervised outlier detection and should not be used in
the semi-supervised setting. Both functions can return nine outlier
scores based on the methods LOF, LDF, RKOF, and LPDF. Each method
returns both densities and relative densities.
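As a rough illustration of the two calls described above, the following sketch scores simulated data in both settings. Only X, Y, and k appear in the signatures given here; the nsub and method arguments and the lof element of the returned list are assumptions drawn from the package documentation and should be checked against ?ldbod and ?ldbod.ref.

```r
# Sketch only: nsub, method, and the $lof element are assumptions;
# confirm them with ?ldbod and ?ldbod.ref before relying on this.
library(ldbod)

set.seed(1)
# 200 "normal" points plus 5 shifted points that should look anomalous
X <- rbind(matrix(rnorm(400), ncol = 2),
           matrix(rnorm(10, mean = 6), ncol = 2))

# Unsupervised: score X against a random subsample of itself,
# using two neighborhood sizes at once
scores <- ldbod(X, k = c(10, 20), nsub = 100, method = c("lof", "ldf"))
str(scores$lof)

# Semi-supervised: score X against a reference set Y of "normal" points
Y <- matrix(rnorm(400), ncol = 2)
scores_ref <- ldbod.ref(X, Y, k = c(10, 20), method = "lof")
str(scores_ref$lof)
```

Passing a vector for k is what makes ensemble-style scoring convenient: each neighborhood size yields its own set of scores, which can then be combined.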
All kNN computations are carried out using the nn2
function from the RANN package. For method LPDF,
multivariate t densities are computed using the dmt
function from the mnormt package. Refer to those packages for more
details. Note: all neighborhoods are strictly of size k; therefore, the
algorithms for LOF, LDF, and RKOF are not exact implementations, but
they behave similarly in most situations and are equivalent when the
distance to the k-th nearest neighbor is unique. If there are many
duplicate data points, these implementations could produce
dramatically different results from those that allow neighborhood sizes
larger than k, especially if k is relatively small. Removing duplicates
is recommended before computing outlier scores unless there is good
reason to keep them.
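The effect of ties can be checked before scoring. The sketch below uses base R only (it does not call ldbod or RANN) and brute-force distances, so it is meant for small data sets. It compares a strict size-k neighborhood with one that includes every point tied at the k-th distance, then drops exact duplicate rows. The variable names are illustrative.

```r
set.seed(42)
X <- rbind(matrix(rnorm(40), ncol = 2),    # 20 distinct points
           matrix(0, nrow = 5, ncol = 2))  # 5 exact copies of the origin

k <- 3
D <- as.matrix(dist(X))  # pairwise Euclidean distances
diag(D) <- Inf           # a point is not its own neighbor

# Distance from each point to its k-th nearest neighbor
kdist <- apply(D, 1, function(d) sort(d)[k])

# Neighborhood size when ties at the k-th distance are included
tie_size <- rowSums(D <= kdist)

# Rows where a strict size-k neighborhood would drop tied neighbors
tied_rows <- unname(which(tie_size > k))
tied_rows

# Drop exact duplicate rows before computing outlier scores
X_unique <- X[!duplicated(X), , drop = FALSE]
nrow(X_unique)
```

Here only the duplicated rows have tie-inclusive neighborhoods larger than k, which is the situation where fixed-size neighborhoods can diverge from implementations that allow larger ones.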
The main motivation for this package is the need for more flexible implementations of local density-based outlier detection methods that can be used to create ensemble outlier scores. The package is based on the PhD dissertation work of K. T. Williams (2016).
To install the most up-to-date version in R, use the following commands:
install.packages("devtools")
devtools::install_github("kwilliams83/ldbod")
or install the released version from CRAN:
install.packages("ldbod")