Imputing categorical data by predictive mean matching. Predictive mean matching (PMM) is the default method of mice() for imputing numerical variables, but it has long been possible to impute factors. This enhancement introduces better support for categorical variables in PMM. The former system translated factors into integers by ynum <- as.integer(f). However, the order of integers in ynum may have no sensible interpretation for an unordered factor. The new system quantifies ynum and could yield better results because of a higher \(R^2\). The method calculates the canonical correlation between y (as a dummy matrix) and a linear combination of the imputation model predictors x. The algorithm then replaces each category of y by a single number taken from the first canonical variate. After this step, the imputation model is fitted, and the predicted values from that model are extracted to serve as the similarity measure for the matching step.
The method works for both ordered and unordered factors. No special precautions are taken to ensure monotonicity between the category numbers and the quantifications, so the method should be able to preserve quadratic and other non-monotone relations of the predicted metric. It may be beneficial to remove very sparsely filled categories, for which there is a new trim argument. All you have to do to use the new technique is specify mice(..., method = "pmm", ...). Both numerical and categorical variables will then be imputed by PMM.
Potential advantages are:
Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
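The quantify-then-match idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the code inside mice. The helper names quantify_factor() and pmm_factor() are hypothetical. The factor is dummy-coded with its first level dropped, stats::cancor() supplies the first canonical variate, and the regression uses point estimates rather than the parameter draw that proper multiple imputation requires.

```r
# Minimal sketch, not the mice implementation; helper names are hypothetical.
# Step 1: replace each category of y by its score on the first canonical variate.
quantify_factor <- function(y, x) {
  ydum <- model.matrix(~ y)[, -1, drop = FALSE]   # dummies, reference level dropped
  cc <- stats::cancor(x, ydum)                     # canonical correlation of x and y
  score <- drop(sweep(ydum, 2, cc$ycenter) %*% cc$ycoef[, 1])
  tapply(score, y, function(s) s[1])               # one number per category
}

# Step 2: regress the quantified y on x, then match on predicted values.
pmm_factor <- function(y, x, k = 5) {
  x <- as.matrix(x)
  obs <- !is.na(y)
  q <- quantify_factor(droplevels(y[obs]), x[obs, , drop = FALSE])
  yq <- as.numeric(q[as.character(y[obs])])        # quantified observed y
  X <- cbind(1, x)
  beta <- lm.fit(X[obs, , drop = FALSE], yq)$coefficients
  yhat <- drop(X %*% beta)                         # predictions for all rows
  donors_y <- y[obs]
  imp <- y
  for (i in which(!obs)) {
    dist <- abs(yhat[obs] - yhat[i])
    nearest <- order(dist)[seq_len(min(k, sum(obs)))]
    imp[i] <- donors_y[nearest[sample.int(length(nearest), 1)]]  # donor's category
  }
  imp
}
```

Each missing entry receives the observed category of a randomly chosen donor among the k rows with the closest predicted quantification. The imputations are therefore always valid levels of the factor.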
New system-independent method for pooling: This version introduces a new function pool.table() that takes a tidy table of parameter estimates stemming from m repeated analyses. The input data must consist of three columns (parameter name, estimate, standard error) and a specification of the degrees of freedom of the model fitted to the complete data. The pool.table() function outputs 14 pooled statistics in a tidy form. The primary use of pool.table() is to support parameter pooling for techniques that have no tidy() or glance() methods, either within R or outside R. The pool.table() function also allows for novel workflows that 1) break apart the traditional pool() function into a data-wrangling part and a parameter-reducing part, and 2) do not necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
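The sketch below makes the pooled quantities concrete. It applies Rubin's rules to a tidy table with columns term, estimate and std.error, together with the complete-data degrees of freedom dfcom. It is an assumption-laden illustration, not pool.table() itself: pool_tidy() is a hypothetical helper, it returns only a handful of the statistics, and its degrees of freedom follow the Barnard-Rubin adjustment.

```r
# Hypothetical helper (not mice::pool.table()): Rubin's rules on a tidy table.
# w: one row per term per imputation, with columns term, estimate, std.error.
pool_tidy <- function(w, dfcom = Inf) {
  groups <- split(w, factor(w$term, levels = unique(w$term)))
  rows <- lapply(groups, function(d) {
    m <- nrow(d)
    qbar <- mean(d$estimate)            # pooled estimate
    ubar <- mean(d$std.error^2)         # within-imputation variance
    b <- var(d$estimate)                # between-imputation variance
    t <- ubar + (1 + 1 / m) * b         # total variance (Rubin's rule)
    lambda <- (1 + 1 / m) * b / t       # proportion of variance due to missingness
    df_old <- (m - 1) / lambda^2
    df <- if (is.infinite(dfcom)) {
      df_old
    } else {
      # Barnard-Rubin small-sample adjustment
      df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
      df_old * df_obs / (df_old + df_obs)
    }
    data.frame(term = d$term[1], m = m, estimate = qbar, ubar = ubar,
               b = b, t = t, std.error = sqrt(t), df = df, lambda = lambda)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}
```

Because the input is an ordinary data frame, the estimates could come from any software, which is the point of a system-independent pooling step.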
literanger: Adds support for the literanger package for rf imputation, which is about twice as fast as ranger (#648). Thanks @stephematician for the contribution.
The complete(..., action = "long", ...) command puts the columns named ".imp" and ".id" in the last two positions of the long data (instead of the first two positions). In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with. Note that any existing code that assumes that variables ".imp" and ".id" are in columns 1 and 2 will need to be modified. The advice is to refer to these columns by their names ".imp" and ".id" rather than by position. If you want the old behaviour, specify the argument order = "first". (#569). Contributed @stefvanbuuren
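Code that selects columns by name keeps working whichever layout complete() produces. The toy data frame below is hypothetical and only mimics the new long layout, with .imp and .id in the last two columns. Base R indexing by name then extracts one imputation or recovers the old column order.

```r
# Hypothetical long data in the new layout: .imp and .id come last
long <- data.frame(age = c(1, 2, 1, 2),
                   bmi = c(22.5, 26.3, 22.5, 27.1),
                   .imp = c(1L, 1L, 2L, 2L),
                   .id = c(1L, 2L, 1L, 2L))

# Refer to columns by name, never by position
data_cols <- setdiff(names(long), c(".imp", ".id"))
first_imp <- long[long$.imp == 1, data_cols]

# Reorder to the old layout with .imp and .id first
old_layout <- long[, c(".imp", ".id", data_cols)]
```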
Drops support for S4. Converts S4-related code to S3. The syntax as(df, "mids") is deprecated. Use as.mids(df) instead.
dots argument to ranger::ranger(...) in mice.impute.rf() (#563). Contributed @edbonneville
blocks argument at various places
blocks in initialize_chain()
rbind(), when formulas are concatenated and duplicate names are found, also rename the duplicated variables in formulas by their new name
NEWS.md formatting to get correct version sequence on CRAN and in-package NEWS
make.method() in a more efficient way (resolves #672)
as.mids() from filling the imp object for complete variables
mids, mads, mira and mipo objects
complete() that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of rbind(), where the first set of rows was imputed and the second was not).
type by the more informative pred (currently active row of predictorMatrix)
filter.mids() that incorrectly removed empty components in the imp object
ibind() that incorrectly used length(blocks) as the first dimension of the chainMean and chainVar objects
visitSequence, chainMean and chainVar components of the mids object
minpuc argument in quickpred() (#634)
coef() not available on S4 object when using with lavaan (#615, #616)
.github/dependabot.yml configuration to automate daily check (#598)
roxygen2 7.3.1 requirements
Rprofile prints to stdout on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
methods and rlang from Depends
ampute() helpers
\link statements that do not pass CRAN checks
Expands futuremice() functionality by allowing for external packages and user-written functions (#550). Contributed @thomvolker
Adds GH issue templates bug_report, feature_request and help_wanted (#560). Contributed @hanneoberman
rbind.mids() and cbind.mids() to conform to CRAN policy
mitml and glmnet to imports so that test code conforms to the _R_CHECK_DEPENDS_ONLY=true flag in R CMD check
futuremice() if there is no .Random.seed yet.
predictorMatrix for case F by adding a predictorMatrix argument to make.predictorMatrix()
mice.impute.mpmm() example code
mice.impute.2lonly.pmm() (#555)
tidy(), update(), format() and sum()
R CMD check with _R_CHECK_DEPENDS_ONLY=true
futuremice() that throws an error when the number of cores is not specified, but the number of available cores is greater than the number of imputations.
mice.impute.mpmm() that changed the column order of the data
Adds a function futuremice() with support for parallel imputation using the future package (#504). Contributed @thomvolker, @gerkovink
Adds multivariate predictive mean matching mice.impute.mpmm() (#460). Contributed @Mingyang-Cai
Adds convergence() for convergence evaluation (#484). Contributed @hanneoberman
Reverts the internal seed behaviour back to mice 3.13.10 (#515). #432 introduced a new local seed in response to #426. However, various issues arose with this facility (#459, #492, #502, #505). This version restores the old behaviour using the global .Random.seed. Contributed @gerkovink
Adds a custom.t argument to pool() that allows the advanced user to specify a custom rule for calculating the total variance \(T\). Contributed @gerkovink
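Conceptually, custom.t replaces the expression that combines the within-imputation variance ubar, the between-imputation variance b and the number of imputations m into \(T\). The base R sketch below evaluates such a rule from a character string. total_variance() is a hypothetical helper. Check ?pool for the exact variable names that pool() exposes to custom.t, for example whether they need a .data$ prefix.

```r
# Hypothetical helper: evaluate a total-variance rule given as a string.
# The default string reproduces Rubin's rule T = ubar + (1 + 1/m) * b.
total_variance <- function(ubar, b, m, rule = "ubar + (1 + 1/m) * b") {
  expr <- parse(text = rule)[[1]]
  # Only ubar, b and m (plus base functions) are visible to the rule
  eval(expr, envir = list(ubar = ubar, b = b, m = m), enclos = baseenv())
}

total_variance(ubar = 0.01, b = 0.04, m = 3)                    # Rubin's rule
total_variance(ubar = 0.01, b = 0.04, m = 3, "ubar + b / m")    # an alternative rule
```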
Adds a new argument exclude to mice.impute.pmm() that excludes a user-specified vector of values from matching. Excluded values will not appear in the imputations. Since the observed values are not imputed, the user-specified values are still used to fit the imputation model (#392, #519). Contributed @gerkovink
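The matching restriction can be sketched in base R. In the hypothetical helper pmm_exclude(), all observed values enter the regression, including the excluded ones. Rows whose observed value appears in exclude are then removed from the donor pool, so those values can never be imputed. For brevity, the sketch uses point estimates instead of drawing the regression parameters as mice.impute.pmm() does.

```r
# Hypothetical sketch of PMM with an exclusion list (not mice.impute.pmm()).
pmm_exclude <- function(y, x, exclude = NULL, k = 5) {
  obs <- !is.na(y)
  X <- cbind(1, as.matrix(x))
  # Fit on all observed rows, excluded values included
  beta <- lm.fit(X[obs, , drop = FALSE], y[obs])$coefficients
  yhat <- drop(X %*% beta)
  # Donor pool: observed rows whose value is not excluded
  donor_rows <- which(obs & !(y %in% exclude))
  imp <- y
  for (i in which(!obs)) {
    dist <- abs(yhat[donor_rows] - yhat[i])
    nearest <- donor_rows[order(dist)[seq_len(min(k, length(donor_rows)))]]
    imp[i] <- y[nearest[sample.int(length(nearest), 1)]]
  }
  imp
}
```

A typical use is a special code such as 99 that should inform the model but never be produced as an imputed value.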
.R and .Rmd files
sampler.R (#511)
inherits() to check on class membership
parlmice()
prop, patterns and weights matrices for pattern with only 1’s
D1() and D2() (#420)
mice()
make.where()
test-mice.impute.rf.R (#448)
.Random.seed reads from the .GlobalEnv by get(".Random.seed", envir = globalenv(), mode = "integer", inherits = FALSE)
lastSeedValue variable name
x$lastSeedValue problem in cbind.mids() (#502)
ampute()
mice() by smarter random seed initialisation (#459)
drop = FALSE buglet in mice.impute.rf() (#447, #448)
withr package should have version 2.4.0 (published in January 2021) or higher. Versions withr 2.3.0 and before may give Error: object 'local_seed' is not exported by 'namespace:withr'. Either update manually, or install the patched version mice 3.14.1 from GitHub. (#445). NOTE: withr is no longer needed in mice 3.15.0
Adds four new univariate functions using the lasso for automatic variable selection. Contributed by @EdoardoCostantini (#438).
mice.impute.lasso.norm() for lasso linear regression
mice.impute.lasso.logreg() for lasso logistic regression
mice.impute.lasso.select.norm() for lasso selector + linear regression
mice.impute.lasso.select.logreg() for lasso selector + logistic regression
Adds Jamshidian and Jalal’s non-parametric MCAR test, mice::MCAR() and associated plot method. Contributed by @cjvanlissa (#423).
Adds two new functions pool.syn() and pool.scalar.syn() that specialise in pooling estimates from synthetic data. The "reiter2003" pooling rule assumes that synthetic data were created from complete data. Thanks Thom Volker (#436).
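For intuition, the sketch below implements the combining rule for partially synthetic data that is usually attributed to Reiter (2003), as we read it. The total variance is ubar + b / m, and the degrees of freedom are (m - 1) * (1 + ubar / (b / m))^2. pool_scalar_reiter2003() is a hypothetical helper, not pool.scalar.syn(). Compare its output against mice before relying on it.

```r
# Hypothetical helper: pool one scalar estimate from m synthetic data sets.
# Q: estimates from each synthetic data set; U: their squared standard errors.
pool_scalar_reiter2003 <- function(Q, U) {
  m <- length(Q)
  qbar <- mean(Q)                            # pooled estimate
  ubar <- mean(U)                            # average within variance
  b <- var(Q)                                # between variance
  t <- ubar + b / m                          # total variance for synthetic data
  df <- (m - 1) * (1 + ubar / (b / m))^2     # degrees of freedom
  list(qbar = qbar, ubar = ubar, b = b, t = t, df = df)
}
```

Compared with Rubin's ubar + (1 + 1/m) * b, the between-set variance enters only as b / m, because the synthetic data were generated from complete data.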
By default, mice.impute.rf() now uses the faster ranger package as back-end instead of the randomForest package. If you want the old behaviour, specify the rfPackage = "randomForest" argument to the mice(...) call. Contributed @prockenschaub (#431).
.Random.seed (#426, #432) by implementing withr::local_preserve_seed() and withr::local_seed(). This change provides stabler behavior in complex scripts. The change does not appear to break reproducibility when mice() was run with a seed. Nevertheless, if you run into a reproducibility problem, install mice 3.13.12 or before.
mice.impute.quadratic(), adds a parameter quad.outcome containing the name of the outcome variable in the complete-data model. Contributed @Mingyang-Cai, @gerkovink (#408)
pool() so that it processes the parameters from all gamlss sub-models. Thanks Marcio Augusto Diniz (#406, #405)
pool() can extract robust.se from the object returned by broom::tidy() (#310)
pool() cannot take a mids object (#433)
mice.impute.2l.lmer() to indicate a problem in fitting the imputation model (#385)
post parameter (#326)
install.on.demand() broke the standard CRAN workflow. mice 3.14.0 does not call install.on.demand() anymore for recommended packages. Also, install.on.demand() will not run anymore in non-interactive mode.
mice:::barnard.rubin() function for infinite dfcom. Thanks @huftis (#441).
Xi <- as.matrix(...) in mice.impute.2l.lmer() that occurred when a cluster contains only one observation (#384)
predictorMatrix to a monotone pattern if visitSequence = "monotone" and maxit = 1 (#316)
md.pattern() (#318, #323)
make.formulas() (#305, #324)
newdata in mice.mids() (#313, #325)
where element created in rbind() (#319)
mids2spss() replaces the foreign package by the haven package. Contributed Gerko Vink (#291)
tests\testhat\test-D1.R that failed on mitml 0.4-0
with.mids() function to old version because the change in commit 4634094 broke downstream package metafor (#292)
mice.impute.rf() in finding candidate donors (#288, #289)
Much faster predictive mean matching. The new matchindex C function makes predictive mean matching 50 to 600 times faster. The speed of pmm is now on par with normal imputation (mice.impute.norm()) and with the miceFast package, without compromising on the statistical quality of the imputations. Thanks to Polkas https://github.com/Polkas/miceFast/issues/10 and suggestions by Alexander Robitzsch. See #236 for more details.
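Fast matching depends on avoiding a full distance computation for every recipient. One common way to achieve this is sketched below in base R. The donor predictions are sorted once, each recipient is located by binary search with findInterval(), and the search then walks outward to collect the k nearest donors. match_sorted() is a hypothetical illustration of that strategy, not a transcription of the matchindex C code.

```r
# Hypothetical sketch: nearest-donor matching via sorting and binary search.
# d: predicted values of donors; t: predicted values of recipients.
# Returns, per recipient, the index (into d) of a random pick among its k nearest donors.
match_sorted <- function(d, t, k = 5L) {
  ord <- order(d)
  ds <- d[ord]
  n <- length(ds)
  k <- min(k, n)
  pos <- findInterval(t, ds)            # ds[pos] <= t < ds[pos + 1]
  vapply(seq_along(t), function(j) {
    lo <- pos[j]
    hi <- pos[j] + 1L
    picked <- integer(0)
    while (length(picked) < k) {        # grow the window toward the closer side
      take_lo <- hi > n || (lo >= 1L && (t[j] - ds[lo]) <= (ds[hi] - t[j]))
      if (take_lo) {
        picked <- c(picked, lo)
        lo <- lo - 1L
      } else {
        picked <- c(picked, hi)
        hi <- hi + 1L
      }
    }
    ord[picked[sample.int(k, 1L)]]
  }, integer(1))
}
```

Sorting costs O(n log n) once, after which each recipient needs only a binary search plus k steps. This is where the savings over computing all pairwise distances come from.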
New ignore argument to mice(). This argument is a logical vector of nrow(data) elements indicating which rows are ignored when creating the imputation model. We may use the ignore argument to split the data into a training set (on which the imputation model is built) and a test set (that does not influence the imputation model estimates). The argument is based on the suggestion in https://github.com/amices/mice/issues/32#issuecomment-355600365. See #32 for more background and techniques. Crafted by Patrick Rockenschaub
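The effect of ignore can be illustrated with a deliberately simple, deterministic regression imputation in base R. In the hypothetical impute_with_ignore() helper, rows flagged by ignore are left out when the model is estimated, yet their missing values are still filled in. mice itself uses stochastic methods, so this sketch only illustrates the training/test split.

```r
# Hypothetical sketch: estimate on non-ignored rows, impute all missing rows.
impute_with_ignore <- function(y, x, ignore) {
  X <- cbind(1, x)
  train <- !is.na(y) & !ignore                  # training set for the model
  beta <- lm.fit(X[train, , drop = FALSE], y[train])$coefficients
  miss <- is.na(y)
  y[miss] <- drop(X[miss, , drop = FALSE] %*% beta)
  y
}
```

Changing observed values in the ignored rows leaves every imputation unchanged, which is exactly the property a test set needs.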
New filter() function for mids objects. New filter() method that subsets a mids object (multiply-imputed data set). The method accepts a logical vector of length nrow(data), or an expression to construct such a vector from the incomplete data. (#269). Crafted by Patrick Rockenschaub.
Breaking change: The matcher algorithm in pmm has changed to matchindex for speed improvements. If you want the old behavior, specify mice(..., use.matcher = TRUE).
cpp11 package (#286)
with.mids() by calling eval_tidy() on a quosure. Does not yet solve #265.
pool() and pool.scalar() (#142, #106, #190 and others)
tidy.mipo more flexible (#276)
nelsonaalen() gets a tibble (#272)
NAs can appear in the imputed data (#267)
quickpred() documentation (#268)
sum.scores()
lm.mids(), glm.mids(), pool.compare().
pmm.match() and expandcov()
return() calls placed just before end-of-function
printFlag value (#258)
amices
df.residual, which caused problematic behavior in D1(), D2(), D3(), anova() and pool(). mice now extracts the relevant information from other parts of the objects returned by survival::coxph(), which solves long-standing issues with the integration of the Cox model (#246).
Rcpp dependency to work with tidyr 1.1.1 (#248).
Non-file package-anchored link(s) in documentation object.
ampute documentation (#251).
suggests.
tidy.mipo() and glance.mipo() return standardized output that conforms to broom specifications. Kindly contributed by Vincent Arel-Bundock (#240).
D3 testing script that produced an error on CRAN (#244).
The D3() function in mice gave incorrect results. This version solves a problem in the calculation of the D3-statistic. See #226 and #228 for more details. The documentation explains why results from mice::D3() and mitml::testModels() may differ.
The pool() function is now more forgiving when there is no glance() function (#233)
It is possible to bypass remove.lindep() by setting eps = 0 (#225)
plot.mids() documentation
This version adds two new NARFCS methods for imputing data under the Missing Not at Random (MNAR) assumption. NARFCS is a generalised version of the so-called \(\delta\)-adjustment method. Margarita Moreno-Betancur and Ian White kindly contributed the functions mice.impute.mnar.norm() and mice.impute.mnar.logreg(). These functions aid in performing sensitivity analysis to investigate the impact of different MNAR assumptions on the conclusion of the study. An alternative for MNAR is the older mice.impute.ri() function.
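The \(\delta\)-adjustment idea that NARFCS generalises can be sketched in base R. First impute under MAR, then shift the imputed values by a fixed offset delta, and repeat for several deltas to see how conclusions change. delta_adjust_impute() is a hypothetical helper, not mice.impute.mnar.norm(). It also skips the parameter draw, so it is an illustration rather than a proper imputation method.

```r
# Hypothetical sketch of delta-adjusted normal imputation.
delta_adjust_impute <- function(y, x, delta = 0) {
  X <- cbind(1, as.matrix(x))
  obs <- !is.na(y)
  fit <- lm.fit(X[obs, , drop = FALSE], y[obs])
  sigma <- sqrt(sum(fit$residuals^2) / fit$df.residual)
  miss <- which(!obs)
  # MAR draw (prediction plus normal noise), then the MNAR shift delta
  y[miss] <- drop(X[miss, , drop = FALSE] %*% fit$coefficients) +
    rnorm(length(miss), mean = 0, sd = sigma) + delta
  y
}
```

A sensitivity analysis would loop over, say, delta in c(0, -2, -5) and compare the resulting estimates. If the conclusion survives plausible deltas, it is robust to that form of MNAR.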
Installation of mice is faster. External packages needed for imputation and analyses are now installed on demand. The number of dependencies as estimated by rsconnect::appDependencies() decreased from 132 to 83.
The name clash with the complete() function of tidyr should no longer be a problem.
There is now a more flexible pool() function that integrates better with the broom and broom.mixed packages.
pool.compare(). Use D1() instead (#220)
utils::globalVariables()
tidyr by defining complete.mids() as an S3 method for the tidyr::complete() generic (#212)
pool() function to deal with multiple sets of parameters. Currently supported keywords are: term (all broom functions), component (some broom.mixed functions) and y.values (for multinom() model) (#219)
install.on.demand() function for lighter installation
toenail2 and remove dependency on HSAUR3
ampute in extreme cases (#216)
pool with mgcv::gam (#218)
.gitattributes for consistent line endings
polr() always fail (#206)
data.frame (#208)
mira-class documentation (#207)
CALIBERrfimpute
2lonly.norm and 2lonly.pmm
a2 to elementwise division by a matrix of observations
2lonly.norm and 2lonly.pmm
2lonly.pmm
2lonly.mean now also works with factors
imputationMethod argument in examples by method
check.predictorMatrix() (#191)
toenail data from orphaned DPpackage package
DPpackage from Suggests field in DESCRIPTION
md.pattern() (#170, #177)
as.mids() (#173)
mice.impute.xxx() so that mice::mice() works as expected (#55)
mids2spss(), thanks Edgar Schoreit (#149)
predictorMatrix. mice 3.3.1 will impute those variables using the intercept only
nelsonaalen() function for data where variables time or status have already been defined (#140), thanks matthieu-faron
mice 3.0.0 to mice 3.2.0 under passive imputation.
broom 0.5.0 (#128)
mice.impute.2l.norm() (#129)
mice.impute.2l.norm() (#129)
D1() (#128)
md.pattern (#126)
rbind and cbind (#114)
rbind problem when method is a list (#113)
parlmice (#109)
dfcom argument to pool() (#105, #110)
parlmice + bugfix (#107)
parlmice (#104)
flux (#102)
estimice (#101)
parent.frame (#98)
NEWS.md, index.Rmd and online package documentation
.R instead of .r
updateLog (#8, @alexanderrobitzsch)
md.pattern (#90)
m (#89)
Version 3.0 represents a major update that implements the following features:
blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm, each block was equivalent to one variable, which, of course, is the default. The blocks argument allows mixing univariate imputation methods with multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;
where: The where argument is a logical matrix of the same size as data that specifies which cells should be imputed. This opens up some new analytic possibilities;
Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analyses from multiply-imputed data;
formulas: The old form argument has been redesigned and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R’s native formulas.
Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;
Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables.
Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
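A where matrix is an ordinary logical matrix with the dimensions of the data, so it can be built with base R. The toy data below are hypothetical. The sketch starts from is.na(), which marks exactly the missing cells, and then flags one observed cell as well.

```r
# Hypothetical toy data; NA marks a missing cell
df <- data.frame(age = c(1, 2, 1, 3),
                 bmi = c(22.7, NA, NA, 21.7),
                 chl = c(NA, 187, 187, 184))

# Impute exactly the missing cells (the usual choice)
where <- is.na(df)

# Additionally mark one observed cell for imputation
where[4, "chl"] <- TRUE
```

The matrix would then be supplied as mice(df, where = where). make.where() offers shortcuts for common patterns; see its help page for the available options.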
mids object in mice (thanks stephematician) (#61)
rbind.mids (thanks stephematician) (#59)
pool.compare() in handling factors (#60)
rbind.mids in handling where (#59)
as.mids(), add as()
cart not accepting a matrix (thanks Joerg Drechsler)
pool() to list of models
ampute function and vignettes (Rianne Schouten)
mice.impute.2l.sys to mice.impute.2l.lmer
where argument to mice
wy argument to imputation functions
mice.impute.2l.sys(), author Shahab Jolani
cbind() function
mids object
lattice package
xyplot.mads
mice.impute.2lonly.pmm()
ampute() by Rianne Schouten
mice function (thanks Ben Ogorek)
cbind.mids() replaced by calls to cbind()
miceVignettes on github (thanks Gerko Vink)
README for GitHub
ccn -> ncc, icn -> nic
cc(), ncc(), cci(), ic(), nic() and ici() use S3 dispatch
multinom MaxNWts type fix in polyreg and polr #9
pool.compare #12
as.mids if names not same as all columns #11
glmer models #5
midastouch: predictive mean matching for small samples (thanks Philip Gaffert, Florian Meinfelder)
rpart call
ridge to 2l.norm()
.o files
as.mids() bug that crashed miceadds::mice.1chain()
impute.polyreg() bug that bombed if there were no predictors (thanks Jan Graffelman)
as.mids() bug that gave incorrect \(m\) (several users)
pool.compare() error for lmer object (thanks Claudio Bustos)
mice.impute.2l.norm() if just one NA (thanks Jeroen Hoogland)
pool.scalar() now can do Barnard-Rubin adjustment
pool() now handles class lmerMod from the lme4 package.
pmm.match() for safety
mice.impute.pmm() for increased visibility
mice.impute.rf() from 100 to 10 (thanks Anoop Shah)
long2mids() deprecated. Use as.mids() instead
lattice back into DEPENDS to find generic xyplot() and friends
2lonly.pmm (thanks Alexander Robitzsch, Gerko Vink, Judith Godin)
as.mids() (thanks Tommy Nyberg, Gerko Vink)
mdc() in example
mice.impute.quadratic()
mice.impute.rf() if just one NA (thanks Anoop Shah)
summary.mipo() when names(x$qbar) equals NULL (thanks Aiko Kuhn)
ncol() in mice.impute.2lonly.mean()