The matrix variate t distribution was introduced in a previous vignette along with an EM algorithm for maximum likelihood fitting of the parameters. This can be extended rather easily to the case of mixture models for model-based clustering.
As in the case of mixture modeling in general (Fraley and Raftery 2002; McLachlan, Lee, and Rathnayake 2019), the difference in the EM algorithm is that one is now including estimates of \(\pi_{j}\) for \(j\) in \(1,2, \ldots, g\), the estimated probabilities of group membership for the \(g\) groups in each step and weights \(\tau_{ij}\), weights for each observation \(i\) and group \(j\), where \[\pi_j = \frac{1}{n}\sum_{i = 1}^n \tau_{ij}\] and \[ \tau_{ij} = \frac{\pi_j f(x_i, \Theta_j)}{\sum_{l=1}^g \pi_l f(x_i, \Theta_l)}\]
The case of the matrix variate normal distribution can be seen in Viroli (2011), while the case of the multivariate t can be seen in Andrews, McNicholas, and Subedi (2011).
The updates on the parameters \(\Theta\) are weighted by \(\tau_{ij}\) in an Expectation/Conditional Maximization algorithm.
The matrixmixture()
function fits unrestricted
covariance matrices currently, but future features will implement a teigen
type of covariance restriction capability for use with the \(t\) distribution. It can set means to be
constant along rows or columns or both using the
row.mean = TRUE
and col.mean = TRUE
settings.
Currently, this can perform model fitting with unrestricted
covariance matrices and fixed degrees of freedom (nu
)
parameter or for the matrix normal distribution. It does not solve the
identifiability problem, that is, that permutations of the labels will
yield identical solutions.
matrixmixture
functionThe function takes data array x
, either an argument
K
for how many groups there are or an initialization of a
vector of probabilities prior
, an optional initialization
of centers and covariance matrices init
(if the covariances
are left blank, they will be initialized to identity matrices), and
optional arguments controlling the other parameters of function, such as
number of iterations and normal vs t. If
model = "t"
is chosen, the degrees of freedom
nu
must be provided, but in the future it can be
estimated.
library(MixMatrix)
set.seed(20180221)
A <- rmatrixt(30,mean=matrix(0,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 0
B <- rmatrixt(30,mean=matrix(1,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 2
C <- array(c(A,B), dim=c(3,4,60)) # combine into one array
prior <- c(.5,.5) # equal probability prior
# create an intialization object, starts at the true parameters
init = list(centers = array(c(rep(0,12),rep(1,12)), dim = c(3,4,2)),
U = array(c(diag(3), diag(3)), dim = c(3,3,2)),
V = array(c(diag(4), diag(4)), dim = c(4,4,2))
)
# fit model
res<-matrixmixture(C, init = init, prior = prior, nu = 10,
model = "t", tolerance = 1e-2)
print(res$centers) # the final centers
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] -0.005608544 0.01263007 0.03606995 0.02008999
#> [2,] -0.036807465 -0.00922546 -0.02606685 -0.03205214
#> [3,] -0.068637713 -0.06653189 -0.00752340 -0.04191753
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 0.9785204 0.9459763 0.9561746 0.9857805
#> [2,] 1.0928959 0.9807877 0.8435277 1.0341407
#> [3,] 1.1422649 1.0527232 0.9998141 1.0383794
print(res$pi) # the final mixing proportion
#> [1] 0.4999994 0.5000006
logLik(res)
#> 'log Lik.' -442.517 (df=54)
AIC(logLik(res))
#> [1] 993.0339
plot(res) # the log likelihood by iteration
The default method for determining convergence is based on Aitken acceleration of the log-likelihood. However, it can be set to stop based on changes in the log-likelihood instead.
The packages also provides a helper function
init_matrixmixture()
to provide the init
object for you. At present, it can either use the kmeans()
function on the vectorization of the input data to provide starting
centers or select random points. The ...
arguments are
passed to kmeans()
(so nstart
of other similar
arguments can be set). If a partially formed init
object is
sent to the initializer, it will complete it. However, it will not
validate that, for instance, the covariance matrices are valid. Partial
supply of initial centers is also supported - that is, if fewer centers
than groups are provided, the remainder will be chosen by whatever
method selected.
init_matrixmixture(C, prior = c(.5,.5), centermethod = 'kmeans')
#> $centers
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 0.11412574 -0.1191619 -0.04078157 0.127204662
#> [2,] -0.09798116 0.1763393 -0.02998913 0.001676302
#> [3,] 0.16767860 -0.1738115 -0.20268828 -0.079679047
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 1.0480976 1.0029779 1.0264273 1.0225414
#> [2,] 0.9755583 1.0876427 1.0131553 1.2830619
#> [3,] 0.9418708 0.9333342 0.8656007 0.9025196
#>
#>
#> $U
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 1 0 0
#> [2,] 0 1 0
#> [3,] 0 0 1
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 1 0 0
#> [2,] 0 1 0
#> [3,] 0 0 1
#>
#>
#> $V
#> NULL
init_matrixmixture(C, K = 2, centermethod = 'random')
#> $centers
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 1.678144 1.1288934 1.0701532 0.8945011
#> [2,] 1.452156 1.0876125 0.6176221 1.0579284
#> [3,] 1.271707 0.8391508 1.0755241 1.3799744
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] -0.13670235 -0.22542286 -0.1013062 0.08921306
#> [2,] 0.02554019 0.07765350 -0.1971374 -0.09639281
#> [3,] -0.47626926 -0.03061907 0.1719943 -0.17784713
#>
#>
#> $U
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 1 0 0
#> [2,] 0 1 0
#> [3,] 0 0 1
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 1 0 0
#> [2,] 0 1 0
#> [3,] 0 0 1
#>
#>
#> $V
#> NULL
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-redhat-linux-gnu
#> Running under: Fedora Linux 40 (Xfce)
#>
#> Matrix products: default
#> BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP; LAPACK version 3.11.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.utf8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
#> [7] LC_PAPER=en_US.utf8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
#>
#> time zone: America/Chicago
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] magrittr_2.0.3 dplyr_1.1.4 ggplot2_3.5.1 MixMatrix_0.2.8
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 CholWishart_1.1.2.1 cli_3.6.3
#> [4] knitr_1.48 rlang_1.1.4 xfun_0.47
#> [7] highr_0.11 generics_0.1.3 jsonlite_1.8.9
#> [10] labeling_0.4.3 glue_1.7.0 colorspace_2.1-1
#> [13] htmltools_0.5.8.1 sass_0.4.9 fansi_1.0.6
#> [16] scales_1.3.0 rmarkdown_2.28 grid_4.4.1
#> [19] evaluate_1.0.0 munsell_0.5.1 jquerylib_0.1.4
#> [22] tibble_3.2.1 fastmap_1.2.0 yaml_2.3.10
#> [25] lifecycle_1.0.4 compiler_4.4.1 pkgconfig_2.0.3
#> [28] Rcpp_1.0.13 farver_2.1.2 digest_0.6.37
#> [31] R6_2.5.1 tidyselect_1.2.1 utf8_1.2.4
#> [34] pillar_1.9.0 bslib_0.8.0 withr_3.0.1
#> [37] tools_4.4.1 gtable_0.3.5 cachem_1.1.0
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(MixMatrix)
set.seed(20180221)
A <- rmatrixt(30,mean=matrix(0,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 0
B <- rmatrixt(30,mean=matrix(1,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 2
C <- array(c(A,B), dim=c(3,4,60)) # combine into one array
prior <- c(.5,.5) # equal probability prior
# create an intialization object, starts at the true parameters
init = list(centers = array(c(rep(0,12),rep(1,12)), dim = c(3,4,2)),
U = array(c(diag(3), diag(3)), dim = c(3,3,2)),
V = array(c(diag(4), diag(4)), dim = c(4,4,2))
)
# fit model
res<-matrixmixture(C, init = init, prior = prior, nu = 10,
model = "t", tolerance = 1e-2)
print(res$centers) # the final centers
print(res$pi) # the final mixing proportion
logLik(res)
AIC(logLik(res))
plot(res) # the log likelihood by iteration
init_matrixmixture(C, prior = c(.5,.5), centermethod = 'kmeans')
init_matrixmixture(C, K = 2, centermethod = 'random')
sessionInfo()