\psi-APF for non-linear Gaussian state space models

Introduction

Vihola, Helske, and Franks (2020) suggest an efficient particle filter (sequential Monte Carlo, SMC) called $\psi$-APF for marginal likelihood estimation and state smoothing in case of state space models with linear-Gaussian state dynamics and twice-differentiable observation densities. The concept is similar to as in iterated auxiliary particle filter (Guarniero, Johansen, and Lee 2017), where the original model is “twisted” using so called $\psi$-functions, which are found iteratively. Instead of iterative procedure for finding optimal $\psi$-functions, the $\psi$-APF uses conditional smoothing distribution of the latent states based on an approximate Gaussian model (Durbin and Koopman 1997; Shephard and Pitt 1997). Compared to off-the-shelf solution based on a bootstrap filter (BSF) (Gordon, Salmond, and Smith 1993), only a fraction of particles are needed for accurate evaluation of the marginal likelihood, which in turn increases the performance of particle Markov chain Monte Carlo (MCMC) computations and the importance sampling (IS) type weighted MCMC introduced in Vihola, Helske, and Franks (2020) (see also Lindsten, Helske, and Vihola (2018) for combining deterministic approximations and SMC for probabilistic graphical models).

Here we study the $\psi$-APF in case of Gaussian models with non-linear observation or transition equations.

$\psi$-APF

Let us first consider general state space model of form \[ y_t \sim g(y_t | \alpha_t)\\ \alpha_{t+1} \sim \mu(\alpha_{t+1} | \alpha_t), \] and as in Vihola, Helske, and Franks (2020) we assume that we have an access to approximating densities $\tilde g(y_t | \alpha_t)$, and $\tilde \mu(\alpha_{t+1} | \alpha_t)$.

We define \[ \psi_t(\alpha_t) = \tilde g(y_t | \alpha_t) \textrm{E} \left[ \left. \prod_{p = t + 1}^T \tilde g(y_p | \alpha_p) \right\vert \alpha_t \right] = \tilde g(y_t | \alpha_t) \int \tilde p(y_{t+1:T}, \alpha_{t+1:T} | \alpha_t) \textrm{d} \alpha_{t+1:T} = \tilde p(y_{t:T} | x_t), \]

where $\tilde g(\cdot)$ and $\tilde p(\cdot)$ correspond to the approximating model.

This leads to twisted transition and observation densities of form

\[ \begin{aligned} \mu_1^{\Psi}(\alpha_1) &= \tilde p(\alpha_1 | y_{1:T}),\\ \mu_t^{\Psi}(\alpha_t | \alpha_{t-1}) &= \tilde p(\alpha_t | \alpha_{t-1}, y_{t:T}), &t=2\ldots,T,\\ g_1^{\Psi}(y_1 | \alpha_1) &= \frac{\mu_1(\alpha_1)}{\tilde \mu_1(\alpha_1)} \frac{g(y_1 | \alpha_1)}{\tilde g(y_1 | \alpha_1)} \tilde p(y_{1:T}),\\ g_t^{\Psi}(y_t | \alpha_t) &= \frac{\mu_t(\alpha_t | \alpha_{t-1})}{\tilde \mu_t(\alpha_t | \alpha_{t-1})}\frac{g(y_t | \alpha_t)}{\tilde g(y_t | \alpha_t)}, &t=2\ldots,T, \end{aligned} \]

Running particle filter with potentials $g^{\psi}_t$ and proposals $\mu^{\psi}_t$ does not produce the correct filtering distribution (with respect to the original model), but the resulting smoothing distribution and marginal likelihood estimate $p(y_{1:T})$ coincide with the corresponding estimates of the original model.

In a case where the transition densities $\mu_t$ are Gaussian and the observation densities belong to exponential family, we can obtain the twisted densities via Gaussian approximation as in Vihola, Helske, and Franks (2020). These twisted transition densities correspond to the marginal conditional smoothing distribution $\tilde p(\alpha_t | \alpha_{t-1}, y_{t:T})$ which can be computed straightforwardly from the output of Kalman filter and smoother, as the conditional distribution is Gaussian with mean

\[ \hat \alpha_t + \textrm{Cov}(\alpha_{t},\alpha_{t-1} | y_{1:T}) \textrm{Var}(\alpha_{t-1}| y_{1:T})^{-1} (\alpha_{t-1} - \hat \alpha_{t-1}) \] and variance \[ \textrm{Var}(\alpha_t | y_{1:T}) - \textrm{Cov}(\alpha_t,\alpha_{t-1} | y_{1:T}) \textrm{Var}(\alpha_{t-1} | y_{1:T})^{-1}\textrm{Cov}(\alpha_{t-1},\alpha_t | y_{1:T}). \] The mean and variance terms are a standard output of smoothing algorithm, whereas the covariance term can be computed at the same time from the auxiliary variables used in smoothing (see, e.g., Durbin and Koopman (2012)). Note that in this case the potentials $g_t^{\Psi}(y_t | \alpha_t)$ simplify to form $\frac{g(y_t | \alpha_t)}{\tilde g(y_t | \alpha_t)}$ and similaly for the first time point which contains additional term corresponding to the marginal likelihood of the approximating Gaussian model.

$\psi$-APF for non-linear Gaussian models

We now focus on a non-linear case where the model is of form \[ p(y_t | \alpha_t) = Z_t(\alpha_t) + H_t\epsilon_t,\\ p(\alpha_{t+1} | \alpha_t) = T_t(\alpha_t) + R_t \eta_t. \] Compared to Vihola, Helske, and Franks (2020), the transition density $p(\alpha_{t+1} | \alpha_t)$ is now assumed to be non-linear function of states, and we assume (possibly non-linear) Gaussian density for the observations, and the Gaussian approximation approach of Durbin and Koopman (1997) and Shephard and Pitt (1997) is not applicable as such. Natural modification to the algorithm is to use extended Kalman filter (EKF) for obtaining linear-Gaussian model.

In importance sampling framework, the usage of first order Taylor expansion for obtaining approximate Gaussian model is briefly discussed in Durbin and Koopman (2001). Here we first we run the EKF and the corresponding extended Kalman smoothing algorithm using the the original model. Then, using the obtained smoothed estimates $\tilde \alpha$ as a new linearization point, we construct the corresponding mean-adjusted linear-Gaussian model:

\[ y_t = d_t + \dot{Z_t} \alpha_t + H_t \epsilon_t,\\ \alpha_{t+1} = c_t + \dot{T_t} \alpha_t + R_t \eta_t,\\ \] where \[ \dot{Z_t} = \left. \frac{\partial Z_t(x)}{\partial x}\right|_{x=\tilde\alpha_t},\\ d_t = Z_t(\tilde \alpha_t) - \dot{Z_t} \tilde \alpha_t,\\ \dot{T_t} = \left. \frac{\partial T_t(x)}{\partial x}\right|_{x=\tilde\alpha_t},\\ c_t = T_t(\tilde \alpha_t) - \dot{T_t} \tilde \alpha_t. \]

We then run Kalman filter and smoother (KFS) again for this model, linearize, and continue until convergence. Durbin and Koopman (2001) show that the smoothed state estimate $\hat \alpha$ of the final approximating Gaussian model coincide with the conditional mode of the original model.

Compared to the $\psi$-APF with linear-Gaussian states and observations from exponential family, there are cases of non-linear models where it is likely that $\psi$-APF does not perform well. Due to the severe non-linearities of the model, it is possible that the linearization algorithm does not converge. In case the EKF at first step tends to diverge, using so called iterated extended Kalman filter (Jazwinski 1970) can sometimes be useful. Another issue is multimodal distributions, where it is likely that standard BSF outperforms $\psi$-APF. Nevertheless, as illustrated in the next Section, when applicable $-APF can lead to significant computational gains compared to several other particle filtering algorithms.

Illustrations

We now compare $\psi$-PF with standard bootstrap filter (BSF) (Gordon, Salmond, and Smith 1993), and extended Kalman particle filter algorithm (Van Der Merwe et al. 2000). Note that Van Der Merwe et al. (2000) recommend using particle filter based on unscented Kalman filter (UKF) instead of EKPF as it generally produces more accurate results, but as the UKF algorithm depends of several tuning parameters, its use as black-box-type algorithm is more problematic.

We are interested in the accuracy of the log-likelihood estimate and the relative computational performance of the different particle filtering algorithms with varying number of particles $N$. In addition, we compare the accuracy of the smoothed state estimates at the first and last time point, based on the filter-smoother algorithm (Kitagawa 1996). As a efficiency measure, we use inverse relative efficiency (IRE), defined as the mean squared error (MSE) multiplied by the average computation time, where the reference value for MSE is based on a bootstrap filter with 100,000 particles.

Non-linear transition equation

Our first model is logistic growth model of form \[ y_t = p_t + \epsilon_t,\\ p_{t+1} = K p_t \frac{\exp(r_t dt)}{K + p_t (\exp(r_tdt ) - 1)} + \xi_t,\\ r_t = \frac{\exp{r'_t}}{1 + \exp{r'_t}},\\ r'_{t+1} = r'_t + \eta_t, \] with constant carrying capacity $K = 500$, initial population size $p_1 = 50$, initial growth rate on logit scale $r'_1 = -1.5$, $dt = 0.1$, $\xi \sim N(0,1)$, $\eta \sim N(0,0.05^2)$, and $\epsilon \sim N(0, 1)$.

First, we simulated one realization of length 300 from the logistic growth model. All the model parameters were assumed to be known, expect that the prior variances for the states $p_1$ and $r'_1$ was set to $1$ and $100$. We then performed particle smoothing 1000 times with $N=100$, and $N=1000$ particles, using BSF, EKPF, and $\psi$-APF algorithms.

Table 1 shows the means, standard deviations, and IREs of the log-likelihood estimates using different methods. We see although BSF produces less accurate results than EKPF, the computational load of EKFP negates the increased accuracy in terms of IRE. However, $\psi$-APF provides significantly smaller standard errors with IREs several orders of magnitude smaller than ones obtained from other methods.

Results for the log-likelihood estimates of the growth model.
method	N	mean	SD	IRE	time
BSF	10	-84893.8620	241703.9111	2.450686e+08	0.0037
EKPF	10	-629.5561	29.4238	2.311380e+01	0.0164
PSI	10	-606.2218	0.1201	3.000000e-04	0.0185
BSF	100	-611.0850	3.7534	1.147900e+00	0.0300
EKPF	100	-607.7500	1.8476	8.894000e-01	0.1499
PSI	100	-606.2149	0.0425	5.000000e-04	0.1044
BSF	1000	-606.6395	0.9176	3.087000e-01	0.2888
EKPF	1000	-606.3727	0.5524	5.204000e-01	1.4905
PSI	1000	-606.2138	0.0157	2.600000e-03	0.9049

Similar table for the smoothed estimate of $p_1$ show again the superiority of the $\psi$-PF, with no clear differences between BSF and EKPF.

Results for the p_1 estimates of the growth model.
method	N	mean	SD	IRE	time
BSF	10	-1.5025	0.8657	0.0031	0.0037
EKPF	10	-1.2808	0.3947	0.0026	0.0164
PSI	10	-1.2410	0.1371	0.0003	0.0185
BSF	100	-1.2495	0.2691	0.0022	0.0300
EKPF	100	-1.2474	0.2231	0.0075	0.1499
PSI	100	-1.2437	0.0609	0.0004	0.1044
BSF	1000	-1.2459	0.1691	0.0083	0.2888
EKPF	1000	-1.2462	0.1176	0.0209	1.4905
PSI	1000	-1.2449	0.0250	0.0007	0.9049

Non-linear observation equation

As a second illustration, we consider a model \[ y_t = \exp(\alpha_t) + \epsilon_t,\\ \alpha_{t+1} = 0.95\alpha_t + \eta_t, \] where $\eta \sim N(0, \sigma^2)$ and $\epsilon_t \sim N(0, 1)$, with stationary initial distribution $\alpha_1 \sim N(0, \sigma^2 / (1-0.95^2))$. Now state dynamics are linear-Gaussian, but the observation density is nonlinear with respect to state.

We simulated data of length $n=100$ using $\sigma^2 = 0.1$ and $\sigma^2=1$. In case of $\sigma^2=1$, EKPF generated spurious results and therefore the results are only shown for BSF and $\psi$-PF.

Table 4 shows the means and standard deviations of the log-likelihood estimates over the replications as well as IRE and average runtime using different methods. Again, $\psi$-APF performs well.

Results for the log-likelihood estimates of the AR model.
method	N	mean	SD	IRE	time
BSF	10	-144.2271	1.2130	0.0025	0.0013
EKPF	10	-144.1348	1.1224	0.0080	0.0052
PSI	10	-143.6263	0.2560	0.0005	0.0076
BSF	100	-143.6519	0.3493	0.0010	0.0083
EKPF	100	-143.6475	0.3316	0.0051	0.0452
PSI	100	-143.6000	0.0932	0.0004	0.0448
BSF	1000	-143.6016	0.1091	0.0009	0.0787
EKPF	1000	-143.6014	0.1045	0.0046	0.4220
PSI	1000	-143.5959	0.0320	0.0004	0.4071

Although with fixed number of particles the $\psi$-APF produces smaller standard errors than BSF and PEKF (which behave very similarly) for $\alpha_1$, their IREs are comparable.

Results for the alpha_1 estimates of the AR model.
method	N	mean	SD	IRE
BSF	10	-0.1154	0.2465	1e-04
EKPF	10	-0.1001	0.2400	3e-04
PSI	10	-0.1368	0.1594	2e-04
BSF	100	-0.1342	0.1082	1e-04
EKPF	100	-0.1361	0.1067	5e-04
PSI	100	-0.1352	0.0699	2e-04
BSF	1000	-0.1375	0.0362	1e-04
EKPF	1000	-0.1385	0.0349	5e-04
PSI	1000	-0.1382	0.0255	3e-04

Discussion

In this note we have studied the performance of the previously suggested $\psi$-PF in context of non-linear Gaussian using the extended Kalman filter and smoother as an intermediate approximation, using two simulated case studies. Although there are obvious limitations in using the EKF (such as convergence failures due to severe non-linearities), our results suggest that if reasonably accurate EKF type approximations are available, it is beneficial to incorporate those into particle filter scheme.

References

Durbin, James, and Siem Jan Koopman. 1997. “Monte Carlo Maximum Likelihood Estimation for Non-Gaussian State Space Models.” Biometrika 84 (3): 669–84. https://doi.org/10.1093/biomet/84.3.669.

———. 2001. Time Series Analysis by State Space Methods. 1nd ed. New York: Oxford University Press.

———. 2012. Time Series Analysis by State Space Methods. 2nd ed. New York: Oxford University Press.

Gordon, Neil J., D. J. Salmond, and A. F. M. Smith. 1993. “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation.” IEE Proceedings-F 140 (2): 107–13.

Guarniero, Pieralberto, Adam M. Johansen, and Anthony Lee. 2017. “The Iterated Auxiliary Particle Filter.” Journal of the American Statistical Association 112 (520): 1636–47. https://doi.org/10.1080/01621459.2016.1222291.

Jazwinski, Andrew. 1970. Stochastic Processes and Filtering Theory. Academic Press.

Kitagawa, Genshiro. 1996. “Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models.” Journal of Computational and Graphical Statistics 5 (1): 1–25.

Lindsten, Fredrik, Jouni Helske, and Matti Vihola. 2018. “Graphical Model Inference: Sequential Monte Carlo Meets Deterministic Approximations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 8201–11. Curran Associates, Inc. https://papers.nips.cc/paper/8041-graphical-model-inference-sequential-monte-carlo-meets-deterministic-approximations.pdf.

Shephard, Neil, and Michael K. Pitt. 1997. “Likelihood Analysis of Non-Gaussian Measurement Time Series.” Biometrika 84 (3): 653–67.

Van Der Merwe, Rudolph, Arnaud Doucet, Nando De Freitas, and Eric Wan. 2000. “The Unscented Particle Filter.” In Proceedings of the 13th International Conference on Neural Information Processing Systems, 563–69. NIPS’00. Cambridge, MA, USA: MIT Press. https://dl.acm.org/doi/10.5555/3008751.3008833.

Vihola, Matti, Jouni Helske, and Jordan Franks. 2020. “Importance Sampling Type Estimators Based on Approximate Marginal MCMC.” Scandinavian Journal of Statistics. https://doi.org/10.1111/sjos.12492.

\(\psi\)-APF for non-linear Gaussian state space models

Jouni Helske

26 October 2020