Nonlinear State-Space Models

The (standard) Kalman filter relies upon three key assumptions:

1. The state equation is linear in the states

2. The measurement equation is linear in the states

3. The errors are Gaussian and independent across time

If any of these assumptions is violated, the Kalman filter is no longer an exact calculation. At a minimum (for instance, if the only failure is Gaussianity), the likelihood won't be correct, though the state estimates will still be minimum variance among estimators that are linear in the data. An example of that is the Stochastic Volatility model, which (once written in terms of the log of the squared data) is linear but with a (very) non-Gaussian error process.
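
As a sketch of why that model is linear but non-Gaussian, one common form of the stochastic volatility model (the symbols \(h_t\), \(\gamma\), \(\rho\), \(\varepsilon_t\) and \(\eta_t\) are notation assumed here for illustration) is

\begin{equation} {y_t} = \exp ({h_t}/2)\,{\varepsilon _t},\quad {h_t} = \gamma + \rho {h_{t - 1}} + {\eta _t},\quad {\varepsilon _t} \sim N(0,1) \end{equation}

Squaring and taking logs gives

\begin{equation} \log y_t^2 = {h_t} + \log \varepsilon _t^2 \end{equation}

This is a linear measurement equation in the state \(h_t\), but its error \(\log \varepsilon_t^2\) is the log of a \(\chi^2_1\) variable (mean roughly \(-1.27\), variance \(\pi^2/2\)), which is heavily skewed to the left rather than Gaussian.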

Any non-linearity, whether in the state equation, the measurement equation, or both, requires either a different form of filtering (such as particle filtering, which is a simulation technique) or use of the Kalman filter on a linearized approximation to the dynamical system. The use of the standard Kalman filter on a linearized system is known as the Extended Kalman Filter (or EKF) or Non-Linear Kalman Filter. The linearization tends to be relatively straightforward, as it just requires a first-order expansion around some expansion point.

A non-linearity in the measurement equation is particularly easy to handle through linearization. If

\begin{equation} {{\bf{y}}_t} = \psi({{\bf{X}}_t}) + {{\bf{v}}_t} \label{eq:NonlinearMeasurement} \end{equation}

then

\begin{equation} {{\bf{y}}_t} \approx \psi({{{\bf{\tilde X}}}_t}) + \psi'({{{\bf{\tilde X}}}_t})\left( {{{\bf{X}}_t} - {{{\bf{\tilde X}}}_t}} \right) + {{\bf{v}}_t} = \left( {\psi({{{\bf{\tilde X}}}_t}) - \psi'({{{\bf{\tilde X}}}_t}){{{\bf{\tilde X}}}_t}} \right) + \psi'({{{\bf{\tilde X}}}_t}){{\bf{X}}_t} + {{\bf{v}}_t} \end{equation}

where \({{{{\bf{\tilde X}}}_t}}\) is the expansion point at time \(t\) for the linearization in \(\bf{X}_t\), and \(\psi'\) is the matrix of first derivatives of \(\psi\). This produces a linear measurement equation in which \(\left( {\psi({{{\bf{\tilde X}}}_t}) - \psi'({{{\bf{\tilde X}}}_t}){{{\bf{\tilde X}}}_t}} \right)\) is a constant (given the expansion point) that serves as the \({\mu _t}\) in the measurement equation, and \(\psi'({{{\bf{\tilde X}}}_t})\) is \(\bf{C}'\). In practice, much of \(\psi\) is often linear, which reduces the need for linearization to just a term or two.
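
The linearized measurement step can be sketched in a few lines of Python. This is a minimal illustration for a scalar state only, not the implementation used by any particular package. The function names, the choice \(\psi(x) = \exp(x)\), and the numerical values are all hypothetical, and the expansion point is assumed to be the predicted state.

```python
import math

def linearize_measurement(psi, dpsi, x_tilde):
    """Return (mu, c) such that psi(x) ~= mu + c * x near x_tilde."""
    c = dpsi(x_tilde)                 # slope psi'(x_tilde), the scalar analogue of C'
    mu = psi(x_tilde) - c * x_tilde   # constant term, given the expansion point
    return mu, c

def kalman_update(x_pred, p_pred, y, mu, c, sv):
    """Standard scalar Kalman update for y = mu + c*x + v, Var(v) = sv."""
    resid = y - (mu + c * x_pred)     # one-step prediction error
    s = c * p_pred * c + sv           # variance of the prediction error
    gain = p_pred * c / s             # Kalman gain
    x_upd = x_pred + gain * resid
    p_upd = p_pred - gain * c * p_pred
    return x_upd, p_upd

# Hypothetical example: psi(x) = exp(x), expanded around the predicted state.
x_pred, p_pred = 0.0, 1.0
mu, c = linearize_measurement(math.exp, math.exp, x_pred)
x_upd, p_upd = kalman_update(x_pred, p_pred, y=1.5, mu=mu, c=c, sv=0.5)
print(mu, c, x_upd, p_upd)
```

Once `mu` and `c` are computed, the update is exactly the standard linear one. Only the inputs change from period to period with the expansion point.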

One can also linearize a non-linear state equation (which can also be combined with a non-linear measurement equation in a single model in the obvious way). If

\begin{equation} {{\bf{X}}_t} = \phi({{\bf{X}}_{t - 1}}) + {\bf{F}}{{\bf{W}}_t} \label{eq:NonlinearTransition} \end{equation}

the linearization is

\begin{align*} {{\bf{X}}_t} &\approx \phi({{{\bf{\tilde X}}}_{t - 1}}) + \phi'({{{\bf{\tilde X}}}_{t - 1}})\left( {{{\bf{X}}_{t - 1}} - {{{\bf{\tilde X}}}_{t - 1}}} \right) + {\bf{F}}{{\bf{W}}_t} \\ &= \left( {\phi({{{\bf{\tilde X}}}_{t - 1}}) - \phi'({{{\bf{\tilde X}}}_{t - 1}}){{{\bf{\tilde X}}}_{t - 1}}} \right) + \phi'({{{\bf{\tilde X}}}_{t - 1}}){{\bf{X}}_{t - 1}} + {\bf{F}}{{\bf{W}}_t} \end{align*}

Here \(\left( {\phi({{{\bf{\tilde X}}}_{t - 1}}) - \phi'({{{\bf{\tilde X}}}_{t - 1}}){{{\bf{\tilde X}}}_{t - 1}}} \right)\) is the \(\bf{Z}_t\) in the linearized state equation and \(\phi'({{{\bf{\tilde X}}}_{t - 1}})\) is the \(\bf{A}_t\). The linearized measurement equation generally works fairly well. The linearized state equation takes us a bit farther from where we might expect the approximate Kalman filter to be successful. Much of this is due to how it affects the start of the recursion. First, there is no "ergodic" solution for the linearized state, since the matrices in the state equation are time-varying. Second, the linearization requires an expansion point for the lag of \(\bf{X}\), which doesn't exist pre-sample. In addition, the approximation error tends to accumulate more quickly when it applies directly to the period-to-period evolution of the states.
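
The corresponding prediction step for a linearized state equation might look like the following scalar sketch. The function names, the transition function \(\phi(x) = x/(1+x^2)\), and the numbers are hypothetical. The expansion point is assumed to be the previous period's updated estimate, which is one common choice rather than the only one.

```python
def linearize_transition(phi, dphi, x_tilde):
    """Return (z, a) such that phi(x) ~= z + a * x near x_tilde."""
    a = dphi(x_tilde)                 # slope phi'(x_tilde), the scalar analogue of A_t
    z = phi(x_tilde) - a * x_tilde    # constant term, the scalar analogue of Z_t
    return z, a

def kalman_predict(x_upd, p_upd, z, a, f, sw):
    """Standard scalar prediction for x_t = z + a*x_{t-1} + f*w_t, Var(w) = sw."""
    x_pred = z + a * x_upd
    p_pred = a * p_upd * a + f * sw * f
    return x_pred, p_pred

# Hypothetical nonlinear transition and its derivative.
def phi(x):
    return x / (1.0 + x * x)

def dphi(x):
    return (1.0 - x * x) / (1.0 + x * x) ** 2

x_upd, p_upd = 0.5, 0.2               # assumed estimate and variance from the prior update
z, a = linearize_transition(phi, dphi, x_upd)
x_pred, p_pred = kalman_predict(x_upd, p_upd, z, a, f=1.0, sw=0.1)
print(x_pred, p_pred)
```

When the expansion point equals the prior estimate, the predicted mean is simply \(\phi\) evaluated at that estimate. The linearization matters only through the slope, which scales the variance carried forward.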

One important technical question is how to choose \({{{{\bf{\tilde X}}}_t}}\). In the engineering literature, it is very common to choose \({{\bf{\tilde X}}_t} = {{\bf{X}}_{t|t - 1}}\), that is, the "best guess" that would be available prior to observing \({\bf{y}}_t\). If you are running a "real-time" system, that is the obvious choice. If, however, you are interested in smoothed estimates of the states rather than just the current filtered ones, that's not necessarily the best way to handle it, since the predicted estimates at the start of the sample can be quite poor. An alternative if you want good estimates across the full sample is to use the smoothed estimates (\({{\bf{\tilde X}}_t} = {{\bf{X}}_{t|T}}\)), which is what is done in the @DISAGGREGATE procedure to handle the log-linear model. The difficulty there is that the smoothed estimates change every time the expansion points change, even if the model itself is treated as known. That requires either iterating to convergence or doing a single smooth at reasonable guess values and sticking with it.
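
The "iterate to convergence" option can be sketched as follows. This is an illustrative scalar example, not the code of the @DISAGGREGATE procedure. It assumes a linear state equation \(x_t = a x_{t-1} + w_t\) with a non-linear measurement \(y_t = \psi(x_t) + v_t\), and all function names, data, and parameter values are hypothetical.

```python
import math

def filter_and_smooth(y, x_tilde, psi, dpsi, a, sw, sv, x0, p0):
    """Kalman filter plus fixed-interval smoother, with psi linearized at x_tilde[t]."""
    n = len(y)
    x_pred, p_pred = [0.0] * n, [0.0] * n
    x_filt, p_filt = [0.0] * n, [0.0] * n
    x_prev, p_prev = x0, p0
    for t in range(n):
        # Prediction from the (linear) state equation.
        x_pred[t] = a * x_prev
        p_pred[t] = a * p_prev * a + sw
        # Linearized measurement: y ~= mu + c*x + v.
        c = dpsi(x_tilde[t])
        mu = psi(x_tilde[t]) - c * x_tilde[t]
        s = c * p_pred[t] * c + sv
        gain = p_pred[t] * c / s
        x_filt[t] = x_pred[t] + gain * (y[t] - mu - c * x_pred[t])
        p_filt[t] = p_pred[t] - gain * c * p_pred[t]
        x_prev, p_prev = x_filt[t], p_filt[t]
    # Backward smoothing pass (means only; variances are omitted for brevity).
    x_smooth = x_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] * a / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + j * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth

def iterate_expansion(y, psi, dpsi, a, sw, sv, x0, p0, tol=1e-8, max_iter=100):
    """Re-linearize at the smoothed states until they stop changing."""
    x_tilde = [x0] * len(y)           # initial guess for the expansion points
    for iteration in range(1, max_iter + 1):
        x_new = filter_and_smooth(y, x_tilde, psi, dpsi, a, sw, sv, x0, p0)
        change = max(abs(u - v) for u, v in zip(x_new, x_tilde))
        x_tilde = x_new
        if change < tol:
            break
    return x_tilde, iteration

# Hypothetical data with psi(x) = exp(x).
y = [1.1, 1.3, 0.9, 1.2]
x_hat, n_iter = iterate_expansion(y, math.exp, math.exp,
                                  a=0.9, sw=0.1, sv=0.05, x0=0.0, p0=1.0)
print(n_iter, x_hat)
```

Setting `max_iter=1` corresponds to the other option in the text: a single smooth at the guess values. If \(\psi\) is in fact linear, the second pass reproduces the first exactly, so the loop stops after two iterations.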

One example of the non-linear Kalman filter in practice is Matheson and Stavrev (2023), which uses a relatively standard NAIRU-Phillips Curve model but allows for time-varying coefficients in the measurement equations. The multiplicative interaction between those coefficients and the unobservable states creates the non-linearity.
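
To illustrate how such an interaction is handled (a generic sketch, not the specification used in that paper; the symbols \(\beta_t\) and \(u_t\) are hypothetical), suppose the measurement equation contains the product \(\beta_t u_t\) of a time-varying coefficient and an unobservable state, both of which are elements of \(\bf{X}_t\). A first-order expansion around \(({\tilde \beta _t},{\tilde u_t})\) gives

\begin{equation} {\beta _t}{u_t} \approx {\tilde \beta _t}{\tilde u_t} + {\tilde u_t}\left( {{\beta _t} - {{\tilde \beta }_t}} \right) + {\tilde \beta _t}\left( {{u_t} - {{\tilde u}_t}} \right) = - {\tilde \beta _t}{\tilde u_t} + {\tilde u_t}{\beta _t} + {\tilde \beta _t}{u_t} \end{equation}

so \(-{\tilde \beta _t}{\tilde u_t}\) goes into the constant \({\mu _t}\), while \({\tilde u_t}\) and \({\tilde \beta _t}\) are the loadings on \(\beta_t\) and \(u_t\) in \(\bf{C}'\). The approximation error is exactly \(\left( {{\beta _t} - {{\tilde \beta }_t}} \right)\left( {{u_t} - {{\tilde u}_t}} \right)\), which is second-order small when the expansion point is close to the true values.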