State Space Models: Initialization

In addition to the system matrices, calculations for state space models require one other piece of information: the initial distribution of the states.

The initial distribution requires a mean and a variance. For models with just one or two states, it might be possible to come up with a reasonable setting for this, with a mean in the right range, and a variance large enough to cover any likely value. If you have pre-sample settings, use the options X0 (for the mean) and SX0 (for the covariance matrix) to input them to DLM. We used these in DLMEXAM3.RPF where we had the filtered mean and variance from the end of the sample. The default for X0 is a vector of zeros.

One thing to note is that DLM wants the pre-sample information to give \({\bf{X}}_{0|0} ,\Sigma _{0|0} \). Some books and papers use a slightly different description of the state-space model where the pre-sample information is more naturally in the form \({\bf{X}}_{1|0} ,\Sigma _{1|0} \) (in our notation). In general, these aren’t the same. If you want your input values to be interpreted in this other way, also include the option PRESAMPLE=X1.
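As a reference point for the difference, the two forms are linked by one prediction step. This is a sketch that assumes the time-invariant state equation used in this section, with \(\Sigma _{\bf{W}}\) standing for the variance of the shock term as it enters the state equation (an assumption about notation, chosen to match the ergodic variance system given later). Under that assumption the relationship would be:

\begin{equation} {\bf{X}}_{1|0} = {\bf{A}}{\bf{X}}_{0|0} + {\bf{Z}},\,\,\,\,\Sigma _{1|0} = {\bf{A}}\Sigma _{0|0} {\bf{A}}' + \Sigma _{\bf{W}} \end{equation}

The two sets of inputs coincide only in special cases, for instance when the pre-sample values are themselves the ergodic mean and variance.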

Ergodic Solution

If \(\bf{A}\), \(\bf{Z}\), \(\bf{F}\) and \(\mathrm{var}\,{\bf{W}}\) are all time-invariant (which is usually the case), we can think about solving the model for the steady-state or ergodic distribution. If we take expected values in the state equation and assume we have a common expectation, we get

\begin{equation} E{\kern 1pt} {\bf{X}} = {\bf{A}}{\kern 1pt} E{\kern 1pt} {\bf{X}} + {\bf{Z}}{\rm{,}}\,\,\,\,{\rm{or}\,}E{\kern 1pt} {\bf{X = }}\left( {{\bf{I}} - {\bf{A}}} \right)^{ - 1} {\bf{Z}} \end{equation}

If there is no \(\bf{Z}\) component, the presample mean will just be zero, which is the default. If we take variances (assuming the variance of \(\bf{W}\) is constant), we get the equation:

\begin{equation} \Sigma _{\bf{X}} = {\bf{A}}\Sigma _{\bf{X}} {\bf{A}}' + \Sigma _{\bf{W}} \label{eq:ssm_ergodicvariance} \end{equation}

The variance equation, while taking a somewhat odd form, is in fact a linear equation in the elements of \(\Sigma _{\bf{X}} \), so once expanded it can be solved by standard linear algebra techniques. The standard textbook solution (see, for instance, Hamilton (1994), page 378) is to rearrange it into the linear system:

\begin{equation} \left[ {{\bf{I}} - {\bf{A}} \otimes {\bf{A}}} \right]vec\left( {\Sigma _{\bf{X}} } \right) = vec\left( {\Sigma _{\bf{W}} } \right) \end{equation}

As written, this has some redundant elements, since \({\Sigma _{\bf{X}} }\) is symmetric. Still, with those eliminated, it requires solving a linear system with \(n(n+1)/2\) components. The solution procedure for this requires \(O(n^6 )\) arithmetic operations. This starts to dominate the calculation time for the entire Kalman filter process for even fairly modest values of \(n\). Instead of this “brute force” solution, RATS uses a more efficient technique described in Doan (2010), which has a solution time \(O(n^3 )\). Select this calculation for the initial mean and variance using the option PRESAMPLE=ERGODIC. If you need to do the calculation separately, you can use the function %PSDINIT(A,SW), which returns the solution to \eqref{eq:ssm_ergodicvariance} for the input values for \(\bf{A}\) and the variance for \(\bf{W}\).
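The textbook calculation can be sketched in a few lines. The block below is an illustration in plain Python, not RATS code, and it is neither the Doan (2010) algorithm nor the implementation behind %PSDINIT. It solves the full \(n^2\) system without eliminating the redundant elements, and the function names and the two-state input values are made up for the example.

```python
# Brute-force ergodic mean and variance for a small, stationary state equation.
# All names and numbers here are illustrative assumptions.

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def ergodic_mean(A, Z):
    """Solve (I - A) E[X] = Z."""
    n = len(A)
    lhs = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
           for i in range(n)]
    return solve(lhs, Z)

def ergodic_variance(A, SW):
    """Solve [I - A kron A] vec(Sigma_X) = vec(Sigma_W)."""
    n = len(A)
    size = n * n
    M = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for j in range(n):
        for i in range(n):
            row = j * n + i              # vec stacks columns: (i, j) -> j*n + i
            b[row] = SW[i][j]
            for l in range(n):
                for k in range(n):
                    col = l * n + k
                    kron = A[j][l] * A[i][k]   # element of A kron A
                    M[row][col] = (1.0 if row == col else 0.0) - kron
    v = solve(M, b)
    return [[v[j * n + i] for j in range(n)] for i in range(n)]

A = [[0.5, 0.1], [0.0, 0.8]]     # eigenvalues 0.5 and 0.8, inside the unit circle
Z = [1.0, 0.2]
SW = [[1.0, 0.0], [0.0, 0.5]]

mean = ergodic_mean(A, Z)
sigma = ergodic_variance(A, SW)
print(mean)
print(sigma)
```

The dense elimination on an \(n^2 \times n^2\) matrix is where the \(O(n^6 )\) cost comes from, which is why this form is only practical for small \(n\).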

Diffuse Prior

The ergodic solution described above only exists if all the eigenvalues of \(\bf{A}\) are inside the unit circle. The “solution” of \eqref{eq:ssm_ergodicvariance} for a single state with \({\bf{A}}=1\) is \(\Sigma _{\bf{X}} = \infty \). The approach generally taken in state-space modelling to deal with this has been to set the pre-sample mean to zero and use a diagonal matrix with “large” elements for the covariance. One problem with this approach is that a value which is large enough to be effectively infinite for one of the states might not be large enough for another. Very large values also create round-off error problems in the calculation of the state covariance matrix.
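The round-off issue can be seen in a one-state sketch. The Python below is illustrative only and is not how DLM performs the update. It applies a single textbook Kalman variance update to a model with \({\bf{A}}=1\), using made-up names `p0` (pre-sample variance), `sw` (state shock variance) and `sv` (measurement error variance). In exact arithmetic the result approaches `sv` as `p0` grows; in double precision, a very large `p0` swamps `sv` and the subtraction no longer returns anything close to it.

```python
def filtered_variance(p0, sw, sv):
    """One Kalman update of the state variance for a scalar model with A = 1.

    Uses the subtraction form P - P*P/(P + sv), which is where the
    cancellation happens when P is enormous relative to sv.
    """
    p_pred = p0 + sw                                   # predicted state variance
    return p_pred - p_pred * p_pred / (p_pred + sv)    # filtered state variance

moderate = filtered_variance(1.0e6, 0.5, 1.0)    # stays close to sv = 1.0
huge = filtered_variance(1.0e20, 0.5, 1.0)       # the digits of sv are lost

print(moderate, huge)
```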

An alternative to this approximation was provided by Koopman (1997), which is known as exact diffuse initialization. This was later refined in Durbin and Koopman (2012). This takes the actual limits as the variances go to infinity. It is implemented by writing the covariance matrices as a sum of “infinite” and “finite” parts, which are updated separately in the Kalman filter and Kalman smoother. If you use the option PRESAMPLE=ERGODIC, DLM will recognize that you have non-stationary roots and adjust the calculation to use exact diffuse initialization. This will also correctly handle the case where there is a mix of unit and stationary roots. See Doan (2010) for more information on how this works.
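To see what taking the limit means in the simplest case, consider a local level model, which is an assumption made here purely for illustration: a single state with \({\bf{A}}=1\), no \(\bf{Z}\), state shock variance \(\sigma _w^2 \), measurement error variance \(\sigma _v^2 \), and a pre-sample mean of zero with variance \(\kappa \). None of these symbols come from the text above. One step of the standard Kalman filter on the first observation \(y_1 \) gives:

\begin{equation} \Sigma _{1|0} = \kappa + \sigma _w^2 ,\,\,\,\,{\bf{X}}_{1|1} = \frac{{\kappa + \sigma _w^2 }}{{\kappa + \sigma _w^2 + \sigma _v^2 }}\,y_1 ,\,\,\,\,\Sigma _{1|1} = \frac{{\left( {\kappa + \sigma _w^2 } \right)\sigma _v^2 }}{{\kappa + \sigma _w^2 + \sigma _v^2 }} \end{equation}

As \(\kappa \to \infty \), these converge to \({\bf{X}}_{1|1} = y_1 \) and \(\Sigma _{1|1} = \sigma _v^2 \), which are finite even though \(\Sigma _{1|0} \) is not. The exact diffuse calculation works with limits of this kind directly rather than with a large finite \(\kappa \).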

If you have a model where all roots are known to be unit roots, you can use the option PRESAMPLE=DIFFUSE. That will give the same result as PRESAMPLE=ERGODIC, but won’t require the extra step (internally) to analyze the roots, and so will work somewhat faster. We used this several times in the short examples earlier in the chapter.

Conditioning on Early Values

If you have a non-stationary model, the initial state vector is usually represented by a diffuse prior. If, rather than using the recommended PRESAMPLE=ERGODIC or PRESAMPLE=DIFFUSE to do exact handling of this, you use large finite values, the likelihood could be dominated by the first few observations where the variance is high. If you do this, you can use the option CONDITION=initial periods to indicate the number of observations which should not be incorporated into the likelihood function. You still include these in the estimation range—the likelihood actually used will just condition on them. Typically, you would condition on the number of states, but it might be a smaller number if the initial state vector is non-stationary only in a reduced number of dimensions.
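The effect of conditioning can be sketched with a small filter written from scratch. The Python below is an illustration only: it is not the DLM implementation, the `condition` argument is a hypothetical stand-in for the CONDITION option, and the data are made-up numbers. It runs a one-state local level model with a large finite pre-sample variance and compares the log likelihood with and without the first observation.

```python
import math

def local_level_loglik(y, sw, sv, p0, condition=0):
    """Gaussian log likelihood of a local level model via the Kalman filter.

    The first `condition` observations still update the state, but their
    terms are left out of the likelihood sum.
    """
    x, p = 0.0, p0                       # pre-sample mean and variance
    loglik = 0.0
    for t, obs in enumerate(y):
        p_pred = p + sw                  # predicted state variance (A = 1)
        f = p_pred + sv                  # predictive variance of the observation
        err = obs - x                    # one-step prediction error
        if t >= condition:
            loglik += -0.5 * (math.log(2.0 * math.pi * f) + err * err / f)
        gain = p_pred / f
        x = x + gain * err               # filtered state mean
        p = p_pred * sv / f              # filtered state variance
    return loglik

y = [10.2, 9.7, 10.9, 10.4, 11.1]        # made-up data

full_a = local_level_loglik(y, 0.5, 1.0, 1.0e6)
full_b = local_level_loglik(y, 0.5, 1.0, 1.0e8)
cond_a = local_level_loglik(y, 0.5, 1.0, 1.0e6, condition=1)
cond_b = local_level_loglik(y, 0.5, 1.0, 1.0e8, condition=1)

print(full_a - full_b)    # depends heavily on the arbitrary "large" value
print(cond_a - cond_b)    # nearly unaffected by it
```

In this sketch the unconditioned likelihood shifts with the choice of the large variance because the first observation's predictive variance is roughly that value, while the conditioned likelihood barely moves.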

The calculated likelihood is the only thing affected by the use of the CONDITION option. Note that this is only slightly different from what happens in the exact handling of the diffuse prior, which omits from the likelihood any observation which has an “infinite” predictive variance. Because the likelihood with the exact diffuse prior depends on the number of unit roots, CONDITION also has value where the number of unit roots isn’t known; for instance, if you have an AR component that might or might not be stationary.