RATS 10.1

The Kalman filter is a fast, recursive algorithm for estimating and evaluating dynamic linear models. RATS has three instructions or sets of instructions which use it. DLM (Dynamic Linear Models), described in User's Guide Chapter 10, is used for state-space models. These are usually small dynamic models which relate observed data to unobservable “states.” RLS uses the Kalman filter to compute Recursive Least Squares estimates for a single linear model. This is generally used for examining the stability of a linear relationship.

 

For VARs, the “states” in the Kalman filter are the coefficients. The instruction KALMAN produces sequential estimates of these. KALMAN can be used to update coefficient estimates as the data set expands, or to estimate a model where the (true) coefficients are allowed to vary with time.

 

KALMAN works just one observation at a time, so it will generally be used in a loop. Because of the “step-at-a-time” design, you can do sequential calculations based upon each set of coefficients, such as producing forecasts or adjusting the model.

Assumptions

As mentioned above, the Kalman filter, as used by RATS for VARs, is a restricted form of the more general Kalman filter. It is specifically designed for estimating coefficients of a linear regression model, allowing for limited types of time-variation.

 

The following is the model we use. \(\bf{\beta}_t\) is the vector of coefficients at time \(t\):

The measurement equation is \(y_t  = {\bf{X}}_t {\bf{\beta }}_t  + u_t \), where the variance of \(u_t\) is \(n_t\).

The state vector follows the process \({\bf{\beta }}_t  = {\bf{\beta }}_{t - 1}  + {\bf{v}}_t \), with \({\mathop{\rm var}} ({\bf{v}}_t ) = {\bf{M}}_t \).

\(u_t\) and \(\bf{v}_t\) are independent.

\(n_t\) and \({\bf{M}}_t \) are assumed to be known.

 

If \({\bf{M}}_t \) is equal to zero, then there is no time-variation.

 

Updating Formula

If we have an estimate of \({\bf{\beta }}_{t - 1} \) using information through \(t-1\) (denoted \({\bf{\beta }}_{t{\rm{ - 1|}}t{\rm{ - 1}}} \)) and its covariance matrix \({\bf{\Sigma }}_{t - 1} \), then the updated estimate given \(y_t\) and \(\bf{X}_t\) is

 

\begin{equation} {\bf{\Sigma }}_{t|t - 1} = {\bf{\Sigma }}_{t - 1} + {\bf{M}}_t \label{eq:var_kalmanvarianceprediction} \end{equation}

\begin{equation} {\bf{\Sigma }}_t = {\bf{\Sigma }}_{t|t - 1} - {\bf{\Sigma }}_{t|t - 1} {\bf{X'}}_t ({\bf{X}}_t {\bf{\Sigma }}_{t|t - 1} {\bf{X'}}_t + n_t )^{ - 1} {\bf{X}}_t {\bf{\Sigma }}_{t|t - 1} \label{eq:var_kalmansigmaupdategeneral} \end{equation}

\begin{equation} {\bf{\beta }}_{t|t} = {\bf{\beta }}_{t - 1|t - 1} + {\bf{\Sigma }}_{t|t - 1} {\bf{X'}}_t ({\bf{X}}_t {\bf{\Sigma }}_{t|t - 1} {\bf{X'}}_t + n_t )^{ - 1} (y_t - {\bf{X}}_t {\bf{\beta }}_{t - 1|t - 1} ) \label{eq:var_kalmanbetaupdategeneral} \end{equation}
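For intuition, consider the fixed coefficients special case \({\bf{M}}_t = 0\) with \(n_t = \sigma ^2 \). The first equation then leaves \(\bf{\Sigma}\) unchanged, and the second reduces to

\begin{equation} {\bf{\Sigma }}_t = {\bf{\Sigma }}_{t - 1} - {\bf{\Sigma }}_{t - 1} {\bf{X'}}_t ({\bf{X}}_t {\bf{\Sigma }}_{t - 1} {\bf{X'}}_t + \sigma ^2 )^{ - 1} {\bf{X}}_t {\bf{\Sigma }}_{t - 1} \end{equation}

which is exactly the rank-one update of \(\sigma ^2 \left( {{\bf{X'X}}} \right)^{ - 1} \) when the row \({\bf{X}}_t \) is appended to \(\bf{X}\). With no time-variation, the filter thus reproduces least squares on the data through \(t\), which is the basis for the sequential estimation described below.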

          

Initial Values

To use this algorithm, we need to supply the following information:

\({\bf{\beta }}_{{\rm{0|0}}} \), the initial state vector

\(\bf{\Sigma}_0\), the initial covariance matrix of the states

\(n_t\), the variance of the measurement equation

\(\bf{M}_t\), the variance of the change in the state vector

 

Sequential Estimation

In the most common use of the Kalman filter in RATS:

\(\bf{M}_t\) is 0

\(n_t\) is assumed to be the constant \(\sigma^2\)

\(\bf{\Sigma}_0\) and \({\bf{\beta }}_{{\rm{0|0}}} \) are obtained using ESTIMATE through part of the sample.

This is a “fixed coefficients” setup. The true coefficient vector is assumed to stay the same throughout the sample, and the Kalman filter is used to estimate it using samples of increasing size.

 

Note from the formulas above that if we multiply \(\bf{M}_t\), \(\bf{\Sigma}_t\) and \(n_t\) by the same constant, that constant drops out of the updating formula for \(\bf{\beta}\). We can take advantage of this by setting \(n_t\) to 1. We then have a \(\bf{\Sigma}\) that must be multiplied by \(\sigma^2\) (which can be estimated separately using the residuals) to get the true covariance matrix. This makes \(\bf{\Sigma}\) analogous to \(\left( {{\bf{X'X}}} \right)^{ - 1} \) in a least-squares regression. ESTIMATE will initialize the Kalman filter, and each KALMAN instruction will add one more observation to the sample.

 

This segment of code (from CANMODEL.RPF) estimates the model through 1998:4, and does updates over the period from 1999:1 to 2006:3. Forecast performance statistics are compiled over that period using the instruction THEIL. Note that the estimates for a VAR without a prior will be the same whether you estimate through the whole period, or first estimate through a subperiod with the Kalman filter being used for the remainder. However, they won’t match when the system has a prior. This is because the prior is rescaled using statistics computed from the sample used on the ESTIMATE.

 

system(model=canmodel)

variables logcangdp logcandefl logcanm1 logexrate $

  can3mthpcp logusagdp

lags 1 to 4

det constant

specify(tightness=.1)  .5

end(system)

 

theil(model=canmodel,setup,steps=12,to=2006:4)

 

estimate(noprint) * 1998:4

 

theil

do time=1999:1,2006:3

   kalman

   theil

end do time

theil(dump,window="Forecast Statistics")

 

Recursive Residuals

The KALMAN instruction can also be used to compute recursive residuals, either for a VAR, or for a general linear regression. However, you will probably find the RLS instruction to be much more convenient for single equation models.
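For the single-equation case, the following is a minimal sketch of RLS; the series names Y, X1 and X2 are hypothetical, and RRESIDS receives the recursive residuals:

* Recursive least squares of Y on a constant and two regressors,
* saving the recursive residuals into RRESIDS.
rls y / rresids
# constant x1 x2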

 

To compute recursive residuals with KALMAN, you first estimate the model over the first \(K\) observations (\(K\) = number of regressors), and then use the Kalman filter to run through the remainder of the sample. The \(T-K\) recursive residuals have the convenient property that, if the model is the Standard Normal Linear Model, these residuals are independent Normal. They are the series of normalized one-step Kalman Filter forecast errors:

\begin{equation} \frac{{\left( {y_t - {\bf{X}}_t \bf{\beta }_{t - 1|t - 1} } \right)}}{{\sqrt {n_t + {\bf{X}}_t (\bf{\Sigma }_{t - 1} + {\bf{M}}_t ){\bf{X'}}_t } }} \label{eq:var_kalmanrresid} \end{equation}

The basic setup for a VAR model would be as follows:

 

system(model=recresvar)

variables list of variables

lags list of lags

deterministic list of deterministic variables

end(system)

 

compute nreg = number of regressors per equation

estimate(noprint) start  start+nreg-1

 

do time=start+nreg,end

   kalman(rtype=recursive,resids=recresids)

end do time
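For concreteness, here is that template filled in with the six variables from the CANMODEL example; the 1981:1 start date is borrowed from the TVARSET.RPF example below, so treat the dates as placeholders for your own data set:

system(model=recresvar)
variables logcangdp logcandefl logcanm1 logexrate $
  can3mthpcp logusagdp
lags 1 to 4
det constant
end(system)
*
* 6 variables x 4 lags + 1 constant = 25 regressors per equation.
* The regression sample can only start after the 4 lags.
compute nreg   = 6*4+1
compute rstart = (1981:1)+4
estimate(noprint) rstart rstart+nreg-1
do time=rstart+nreg,2006:3
   kalman(rtype=recursive,resids=recresids)
end do time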

 

Time-Varying Coefficients

If you permit time-variation, you can’t initialize the Kalman filter using ESTIMATE. Instead, you need to start by setting presample values for \(\bf{\beta}\) and \(\bf{\Sigma}\), and then filter through the entire sample.

 

For very small models, you can set these as free parameters and optimize using FIND. However, this very quickly becomes infeasible. To run the simplest \(K\) variable time-varying parameters estimation, you need to set (or estimate) \(K\) (initial \(\bf{\beta}\)) + \(K(K+1)/2\) (initial \(\bf{\Sigma}\)) + \(K(K+1)/2\) (\(\bf{M}\) matrix) + 1 (measurement equation variance) parameters in total. With a mere 10 regressors, that is \(10 + 55 + 55 + 1 = 121\) parameters.

 

Fortunately, the initial values for \(\bf{\beta}\) and \(\bf{\Sigma}\) serve the same purpose as the mean and variance of a Bayesian prior. This suggests that, at least for VARs, we can just start with a standard prior. Taking \(\bf{M}_t\) to be a scale multiple of the prior covariance matrix \(\bf{\Sigma}_0\) also makes sense: the logic for the shape of the prior applies equally well to the shape of the covariance matrix of the changes, so only the scale needs to be chosen.

 

The KFSET (Kalman Filter SET) and TVARYING instructions, added to the SYSTEM definition, set up the matrices used for time-varying coefficients models. Most of the information is provided by KFSET, which sets up (but does not initialize) the matrices to be used for the prior coefficient covariance matrices (one for each equation) and the \(n_t\) series (again, one for each equation).

 

For a standard VAR with a symmetric prior, the procedure @TVARSET does all the required setup work. In TVARSET.RPF, which is a time-varying version of the "Simple BVAR" model in the CANMODEL.RPF example, this model is created with

 

@tvarset(model=canmodel,lags=4,tight=.1,other=.5)

# logcangdp logcandefl logcanm1 logexrate can3mthpcp logusagdp

 

This creates a symmetric prior with an "other" weight of .5 and an overall tightness of .1 (with the standard random walk prior). The two added pieces of information for time-variation are:

1. the values of \(n_t\) for each equation, which are constant across time and are given default values of .9 times the residual variance from the fixed coefficient model. The .9 multiplier is used because we expect the equation errors in a fixed coefficient model to be somewhat higher than those in a time-varying model.

2. the values of \(\bf{M}_t\) for each equation, which again are constant across time and are given a default value of \(10^{-8}\) times the prior covariance. While this seems to be (and is) quite small, experience has shown that substantially larger values work poorly, mainly because the number of coefficients in the typical VAR makes it too easy for the model to "explain" individual data points by shifting coefficients if the time-variation isn't sufficiently stiff.

To estimate a model, you have to Kalman filter from the start. The initial estimates will be dominated by the prior, and so won't be particularly useful. KALMAN with the START option is used at the first period, which takes the prior and produces filtered results based upon just that one observation. After that, you just do a simple KALMAN instruction to roll the estimates forward a period. In TVARSET.RPF, the initial sample estimation is done with

 

compute estart=(1981:1)+4

do time=estart,1998:4

   if time==estart

      kalman(start=time)

   else

      kalman

end do time
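From this point, forecast performance can be evaluated just as in the fixed coefficients example. The following is a sketch, on the assumption that THEIL is combined with KALMAN the same way as before (the window title is arbitrary):

theil(model=canmodel,setup,steps=12,to=2006:4)
theil
do time=1999:1,2006:3
   kalman
   theil
end do time
theil(dump,window="TVC Forecast Statistics")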

 


Evaluating a Time-Varying Coefficients Model

Likelihood Function

 

For a single equation with random walk coefficient variation, conditional upon previous information, \(y_t\) is distributed Normally with

 

Mean: \({\bf{X}}_t \bf{\beta }_{t - 1|t - 1} \)

Variance: \(\sigma _t^2 = n_t + {\bf{X}}_t (\bf{\Sigma }_{t - 1} + {\bf{M}}_t ){\bf{X'}}_t \)

 

Ignoring constants, (minus) the log of the likelihood element for this observation is

\begin{equation} \log \,\sigma _t^2 + \frac{{\left( {y_t - {\bf{X}}_t \bf{\beta }_{t - 1|t - 1} } \right)^2 }}{{\sigma _t^2 }} \label{eq:var_kalmanlogl} \end{equation}

You can construct the sample likelihood using the LIKELIHOOD option on KFSET, which supplies a matrix for this purpose. This is a RECT with dimensions 2 \(\times\) (number of equations). The first element in each column is the cumulation of the first terms for an equation, and the second element is the cumulation of the second terms. Note that, while a VAR is a system of equations and thus has a multivariate likelihood, the joint likelihood depends heavily upon a proper modeling of the contemporaneous covariance matrix of the residuals, which is not of great interest in forecasting models. Hence the emphasis on just the "own" likelihoods.

 

As mentioned in Doan, Litterman and Sims (1984, p. 10), the filtering procedure above will produce the same series of coefficient estimates if \(n_t\), \(\Sigma_0\) and \(\bf{M}_t\) are all multiplied by a constant. Concentrating out this “nuisance parameter” gives a pseudo-likelihood function for equation \(n\) which (disregarding constants) is

 

-.5*(LIKELY(1,n)+nobs*LOG(LIKELY(2,n)/nobs))

 

where LIKELY is the LIKELIHOOD option matrix and NOBS is the number of observations.
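To see where this expression comes from, multiply \(n_t\), \(\Sigma_0\) and \(\bf{M}_t\) by a common factor \(c\). Each \(\sigma _t^2 \) becomes \(c\,\sigma _t^2 \), while the coefficient estimates, and hence the forecast errors \(e_t = y_t - {\bf{X}}_t \bf{\beta }_{t - 1|t - 1} \), are unchanged. Up to constants, the log likelihood is

\begin{equation} - \frac{1}{2}\left[ {\sum\limits_t {\log \left( {c\,\sigma _t^2 } \right)} + \frac{1}{c}\sum\limits_t {\frac{{e_t^2 }}{{\sigma _t^2 }}} } \right] \end{equation}

which is maximized at \(\hat c = \frac{1}{T}\sum\nolimits_t {e_t^2 /\sigma _t^2 } \), that is, at LIKELY(2,n)/nobs. Substituting \(\hat c\) back in and dropping terms which don't depend on the model gives the expression above.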

 

While we could rely upon this by setting \(n_t = 1\), in which case the constant multiplier for the variances just becomes \(\sigma^2\) (which can be estimated as LIKELY(2,i)/nobs), we find it simpler in the case of VARs to pick a reasonable value for \(n_t\) based upon the estimated variance from a preliminary regression (the default above is a .9 scale). This is because we set \(\Sigma_0\) directly based upon the standard VAR prior. The prior on the own lags is independent of the scale of the variable involved, and the priors on the lags of other variables are scaled to take account of the relative scales. The nuisance parameter in the likelihood then becomes a correction factor to the variances.

 

The sum of concentrated log likelihoods across equations can serve as the objective function for evaluating the time-varying coefficients model.
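A sketch of that sum, assuming the matrix was requested as LIKELY via the LIKELIHOOD option on KFSET, and that NVAR (the number of equations) and NOBS have already been set:

* Total concentrated log likelihood across the NVAR equations.
* LIKELY, NVAR and NOBS are assumed to exist already.
compute totallogl = 0.0
do i=1,nvar
   compute totallogl = totallogl - $
      .5*(likely(1,i)+nobs*log(likely(2,i)/nobs))
end do i
display "Concentrated log likelihood =" totallogl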

Forecast Statistics

An alternative to the log likelihood is to use simulated out-of-sample forecast statistics, that is, some calculation based upon the output of the THEIL instruction (such as Theil's U, RMSE, MAE, ...). One obvious difficulty is that there are effectively unlimited ways of aggregating those into a single objective function. Theil's U has the advantage of being scale-free for each variable and having a natural expected value of 1 for a variable which actually is a random walk.

 


Copyright © 2025 Thomas A. Doan