State Space Models
Dynamic linear or state-space models (DLMs) form a very broad class which can describe most of the models commonly used in time series work, either directly in their linear form, or indirectly, as an approximation to non-linear dynamics. The DLM instruction is the primary tool for dealing with these models in RATS.
The general form of a state-space model has quite a few components: unobservable states, observable data, shocks, mapping matrices. The DLM instruction has as many as twelve inputs, and almost as many potential outputs.
We will write the model in the following form:
\begin{equation} {\bf{X}}_t = {\bf{A}}_t {\bf{X}}_{t - 1} + {\bf{Z}}_t + {\bf{F}}_t {\bf{W}}_t \label{eq:ssm_standardstate} \end{equation}
\begin{equation} {\bf{Y}}_t = \mu _t + {\bf{C}}_t ^\prime {\bf{X}}_t + {\bf{V}}_t \label{eq:ssm_standardmeasurement} \end{equation}
The \(\bf{X}\)’s are the (unobservable) state variables; the \(\bf{Y}\)’s are the observable data. \eqref{eq:ssm_standardstate} is known as the state equation, transition equation or system equation. \eqref{eq:ssm_standardmeasurement} is the measurement equation or observation equation. The \(\bf{Z}\), if present, are exogenous variables in the evolution of the states. The \(\bf{W}\) are shocks to the states; the \(\bf{F}\) matrix has the loadings from those shocks to the states, which allows for fewer shocks than states (which will generally be the case). The \(\mu\) are any components in the description of the observable which depend upon exogenous variables (such as a mean in the form of a linear regression). The \(\bf{V}\) are measurement errors. The \(\bf{W}\) and \(\bf{V}\) are assumed to be mean zero, normally distributed, independent across time, and independent of each other at time \(t\) as well.
The model \eqref{eq:ssm_standardstate} and \eqref{eq:ssm_standardmeasurement} is too broad for many purposes, and too narrow for others. It is broader than often needed since many of the components will be either absent or at least won’t be time-varying. For instance, in \eqref{eq:ssm_standardstate}, the transition matrices \(\bf{A}_t\) are usually time-invariant, as are the variances of shocks \(\bf{W}_t\). And most models don’t need the exogenous shifts \(\bf{Z}_t\) or \(\mu_t\).
It is too narrow since it allows for only one lag of the state, and both equations are linear. The first restriction is easily overcome; allowing for non-linearity is harder, but doable, at least approximately. Contemporaneous correlation between the shocks in the state equation \(\bf{W}_t\) and those in the measurement equation \(\bf{V}_t\) can be handled by adding \(\bf{V}_t\) to the set of states; serial correlation in \(\bf{V}_t\) is handled the same way.
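As an illustration of that augmentation (our sketch, not part of the standard form above): suppose the measurement error follows \(V_t = \rho V_{t - 1} + e_t\). Appending \(V_t\) to the state vector gives the enlarged system
\begin{equation} \left[ {\begin{array}{*{20}c} {{\bf{X}}_t } \\ {V_t } \\ \end{array}} \right] = \left[ {\begin{array}{*{20}c} {{\bf{A}}_t } & 0 \\ 0 & \rho \\ \end{array}} \right]\left[ {\begin{array}{*{20}c} {{\bf{X}}_{t - 1} } \\ {V_{t - 1} } \\ \end{array}} \right] + \left[ {\begin{array}{*{20}c} {{\bf{F}}_t } & 0 \\ 0 & 1 \\ \end{array}} \right]\left[ {\begin{array}{*{20}c} {{\bf{W}}_t } \\ {e_t } \\ \end{array}} \right],\quad {\bf{Y}}_t = \mu _t + \left[ {\begin{array}{*{20}c} {{\bf{C}}_t } \\ 1 \\ \end{array}} \right]^\prime \left[ {\begin{array}{*{20}c} {{\bf{X}}_t } \\ {V_t } \\ \end{array}} \right] \end{equation}
with no remaining measurement error; setting \(\rho = 0\) handles the contemporaneously correlated case, since the correlation between \(\bf{W}_t\) and \(e_t\) then enters through the covariance matrix of the augmented state shocks.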
What can state-space models do for us? In some cases, our main interest is in the unobservable states themselves. In the typical application, the state-space model generates a decomposition of the observable \(\bf{Y}_t\) into two or more unobservable components; a state-space model of this type is known as an unobserved components (UC) model, and it allows us to estimate those components. In other cases, the states are only a means to an end: we’re interested in doing inference on some parameters governing the process (such as variances of the shocks, or free parameters in the \(\bf{A}\) or \(\bf{C}\) matrices), and the state-space model is the most convenient way to analyze this. In either case, we need to be able to estimate the states given the data.
Let the data be represented as \(\left\{ {{\bf{Y}}_1 ,{\bf{Y}}_2 , \ldots ,{\bf{Y}}_T } \right\}\). For any random variable \(\xi\), we’ll use the following abbreviation for the density conditional on a subsample of the \(\bf{Y}_t\): \(f\left( {\xi |t} \right) \equiv f\left( {\xi |{\bf{Y}}_1 ,{\bf{Y}}_2 , \ldots ,{\bf{Y}}_t } \right)\).
There are three “natural” types of inference for \(\bf{X}_t\):
1. Prediction: \(f\left( {{\bf{X}}_t |t - 1} \right)\)
2. Filtering: \(f\left( {{\bf{X}}_t |t} \right)\)
3. Smoothing: \(f\left( {{\bf{X}}_t |T} \right)\)
When our main interest is in the \(\bf{X}_t\) themselves, we’ll usually want to do smoothing, putting all the information to use. Prediction and filtering are most useful when the states are only of indirect interest, since they maintain the sequencing of the data.
Note, by the way, that there is no reason that we need a value of \(\bf{Y}_t\) for every \(t = 1, \ldots ,T\). The state-space framework can handle missing data in a very simple fashion—it’s one of its greatest advantages as a computational tool.
Ignoring (for now) the issue of what to do about \(f\left( {{\bf{X}}_1 |0} \right)\), the model \eqref{eq:ssm_standardstate} and \eqref{eq:ssm_standardmeasurement}, combined with the assumption that the shocks are Gaussian, generates a joint Gaussian density for \(\left\{ {{\bf{X}}_1 , \ldots ,{\bf{X}}_T ,{\bf{Y}}_1 , \ldots ,{\bf{Y}}_T } \right\}\). The prediction, filtering and smoothing densities are thus Gaussian with means that are linear functions of the \(\bf{Y}\). A direct solution of that problem, however, can be quite slow and messy, requiring the inverse of a matrix with the dimension of the full stacked data \(\left\{ {{\bf{Y}}_1 ,{\bf{Y}}_2 , \ldots ,{\bf{Y}}_T } \right\}\). The Kalman filter and Kalman smoother are algorithms which solve this in a very efficient fashion, never needing an inverse larger than the dimension of a single \(\bf{Y}_t\).
The Kalman Filter
The Kalman filter is relatively simple. Since the filtered density \(f({\bf{X}}_{t - 1} |t - 1)\) is Gaussian, we’ll write its mean as \({\bf{X}}_{t - 1|t - 1} \) and variance as \(\Sigma _{t - 1|t - 1} \). Solving \eqref{eq:ssm_standardstate} forward from \(t-1\) to \(t\) gives the predictive density for \({\bf{X}}_t |t - 1\) as Gaussian with mean and variance
\begin{equation} {\bf{X}}_{t|t - 1} = {\bf{A}}_t {\bf{X}}_{t - 1|t - 1} + {\bf{Z}}_t ,\Sigma _{t|t - 1} = {\bf{A}}_t \Sigma _{t - 1|t - 1} {\bf{A'}}_t + {\bf{F}}_t {\bf{M}}_t {\bf{F'}}_t \label{eq:ssm_kfpredictedvariance} \end{equation}
where we’re defining \({\bf{M}}_t \equiv {\mathop{\rm var}} \left( {{\bf{W}}_t } \right)\). The predictive density for \({\bf{Y}}_t |t - 1\) is then Gaussian with mean and variance
\begin{equation} \mu _t + {\bf{C'}}_t {\bf{X}}_{t|t - 1} ,{\bf{C'}}_t \Sigma _{t|t - 1} {\bf{C}}_t + {\bf{N}}_t \label{eq:ssm_kfpredicty} \end{equation}
which is using \({\bf{N}}_t \equiv {\mathop{\rm var}} \left( {{\bf{V}}_t } \right)\). These two equations give the predictive densities. Because \(\left\{ {{\bf{X}}_t ,{\bf{Y}}_t } \right\}|t - 1\) is jointly Gaussian, the filtered density for \({\bf{X}}_t |t\) can be computed using standard results for conditional normals as having mean and variance
\begin{equation} {\bf{X}}_{t|t} = {\bf{X}}_{t|t - 1} + \Sigma _{t|t - 1} {\bf{C}}_t \left( {{\bf{C'}}_t \Sigma _{t|t - 1} {\bf{C}}_t + {\bf{N}}_t } \right)^{ - 1} \left( {{\bf{Y}}_t - \mu _t - {\bf{C'}}_t {\bf{X}}_{t|t - 1} } \right) \label{eq:ssm_kfstateupdate} \end{equation}
\begin{equation} \Sigma _{t|t} = \Sigma _{t|t - 1} - \Sigma _{t|t - 1} {\bf{C}}_t \left( {{\bf{C'}}_t \Sigma _{t|t - 1} {\bf{C}}_t + {\bf{N}}_t } \right)^{ - 1} {\bf{C'}}_t \Sigma _{t|t - 1} \label{eq:ssm_kfsigmaupdate} \end{equation}
\eqref{eq:ssm_kfstateupdate} and \eqref{eq:ssm_kfsigmaupdate} are sometimes known as the correction step. We correct our estimates based upon the prediction error for \(\bf{Y}_t\). The matrix \(\Sigma _{t|t - 1} {\bf{C}}_t \left( {{\bf{C'}}_t \Sigma _{t|t - 1} {\bf{C}}_t + {\bf{N}}_t } \right)^{ - 1} \) from \eqref{eq:ssm_kfstateupdate}, which shows how we adjust the mean, is known as the Kalman gain.
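To make the recursion concrete, here is a schematic version of one predict-correct step written out in RATS matrix algebra (purely illustrative: DLM performs all of this internally, and the names XFILT, SFILT, XPRED, SPRED and GAIN are ours; A, C, F, Z, MU, SW and SV are assumed to hold the period-\(t\) system matrices, and Y the period-\(t\) observation):
* prediction step: mean and variance of X(t) given data through t-1
compute xpred = a*xfilt + z
compute spred = a*sfilt*tr(a) + f*sw*tr(f)
* Kalman gain, then the correction step using the prediction error for Y(t)
compute gain  = spred*c*inv(tr(c)*spred*c + sv)
compute xfilt = xpred + gain*(y - mu - tr(c)*xpred)
compute sfilt = spred - gain*tr(c)*spred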
The Kalman Smoother
The derivation of the Kalman smoother is much more complicated than that of the Kalman filter. The smoothed estimates for the end-of-sample period \(T\) are just the filtered estimates; the correction for earlier time periods then requires a backwards pass through the data. The algorithm used in RATS is the one given in Durbin and Koopman (2012), which, compared with the one in (for instance) Hamilton (1994), requires less calculation, is generally more accurate, and provides more information as a side effect.
Limit Calculations
In most applications, the system matrices \(\bf{A}\), \(\bf{C}\), \(\bf{F}\) and \(\bf{Z}\) and the variance matrices \(\bf{N}\) and \(\bf{M}\) are constant; the only inputs that are time-varying are \(\bf{Y}\) and (if present) \(\mu\). If that’s the case, then it can be shown that (as long as \(\bf{A}\) has no explosive roots) \(\Sigma _{t|t} \) approaches a limit, and if \(\Sigma _{t|t} \) approaches a limit, then so does the Kalman gain. Once that limit has been reached, the calculations can be simplified considerably by just reusing the previous values for the variance and gain matrices.
In applications where \(\bf{Y}\) is univariate (which is probably 90% of the time), there isn’t much to be gained from taking advantage of this, since the calculations are on relatively small matrices. However, in some situations \(\bf{Y}_t\) is quite large (in a FAVAR model, it might have over 100 components), and there can be quite a saving in time from exploiting this. The LIMIT option on DLM can be used to good effect in that case. It gives the number of entries after which \(\Sigma _{t|t} \) is to be considered converged. Note that this is not an absolute time period, but the number of entries relative to the start of the filter. You may have to do some experimenting to find the best value for LIMIT, but if you need to solve a large DLM repeatedly, it can be well worth the extra effort. Note, however, that you can’t use this if you have any missing values.
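As a hedged illustration (the option values and matrix names here are arbitrary), a large constant-parameter model might use something like
dlm(a=a,c=c,f=f,sw=sw,sv=sv,y=ylist,limit=30) / xstates vstates
which tells DLM to treat the filter variance (and hence the gain) as converged 30 entries after the start of the filter.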
Missing Values
When \(\bf{Y}\) is univariate, the handling of missing values is quite simple: if \(\bf{Y}_t\) (or \(\mu_t\)) is missing, do the prediction calculation \eqref{eq:ssm_kfpredictedvariance}, but not the corrections \eqref{eq:ssm_kfstateupdate} and \eqref{eq:ssm_kfsigmaupdate}. The “filtered” mean and variance when \(\bf{Y}_t\) isn’t observable are simply the predicted values given the data through \(t-1\).
The same is done if \(\bf{Y}_t\) is multivariate and no part of it is observable. It’s a bit trickier if \(\bf{Y}_t\) is multivariate and some components are observable and some are not. In some papers, you will see a recommendation to make the diagonal element of \(\bf{N}_t\) very large for the missing components, which is how you would simulate the proper handling of this if you had relatively primitive Kalman filtering and smoothing software. Instead, however, RATS simply removes the components of \(\bf{C}\), \(\bf{N}\), \(\bf{Y}\) and \(\mu\) corresponding to the missing \(\bf{Y}\) and calculates \eqref{eq:ssm_kfpredicty}, \eqref{eq:ssm_kfstateupdate} and \eqref{eq:ssm_kfsigmaupdate} using those.
Setting up the System
The simplest non-trivial state-space model is the local level model, or random walk with noise. The single state variable follows the random walk:
\begin{equation} x_t = x_{t - 1} + w_t \label{eq:ssm_randomwalk} \end{equation}
and the measurement equation is
\begin{equation} y_t = x_t + v_t \end{equation}
The interpretation of this model is that \(x_t\) is an (unobservable) local level or mean for the process. It’s local in the sense that it can evolve from period to period because of the shock process \(w_t\). The observable \(y_t\) is the underlying process mean contaminated with the measurement error \(v_t\). If we compare this with \eqref{eq:ssm_standardstate} and \eqref{eq:ssm_standardmeasurement}, we can read off the following:
\begin{equation} {\bf{X}}_t = [x_t ],{\bf{A}}_t = [1],{\bf{F}}_t = [1],{\bf{W}}_t = [w_t ],{\bf{Y}}_t = [y_t ],{\bf{C}}_t = [1],{\bf{V}}_t = [v_t ] \label{eq:ssm_locallevelsystem} \end{equation}
where all other components are zero. (Note, by the way, that the definition of \(\bf{X}\) is only for interpretation and organization—you will never actually input it).
The AR(2) process
\begin{equation} x_t = \varphi _1 x_{t - 1} + \varphi _2 x_{t - 2} + w_t \label{eq:ssm_ar2example} \end{equation}
can be converted to the state equation
\begin{equation} {\bf{X}}_t \equiv \left[ {\begin{array}{*{20}c} {x_t } \\ {x_{t - 1} } \\ \end{array}} \right] = \left[ {\begin{array}{*{20}c} {\varphi _1 } & {\varphi _2 } \\ 1 & 0 \\ \end{array}} \right]{\bf{X}}_{t - 1} + \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ \end{array}} \right]w_t \label{eq:ssm_ar2stateeqn} \end{equation}
Handling lags in the underlying model by adding \(x_{t-1}\) to the state vector is known as state augmentation. If \(x_t\) itself is the observable, note that the only shock is the one in the state equation: there is no measurement error. The (non-zero) system matrices are
\begin{equation} {\bf{A}}_t = \left[ {\begin{array}{*{20}c} {\varphi _1 } & {\varphi _2 } \\ 1 & 0 \\ \end{array}} \right],{\bf{F}}_t = \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ \end{array}} \right],{\bf{W}}_t = [w_t ],{\bf{Y}}_t = [x_t ],{\bf{C}}_t = \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ \end{array}} \right],{\bf{V}}_t = [0] \label{eq:ssm_ar2system} \end{equation}
If \(y_t = \tau _t + x_t + v_t \), where \(\tau_t\) follows a random walk as in \eqref{eq:ssm_randomwalk} and \(x_t\) is the AR(2) process \eqref{eq:ssm_ar2example}, the combined state equation is
\begin{equation} {\bf{X}}_t \equiv \left[ {\begin{array}{*{20}c} {\tau _t } \\ {x_t } \\ {x_{t - 1} } \\ \end{array}} \right] = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & {\varphi _1 } & {\varphi _2 } \\ 0 & 1 & 0 \\ \end{array}} \right]{\bf{X}}_{t - 1} + \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ \end{array}} \right]\left[ {\begin{array}{*{20}c} {w_t^{\left( \tau \right)} } \\ {w_t^{\left( x \right)} } \\ \end{array}} \right] \end{equation}
so we get
\begin{equation} {\bf{A}}_t = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & {\varphi _1 } & {\varphi _2 } \\ 0 & 1 & 0 \\ \end{array}} \right],{\bf{F}}_t = \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ \end{array}} \right],{\bf{W}}_t = \left[ {\begin{array}{*{20}c} {w_t^{\left( \tau \right)} } \\ {w_t^{\left( x \right)} } \\ \end{array}} \right],{\bf{C}}_t = \left[ {\begin{array}{*{20}c} 1 \\ 1 \\ 0 \\ \end{array}} \right],{\bf{V}}_t = \left[ {v_t } \right] \label{eq:ssm_ar2noiseinputs} \end{equation}
This is an example of the common practice of “adding” independent components to form a larger model. When you do this, the \(\bf{A}\) and \(\bf{F}\) matrices for the submodels are concatenated diagonally, and the \(\bf{C}\) matrices are concatenated vertically. You can see, for instance, that the \(\bf{A}\) matrix in \eqref{eq:ssm_ar2noiseinputs} is just a block diagonal matrix with blocks from \eqref{eq:ssm_locallevelsystem} and \eqref{eq:ssm_ar2stateeqn}. The \(\bf{F}\) matrix might be a bit harder to see because the blocks aren’t square, but it is also formed by taking the \(\bf{F}\) from \eqref{eq:ssm_locallevelsystem} and \eqref{eq:ssm_ar2stateeqn} and blocking them diagonally. The ~\ (diagonal concatenation) operator can be helpful in building up the \(\bf{A}\) and \(\bf{F}\) matrices, and the ~~ (vertical concatenation) operator can be used for the \(\bf{C}\), as sketched below.
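In RATS code, the combination could be sketched as follows (all of the names are illustrative, and we borrow the AR(2) coefficient values 1.1 and -0.3 used later in this section):
compute alevel=||1.0||,flevel=||1.0||,clevel=||1.0||
compute aar2=||1.1,-0.3|1.0,0.0||,far2=||1.0|0.0||,car2=||1.0|0.0||
* diagonal concatenation for A and F, vertical concatenation for C
compute abig=alevel~\aar2,fbig=flevel~\far2,cbig=clevel~~car2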
Getting Information into DLM
DLM has quite a few inputs: at a minimum, the \(\bf{A}\), \(\bf{C}\) and \(\bf{Y}\), the variances for \(\bf{V}\) and \(\bf{W}\), and there are often others. Some of those (such as the \(\bf{Y}\)) will almost certainly be changing with time, while others will be fixed, and some, while they are fixed over time, might depend upon parameters that you’re trying to estimate, and thus won’t be known in advance.
These all go into DLM using options with names based upon the notation given in \eqref{eq:ssm_standardstate} and \eqref{eq:ssm_standardmeasurement}; the \(\bf{A}\) matrix is input with the A option, \(\bf{C}\) with the C option, etc. The options for the two variance matrices are SV and SW. In general, you can give the values for these in the most natural way. For instance, if you want the series LOGGDP to be your \(\bf{Y}\), the option would be Y=LOGGDP. If \(\bf{C}\) is the vector \([1,1,0]\), the option could read C=||1.0,1.0,0.0||. (Note that if you have just one observable, the \(\bf{C}\) can be input either as a row or a column vector.)
DLM will automatically detect whether the input information is time-varying so you don’t have to take special steps as you might with other state-space software.
\(\bf{F}\) is handled by the F option, and defaults to the identity matrix, which is appropriate if the shocks have the same dimension as the states. Many implementations of state-space models don’t have the equivalent of the \(\bf{F}\) matrix; in that case, when the set of fundamental shocks has smaller dimension than the number of states, the covariance matrix of the state shocks has to be singular. Either way works, but it’s generally much simpler to work with the \(\bf{F}\) mapping matrix.
The two system matrices used the least are \(\bf{Z}\) and \(\mu\)—the options for these are called Z and MU and they default to zeros of the proper dimensions.
The options are simplest for a model with system matrices like \eqref{eq:ssm_locallevelsystem}. With just one state and one observable, everything is a scalar. The options on DLM to describe that model are A=1, C=1, Y=series with data, SW=shock variance and SV=measurement variance. If the variances aren’t known constants, and you want to estimate them, you can just use variable names; for instance, if the shock variance is represented by the variable SIGSQETA and the measurement variance by SIGSQEPS, you would use SW=SIGSQETA and SV=SIGSQEPS.
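Assembled into a single instruction (a sketch, with Y, XSTATES and VSTATES as names we’ve chosen), Kalman filtering of the local level model could read:
dlm(a=1.0,c=1.0,y=y,sw=sigsqeta,sv=sigsqeps) / xstates vstates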
Once you have a model with more than one state, it’s generally easier to create the system matrices using separate instructions. For instance, with the model with matrices \eqref{eq:ssm_ar2system}, you can set up \(\bf{C}\) and \(\bf{F}\) with
compute c=||1.0|0.0||,f=||1.0|0.0||
(While we’re giving the matrices the same name as the option, that’s not required). More generally, you could use the %UNITV (unit vector) function:
compute c=%unitv(2,1),f=%unitv(2,1)
where you can change the number of states (lags) easily by replacing the 2 in these with the number of lags in the representation.
The proper handling of the \(\bf{A}\) matrix for this model will depend upon whether the \(\varphi _i \) are known constants, or need to be variables. If they’re known (say 1.1 and -.3), you can create the matrix with something like:
compute a=$
||1.1,-0.3 |$
1.0, 0.0||
Note that this could also be done as the more compressed (but harder to read):
compute a=||1.1,-0.3|1.0,0.0||
Suppose instead that they need to be variables (call them PHI1 and PHI2). In that case, we need to create a FRML[RECT], that is, a FRML that evaluates to a RECTANGULAR matrix. You can do that with:
dec frml[rect] af
frml af =$
||phi1,phi2|$
1.0, 0.0||
The FRML instruction can create formulas which evaluate to something other than a real, but the target has to have its type DECLAREd first as is done here. With this definition, the option on DLM would read A=AF.
A more flexible alternative to defining a FRML[RECT] is to create a FUNCTION which returns a RECTANGULAR:
function adlm
type rect adlm
compute adlm=$
||phi1,phi2|$
1.0, 0.0||
end
The \(\bf{A}\) matrix information would then go into DLM with A=ADLM().
The matrices in \eqref{eq:ssm_ar2noiseinputs} are more typical of an actual application of a state-space model, where two (or more) component models are added together to create the overall model.
Note that, even if the \(\varphi _i \) are unknown, the \(\bf{F}\) and \(\bf{C}\) matrices are fixed. So the matrices could be set up with
compute f=$
||1.0,0.0|$
0.0,1.0|$
0.0,0.0||
compute c=||1.0,1.0,0.0||
function adlm
type rect adlm
compute adlm=$
1.0~\$
||phi1,phi2|$
1.0, 0.0||
end
with options A=ADLM(), F=F and C=C. The SW matrix will need to be a \(2 \times 2\). In most cases, the components of \(\bf{W}\) are assumed to be independent of each other, so SW will be diagonal. Diagonal matrices are created most easily using the %DIAG function, which takes a VECTOR and creates a diagonal matrix from it. If we call the two component variances SIGSQETA and SIGSQC, the option could read SW=%DIAG(||SIGSQETA,SIGSQC||).
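Putting the pieces together (again just a sketch, with Y, SIGSQEPS, XSTATES and VSTATES as assumed names), the full instruction for this model might read:
dlm(a=adlm(),f=f,c=c,y=y,$
  sw=%diag(||sigsqeta,sigsqc||),sv=sigsqeps) / xstates vstates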
Getting Information from DLM
DLM has many uses and generates quite a bit of information. You can, for instance, get the estimated states and their variances, predicted observables and prediction errors and variances, state shock estimates and variances, and more. Most of these are retrieved as a SERIES of matrices. For instance, the estimated states are a SERIES[VECTOR]—at each time period analyzed by DLM, there’s a complete VECTOR. In most cases, you will be interested in the time path of only one or two components of this. You can pull items out easily with SET instructions. If STATES is the name of your output, the following will pull the first component out of this:
set xstate = states(t)(1)
Note that you have to use the two sets of subscripts: the (t) is used to pick the entry out of the SERIES of VECTORS and the (1) pulls the first element out of that vector. The variances of the states (and most other variances) are produced as a SERIES of SYMMETRICS. Again, SET is generally the easiest way to pull information out of this. For instance, if SIGMAS is the name assigned to the series of covariance matrices, to get the standard deviation of the first state, do
set xstddev = sqrt(sigmas(t)(1,1))
The first element in either a VECTOR or SYMMETRIC (the (1) or (1,1)) can also be extracted using the %SCALAR function. Using this, the two SET instructions here could also have been done as:
set xstate = %scalar(states)
set xstddev = sqrt(%scalar(sigmas))
One of the most important basic state-space models is the local trend model. One way to write this is
\begin{equation} \begin{array}{c} x_t = x_{t - 1} + \tau _{t - 1} + \xi _t \\ \tau _t = \tau _{t - 1} + \zeta _t \\ y_t = x_t + \varepsilon _t \\ \end{array} \end{equation}
This has two states: a local trend rate (\(\tau _t \)) and the local trend itself (\(x_t\)). If the variance of \(\zeta _t \) is zero, the trend rate is constant, and the series is a linear trend plus noise. If that variance is non-zero, the trend rate can evolve. The model is small enough that it isn’t hard to write down the system matrices, which are just matrices with 1’s and 0’s. However, for convenience, we have a procedure @LocalDLM which constructs these, which can then be used with DLM. The full model above is done with
@localdlm(type=trend,shocks=both,a=at,c=ct,f=ft)
SHOCKS=BOTH allows for shocks in both of the state equations. More commonly, the shock in the first equation is suppressed, so the default is SHOCKS=TREND. The SHOCKS option changes the \(\bf{F}\) matrix, but not the others. @LocalDLM is used in the HPFILTER.RPF example.
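For reference, reading the full local trend model (SHOCKS=BOTH) off against \eqref{eq:ssm_standardstate} and \eqref{eq:ssm_standardmeasurement}, with the states ordered as \(\left( {x_t ,\tau _t } \right)\) (our ordering for exposition; the procedure’s internal ordering may differ), gives
\begin{equation} {\bf{A}}_t = \left[ {\begin{array}{*{20}c} 1 & 1 \\ 0 & 1 \\ \end{array}} \right],{\bf{F}}_t = \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ \end{array}} \right],{\bf{W}}_t = \left[ {\begin{array}{*{20}c} {\xi _t } \\ {\zeta _t } \\ \end{array}} \right],{\bf{C}}_t = \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ \end{array}} \right],{\bf{V}}_t = \left[ {\varepsilon _t } \right] \end{equation}
Under the default SHOCKS=TREND, \(\bf{F}\) would presumably reduce to the single column \(\left[ {0\;\;1} \right]^\prime \) with \({\bf{W}}_t = \left[ {\zeta _t } \right]\).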
The procedure @SeasonalDLM sets up the system matrices for one of two representations of an evolving seasonal. The “sum” of a seasonal model and a trend model (such as is created by @LocalDLM) produces what Harvey calls the basic structural model, which can model many series with a strong seasonal fairly well. The two options for the seasonal type are TYPE=FOURIER and TYPE=ADDITIVE. The Fourier form gives a smoother pattern, and so is the best choice when the seasonal is linked to weather. If the seasonal is due more to the calendar itself (dominated by holidays, the school year and the like), the additive seasonal is a better choice. An example is
@seasonaldlm(type=additive,a=as,f=fs,c=cs)
We use @SeasonalDLM and @LocalDLM in many textbook examples included with RATS, including several from Commandeur and Koopman (2007) and Harvey (1989).
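A sketch of how the “sum” of the two components might be built up (the matrix names follow the two procedure calls above; we’re assuming the procedures return \(\bf{C}\) as column vectors, so vertical concatenation is the right operation):
@localdlm(type=trend,a=at,c=ct,f=ft)
@seasonaldlm(type=additive,a=as,f=fs,c=cs)
* block the component systems together: A and F diagonally, C vertically
compute abig=at~\as,fbig=ft~\fs,cbig=ct~~cs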
ARMA equations have, in general, many possible state-space representations. If there are any MA components, they can be fairly complicated to set up. Instead, you can use the @ARMADLM procedure which converts an EQUATION (with coefficients already defined) into state-space form. In addition to \(\bf{A}\) and \(\bf{F}\), the example below also defines the \(\bf{Z}\) vector as the constant state shift due to the constant in the equation.
boxjenk(ma=||2,4||,maxl,constant,define=dyeq) dy
@armadlm(a=adlm,f=fdlm,z=zdlm) dyeq
The DLM Instruction
The syntax for DLM is
dlm( options ) start end state vectors state variances
Most of the input and output comes through the options. The two primary outputs, however, are the state vectors and state variances which give the estimated mean and variance of the states.
The TYPE option controls the type of analysis done on the model. Choose TYPE=FILTER (which is the default) for Kalman filtering, and TYPE=SMOOTH for Kalman smoothing.
We have four related examples of the use of DLM for different types of analysis. These are applied to a simple model for the Nile river flow data (100 years of annual data) used at the beginning of Durbin and Koopman (2012). The state-space model for this is the local level model with system matrices \eqref{eq:ssm_locallevelsystem}, from which the A and C option settings are read. We’re using specific values for the variances from the book. (Estimation of free parameters is covered on another page.) PRESAMPLE=DIFFUSE handles the \(f\left( {{\bf{X}}_1 |0} \right)\) that we ignored earlier; that also will be covered on a separate page.
This pulls in the data:
open data nile.dat
calendar 1871
data(format=free,org=columns,skips=1) 1871:1 1970:1 nile
The common DLM options (which basically describe the model being used) are
a=1.0,c=1.0,sv=15099.0,sw=1469.1,y=nile,presample=diffuse
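Put together (a sketch, with XSTATES and VSTATES as illustrative output names), a smoothing run with these settings could look like:
dlm(type=smooth,a=1.0,c=1.0,sv=15099.0,sw=1469.1,$
  y=nile,presample=diffuse) / xstates vstates
set smoothed = %scalar(xstates)
set stddev   = sqrt(%scalar(vstates))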
DLMEXAM1.RPF does Kalman smoothing (option TYPE=SMOOTH). DLMEXAM2.RPF does Kalman filtering (TYPE=FILTER, which is the default) doing predictions for 10 years beyond the sample. Once there is no longer data in the Y series, the Kalman filter just does the prediction step and not the correction step.
It’s easy to generate an unconditional draw for the states because of the sequential independence. Given \(\bf{X}_{t-1}\), we draw \(\bf{W}_t\) and compute
\begin{equation} {\bf{X}}_t = {\bf{A}}_t {\bf{X}}_{t - 1} + {\bf{Z}}_t + {\bf{F}}_t {\bf{W}}_t \end{equation}
We repeat this as necessary. This is what is done with DLM with the option TYPE=SIMULATE. That, however, isn’t very interesting since the generated states will have no relationship to the observed data. The one real use of this is for out-of-sample simulations, allowing for analysis of more complicated functions of forecasts than just the mean and variance. This is done in DLMEXAM3.RPF, which first Kalman filters through the data period, which will give us the end-of-data mean and variance for the state. It then simulates (10000 times) the model for 50 periods beyond the end of the observed data, keeping track of the maximum value achieved by the flow in each simulation. The DLM instruction that does the simulation uses the X0 and SX0 options for inputting the initial mean and variance as the final values from the Kalman filter.
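In outline (a sketch, with illustrative names and dates; the actual example may differ in details, and we’re assuming the YHAT option returns the simulated observables), the filter-then-simulate logic is:
dlm(a=1.0,c=1.0,sv=15099.0,sw=1469.1,y=nile,$
  presample=diffuse) 1871:1 1970:1 xstates vstates
* simulate 50 years past the end of data from the filtered terminal state
dlm(type=simulate,a=1.0,c=1.0,sv=15099.0,sw=1469.1,$
  x0=xstates(1970:1),sx0=vstates(1970:1),yhat=yhat) 1971:1 2020:1
The example would repeat the second instruction inside a loop (10000 times), recording the maximum simulated flow on each pass.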
Of greater interest is a draw for the states conditional on the data, which is done using TYPE=CSIMULATE (conditional simulation). The Kalman smoother gives us the mean and covariance matrix for each state individually, but that isn’t enough to allow us to do draws, since, conditional on the data, the states are highly correlated across time. Conditional simulation is even more complicated than Kalman smoothing, but can be done with a couple of Kalman smoothing passes, one of them on simulated data.
The main use of conditional simulation is in Bayesian techniques such as Markov Chain Monte Carlo. In fact, it is sometimes known as Carter-Kohn, after the algorithm described in Carter and Kohn (1994) as a step in a Gibbs sampler. Conditional on data and other parameters, it gives a draw for the states. The next step would then be to produce a draw for the other parameters, conditional on the states (and data).
A simple example of conditional simulation is in DLMEXAM4.RPF. There are infinitely many combinations of \(\bf{W}\) and \(\bf{V}\) shocks that will produce the observed data. Kalman smoothing itself computes the “most likely” (highest density) set of shocks. Conditional simulation draws from the joint distribution for which the smoothed estimates are the mean.
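In outline (our sketch, reusing the Nile settings and an illustrative output name), a single conditional draw for the states would be:
dlm(type=csimulate,a=1.0,c=1.0,sv=15099.0,sw=1469.1,$
  y=nile,presample=diffuse) / xdraw
A fresh draw is produced on each execution, which is what a Gibbs sampling loop would require.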
Copyright © 2025 Thomas A. Doan