RATS 11.1

Cholesky factorizations suffer from the problem of imposing a “semi-structural” interpretation on a mechanical procedure. For instance, the “money” innovation is different (sometimes radically so) if you put money first in the ordering than if you put it last. There will rarely be a nice, neat, publicly acceptable way to order the VAR variables.

 

Bernanke (1986) and Sims (1986) independently proposed alternative ways of looking at the factorization problem which impose more of an economic structure. These have been dubbed “structural VARs” (SVAR for short) or “identified VARs.” The theory of these is developed more fully in Hamilton (1994) and Lütkepohl (2006), while Enders (2014) covers them from a practical standpoint.

 

First, note that in the VAR model

\begin{equation} {\bf{y}}_t = \sum\limits_{s = 1}^p \Phi_s {\bf{y}}_{t - s} + {\bf{u}}_t \quad ; \quad E({\bf{u}}_t {\bf{u}}'_t ) = \Sigma \end{equation}

the lag coefficients \(\Phi_s\) can be estimated by single-equation OLS regardless of any restrictions on the \(\Sigma\) matrix. Suppose now that we write down a model for the (non-orthogonal) innovation process \({\bf{u}}\), such as

\begin{equation} \begin{array}{l} u_{1t} = v_{1t} \\ u_{2t} = \gamma u_{1t} + v_{2t} \\ u_{3t} = \delta u_{1t} + v_{3t} \end{array} \end{equation}

where we assume the \({\bf{v}}\)'s are orthogonal. This puts a restriction on \(\Sigma\): there are six free elements in \(\Sigma\) (in general, there are \(N(N+1)/2\)) and only five in this setup: \(\gamma\), \(\delta\) and the variances of the \({\bf{v}}\)'s. We can obtain a related Cholesky factorization by adding \(u_{2t}\) to the \(u_{3t}\) equation (ordering 1–2–3) or \(u_{3t}\) to the \(u_{2t}\) equation (ordering 1–3–2). The above model causes \(u_2\) and \(u_3\) to be related only through \(u_1\).
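To see the restriction explicitly, write \(d_i\) for the variance of \(v_{it}\). The model implies

\begin{equation} \Sigma = \left[ {\begin{array}{ccc} d_1 & \gamma d_1 & \delta d_1 \\ \gamma d_1 & \gamma^2 d_1 + d_2 & \gamma \delta d_1 \\ \delta d_1 & \gamma \delta d_1 & \delta^2 d_1 + d_3 \\ \end{array}} \right] \end{equation}

so the single overidentifying restriction is \(\sigma_{23} = \sigma_{12} \sigma_{13} / \sigma_{11}\).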

 

In general, if we write the innovation model as

\begin{equation} {\bf{A}}{\bf{u}}_t = {\bf{v}}_t \quad ; \quad E({\bf{v}}_t {\bf{v}}'_t ) = {\bf{D}} \quad ; \quad {\bf{D}}\,\,{\rm{diagonal}} \end{equation}

and assume Normal residuals, we need to maximize over the free parameters in \(\bf{A}\) and \(\bf{D}\) the likelihood-based function:

\begin{equation} \frac{T}{2}\left\{ \log \left| {\bf{A}} \right|^2 - \log \left| {\bf{D}} \right| \right\} - \frac{T}{2}{\rm{trace}}({\bf{D}}^{-1} {\bf{ASA'}}) \end{equation}

where \({\bf{S}}\) is the sample covariance matrix of the residuals.
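This form follows directly from the model: since \({\bf{u}}_t = {\bf{A}}^{-1} {\bf{v}}_t\), we have

\begin{equation} E({\bf{u}}_t {\bf{u}}'_t ) = {\bf{A}}^{-1} {\bf{D}} {\bf{A}}'^{-1} \end{equation}

and substituting this covariance matrix into the Gaussian log likelihood (and dropping the integrating constants) gives the expression above.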

 

In RATS, structural VARs are estimated using the instruction CVMODEL. This actually accepts a broader class of models. The general form is

\begin{equation} {\bf{A}}{\bf{u}}_t = {\bf{B}}{\bf{v}}_t \quad ; \quad E({\bf{v}}_t {\bf{v}}'_t ) = {\bf{D}} \quad ; \quad {\bf{D}}\,\,{\rm{diagonal}} \label{eq:var_svarabmodel} \end{equation}

Of course, \({\bf{B}} = {\bf{I}}\) gives the model from before. Typically, a model will use just one of \(\bf{A}\) and \(\bf{B}\). A “B” model would come from a view that you know what the orthogonal shocks are and are using the \(\bf{B}\) matrix to tell which variables they hit. A model using both \(\bf{A}\) and \(\bf{B}\) would likely have just a few well-placed free coefficients in \(\bf{B}\) to allow for correlations among residuals in structural equations. For instance, if you have two structural equations for \(\bf{u}\), but are unwilling to restrict them to having uncorrelated residuals, a \(\bf{B}\) matrix with a non-zero coefficient at the off-diagonal location linking the two will allow them to be correlated.
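As a sketch of that last case, using the three-variable model above and the FRML setup described under "Estimation" below (the coefficient name B23 is ours, added for illustration), a \(\bf{B}\) matrix allowing correlation between the residuals of the second and third structural equations could be set up as:

nonlin gamma delta b23
dec frml[rect] afrml bfrml
frml afrml = ||1.0,0.0,0.0|-gamma,1.0,0.0|-delta,0.0,1.0||
* B has a single free off-diagonal entry linking equations 2 and 3
frml bfrml = ||1.0,0.0,0.0|0.0,1.0,b23|0.0,0.0,1.0||
compute gamma=0.0,delta=0.0,b23=0.0
cvmodel(method=bfgs,a=afrml,b=bfrml) %sigma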

Identification

When you write them down, innovation models look a lot like standard simultaneous equations models. Unfortunately, there is no simple counting rule like the order condition to verify identification. If you have more than \(N(N-1)/2\) free parameters in \(\bf{A}\), your model is definitely not identified, but it may be unidentified even with fewer. For example, if we start with \(\bf{D}\) being the identity and the \(\bf{A}\) matrix

\begin{equation} \left[ {\begin{array}{ccc} 1.0 & 0.5 & 0.0 \\ 0.0 & 1.0 & 1.2 \\ -0.3 & 0.0 & 1.0 \\ \end{array}} \right] \end{equation}

there is an alternate factorization, with a different diagonal \(\bf{D}\), using the matrix

\begin{equation} \left[ {\begin{array}{ccc} 1.0 & 1.253 & 0.0 \\ 0.0 & 1.0 & 1.925 \\ -2.303 & 0.0 & 1.0 \\ \end{array}} \right] \end{equation}

The likelihood function will have a peak at each of these.

 

In most cases, you can rely on the methods in Rubio-Ramirez, Waggoner, and Zha (2010) to check identification. While this is a very technical paper designed to apply to a wide range of situations, it provides as special cases a couple of easy-to-verify rules:

In an “A” model identified with zero restrictions, the SVAR is identified if and only if exactly one row has j zeros for each j from 0 to N-1. Equivalently, if and only if exactly one row has j non-zero elements for each j from 1 to N.

In a “B” model identified with zero restrictions, the SVAR is identified if and only if exactly one column has j zeros for each j from 0 to N-1, with the analogous condition for non-zeros.

For example, the following pattern of zeros and non-zeros:

\begin{equation} \left[ {\begin{array}{cccc} \bullet & 0 & 0 & \bullet \\ \bullet & \bullet & \bullet & 0 \\ \bullet & \bullet & 0 & \bullet \\ \bullet & \bullet & 0 & 0 \\ \end{array}} \right] \end{equation}

will not properly identify an “A” model (rows have 2-1-1-2 zeros, not 0-1-2-3 in some order), but will identify a “B” model (columns have 0-1-3-2). Note that identification for the A model fails even though there is nothing “obviously” wrong with it (like two rows being identical).

 

Referring to the earlier example, we can tell immediately using this method that the model isn't globally identified: each of the three rows has two non-zero elements, rather than one row each with 1, 2, and 3 non-zero elements.

 

Estimation

Before running CVMODEL, you need to create a FRML which describes your \(\bf{A}\) or \(\bf{B}\) matrix. This must be declared as a FRML[RECT], which is a formula which produces a rectangular matrix. Whatever free parameters you will have in this also need to be put into a parameter set using NONLIN. For instance, for the small model:

\begin{equation} \begin{array}{l} u_{1t} = v_{1t} \\ u_{2t} = \gamma u_{1t} + v_{2t} \\ u_{3t} = \delta u_{1t} + v_{3t} \end{array} \end{equation}

the setup would be

 

* free parameters of the A matrix
nonlin gamma delta
dec frml[rect] afrml
* A u(t) = v(t), so u2 = gamma*u1 + v2 becomes -gamma*u1 + u2 = v2
frml afrml = ||1.0,0.0,0.0|-gamma,1.0,0.0|-delta,0.0,1.0||

 

We also need the covariance matrix of residuals, which will usually come from an ESTIMATE instruction on a VAR. The covariance matrix itself is a sufficient statistic for estimating the free coefficients of the model. However, in order to obtain standard errors for the coefficients, or to test overidentifying restrictions, we also need to know the number of observations. RATS keeps track of the number of observations on the most recent estimation, so if you have just estimated the VAR, you won’t need to do anything about that. However, if the covariance matrix was estimated separately, you should use the option OBS=number of observations on CVMODEL.
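For example, a minimal sketch (assuming three hypothetical series Y1, Y2 and Y3 and four lags; the names and lag length are ours) of the VAR estimation that produces the covariance matrix:

* set up and estimate the VAR; ESTIMATE defines %SIGMA and %NOBS
system(model=varmodel)
variables y1 y2 y3
lags 1 to 4
det constant
end(system)
estimate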

 

Note, by the way, that it is good practice to analyze the simpler Cholesky factor model (that is, calculate and graph its impulse responses) before you try to estimate your structural model. Structural VARs take the underlying lag model as given and model only the contemporaneous relationship. If there's a problem with the lag model (for instance, explosive roots due to some type of data error), that will be much easier to see with the Cholesky factor model, since it's fully constructive and doesn't depend upon non-linear estimation.
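A sketch of that preliminary check, assuming the VAR above (the step count is arbitrary, and we make the Cholesky factor explicit with %DECOMP):

* responses to Cholesky-orthogonalized shocks
impulse(model=varmodel,factor=%decomp(%sigma),steps=24,results=impulses)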

 

CVMODEL provides three choices for the (log) maximand that it uses in estimating the free coefficients in \eqref{eq:var_svarabmodel}. All are based upon Normally distributed \({\bf{v}}\)'s, but they differ in their handling of the \({\bf{D}}\) matrix. They fall into two general forms:

 

\begin{equation} \frac{T - c}{2}\left\{ \log \left| {\bf{A}} \right|^2 - \log \left| {\bf{B}} \right|^2 \right\} - \left( \frac{T - c}{2} + \delta + 1 \right)\sum\limits_i \log \left( {\bf{B}}^{-1} {\bf{ASA'}}\,{\bf{B}}'^{-1} \right)_{ii} \label{eq:var_svarconcentrated} \end{equation}

\begin{equation} \frac{T - c}{2}\left\{ \log \left| {\bf{A}} \right|^2 - \log \left| {\bf{B}} \right|^2 \right\} - \frac{T}{2}\sum\limits_i \left( {\bf{B}}^{-1} {\bf{ASA'}}\,{\bf{B}}'^{-1} \right)_{ii} \label{eq:var_posteriorlogl} \end{equation}

 

With \(c=0\) and \(\delta=-1\) in \eqref{eq:var_svarconcentrated}, you have the concentrated likelihood. This form is selected by using DMATRIX=CONCENTRATE, which is the default. Other values of \(\delta\) have \(\bf{D}\) integrated out. This is DMATRIX=MARGINALIZED, combined with PDF=value of delta. This uses a prior of the form

 

\(\left| {\bf{D}} \right|^{ - \delta } \)

 

(PDF stands for Prior Degrees of Freedom). With \(c=0\) in \eqref{eq:var_posteriorlogl}, you have the likelihood with \(\bf{D}=\bf{I}\). This is selected with DMATRIX=IDENTITY. This requires a different parameterization of the basic factoring model than the other two. The concentrated and marginalized forms both assume that the parameterization of the \(\bf{A}\) and \(\bf{B}\) matrices includes a normalization, generally by putting 1’s on the diagonal. With DMATRIX=IDENTITY, the normalization is chosen by making the \(\bf{v}\)'s have unit variance, so the diagonal in \(\bf{A}\) or \(\bf{B}\) has to be freely estimated. This was the choice in Sims and Zha (1999), as they had some concern that the normalization of one of their equations was not innocuous. The maximum likelihood estimates aren't affected by the choice of normalization, but the Monte Carlo integration process (example MONTESVAR.RPF) is.
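For the small model above, a sketch of the DMATRIX=IDENTITY parameterization (the coefficient names A11 and so on are ours) frees up the diagonal of \(\bf{A}\):

* diagonal of A freely estimated; the v's are normalized to unit variance
nonlin a11 a21 a22 a31 a33
dec frml[rect] afrml
frml afrml = ||a11,0.0,0.0|a21,a22,0.0|a31,0.0,a33||
compute a11=1.0,a22=1.0,a33=1.0,a21=0.0,a31=0.0
cvmodel(dmatrix=identity,method=bfgs,a=afrml) %sigma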

 

In all of these, \(c\) is for correcting the degrees of freedom if you’re examining the posterior density. You provide the value of \(c\) using the option DFC=value of c.

 

For estimation methods, CVMODEL offers BFGS and the derivative-free methods. BFGS is the only one of these that can estimate standard errors. However, BFGS, as a hill-climbing method, depends crucially on whether you are starting on the right “hill.” If you want to play it safe, start out with the GENETIC method, which explores more broadly (but, unfortunately, also more slowly). You can use the PMETHOD and PITERS options to choose a preliminary estimation method before switching to BFGS.
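For instance (a sketch; the preliminary iteration count is arbitrary), to refine genetic-method estimates with BFGS:

cvmodel(pmethod=genetic,piters=50,method=bfgs,a=afrml) %sigma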

 

You can also use CVMODEL(METHOD=EVAL) to evaluate the log likelihood at the initial guess values. Note that while we have left the integrating constants out of the formulas \eqref{eq:var_svarconcentrated} and \eqref{eq:var_posteriorlogl}, they are included in the log likelihood that CVMODEL produces.
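For example, to check the value at the initial guess values for the small model:

cvmodel(method=eval,a=afrml) %sigma
* the resulting log likelihood is available as %FUNCVAL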

 

The following code estimates the original SVAR. %SIGMA is the (previously estimated) covariance matrix of residuals:

 

* initial guesses for the free parameters
compute gamma=0.0,delta=0.0
cvmodel(method=bfgs,a=afrml) %sigma

 

This is the output produced:

 

Covariance Model-Concentrated Likelihood - Estimation by BFGS
Convergence in     9 Iterations. Final criterion was  0.0000010 <=  0.0000100
Observations                              250
Log Likelihood                     -1594.3951
Log Likelihood Unrestricted        -1589.7001
Chi-Squared(1)                         9.3901
Significance Level                  0.0021816

    Variable                        Coeff      Std Error      T-Stat      Signif
*********************************************************************************
1.  GAMMA                     0.4336094435 0.0647412945      6.69757  0.00000000
2.  DELTA                     0.6020698430 0.0605333608      9.94608  0.00000000

 

In the header, it displays the log likelihood of the estimated model and the log likelihood of an unrestricted model. For a just-identified model, those should be equal; if they aren't, you got stuck on a ridge and need to do some extra work with the non-linear estimation options to find the global optimum. If the model is overidentified (which this one is), it produces a likelihood ratio test for the overidentifying restrictions. In this case, the restriction is rejected fairly soundly.

 

The log likelihood is accessible for further analysis as %FUNCVAL or %LOGL, while the test statistic and significance level are %CDSTAT and %SIGNIF. To get the factor of \(\Sigma\) that you will need for impulse responses, use the FACTOR option on CVMODEL. You can use the DVECTOR option to get the variances of the orthogonal components. Note that if the model is over-identified, the outer product of the factor matrix generated by CVMODEL won't exactly reproduce the covariance matrix. If you try it, you'll typically see some values matching exactly while others don't, since the typical overidentified model includes some blocks of variables for which the model is just-identified.
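A sketch of that step (the matrix names AFACTOR and DV are ours, and the step count is arbitrary):

* save the factor of sigma and the variances of the orthogonal components
cvmodel(method=bfgs,a=afrml,factor=afactor,dvector=dv) %sigma
* responses to the structural shocks
impulse(model=varmodel,factor=afactor,steps=24,results=irf)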

 

Directly Modeling the Covariance Matrix

The A and B forms are designed (in effect) to model a factor of the covariance matrix. Sometimes, however, we have a model which directly gives the covariance matrix itself. If you have this, you can provide a V formula (a FRML[SYMMETRIC]) which models \(\Sigma\). The likelihood used is (omitting the constants):

\begin{equation} \frac{T - c}{2}\log \left| {\bf{V}}^{-1} \right| - \frac{T}{2}{\rm{trace}}\left( {\bf{SV}}^{-1} \right) \end{equation}

For instance, a simple “factor” model would model the covariance matrix as

 

\(\Lambda \Lambda ' + {\bf{D}}\)

 

where \(\Lambda\) is the matrix of loadings on the orthogonal factors and \({\bf{D}}\) is a diagonal matrix. With \(\Lambda\) constructed by a vertical concatenation of the RECTANGULAR matrices LLEAD and LREM, and with D a VECTOR holding the standard deviations of the idiosyncratic components, the following would estimate the free parameters in a fit to the covariance matrix R.

 

* free parameters: factor loadings and idiosyncratic standard deviations
nonlin llead lrem d
* V = Lambda*Lambda' + diag(d(i)^2); ~~ stacks LLEAD on top of LREM
cvmodel(v=%outerxx(llead~~lrem)+%diag(d.*d),obs=%nobs) r
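The declarations this assumes might look like the following sketch (the dimensions, three variables and one factor, and the guess values are ours):

dec rect llead(1,1) lrem(2,1)
dec vect d(3)
* fill the arrays with simple starting values
compute llead=%const(1.0),lrem=%const(0.5),d=%const(1.0)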

