RATS 11.1

As described earlier, in

\begin{equation} {{\bf{y}}_t} = {{\bf{\hat y}}_t} + \sum\limits_{s = 0}^\infty {{\Psi _s}{{\bf{u}}_{t - s}}} \label{eq:VARBasicMAR} \end{equation}

there are many equivalent models where you replace \(\bf{u}\) with \(\bf{Gu}\) and all \(\Psi_s\) with \({\Psi}_s{{\bf{G}}^{ - 1}}\) for any non-singular matrix \(\bf{G}\). If we choose any matrix \({\bf{G}}\) so that

 

\({\bf{G}}\Sigma {\bf{G'}} = {\bf{I}}\)

 

then the new innovations \({{\bf{v}}_t} = {\bf{G}}{{\bf{u}}_t}\) satisfy

 

\(E\left( {{{\bf{v}}_t}{{{\bf{v'}}}_t}} \right) = {\bf{I}}\)

 

These orthogonalized innovations have the convenient property that they are uncorrelated both across time and across equations. Such a matrix \({\bf{G}}\) can be obtained by inverting any solution \({\bf{F}}\) of the factorization problem \({\bf{FF'}} = \Sigma \). There are many such factorizations of a positive definite \(\Sigma\), among them:

those based on the Cholesky factorization, where \({\bf{G}}\) is chosen to be lower triangular.

structural decompositions of the form suggested by Bernanke (1986) and Sims (1986).

Note that "orthogonal" means (in this context) having zero covariance. Having an identity covariance matrix is stronger than that (any diagonal covariance would also meet that definition), but since the only difference between a general diagonal matrix and the identity are simple scale factors on each diagonal element, it's simpler to restrict ourselves to a target of the identify matrix.

 

The method packaged into IMPULSE, ERRORS, HISTORY and SIMULATE is the Cholesky factorization, which chooses \({\bf{F}}\) (or equivalently \({\bf{G}}\)) to be lower triangular. If you compute a different factorization, you supply it to those instructions using the FACTOR option (the older DECOMP option can also be used).

 

Impact Responses

If we write \({{\bf{u}}_t} = {\bf{F}}{{\bf{v}}_t}\), then, when \({{\bf{v}}_t}\) is the unit vector \(e(i)\), \({{\bf{u}}_t}\) is just column \(i\) of \({\bf{F}}\). These are known as the impact responses of the components of \({\bf{v}}\), since they show the immediate impact on the variables \({\bf{y}}\) of the shocks \({\bf{v}}\).
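A quick numpy sketch (illustrative only, using the same two-variable \(\Sigma\)) confirms that applying \({\bf{F}}\) to each unit vector simply reads off the corresponding column of \({\bf{F}}\):

import numpy as np

Sigma = np.array([[1.0, 4.0],
                  [4.0, 25.0]])
F = np.linalg.cholesky(Sigma)

for i in range(2):
    e = np.zeros(2)
    e[i] = 1.0                               # unit shock to orthogonalized innovation i+1
    print("impact responses:", F @ e)        # equals column i of F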

 

Why Orthogonalize?

Orthogonalized innovations have two principal advantages over non-orthogonal ones:

Because they are uncorrelated, it is very simple to compute the variances of linear combinations of them (see the sketch following this list).

It can be rather misleading to examine a shock to a single variable in isolation when historically it has always moved together with several other variables. Orthogonalization takes this co-movement into account.
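Regarding the first advantage, here is a minimal numpy sketch (the weights \({\bf{w}}\) are arbitrary): with an identity covariance matrix, the variance of \({\bf{w'v}}\) is just \({\bf{w'w}}\), while the same linear combination written in terms of \({\bf{u}}\) requires the full quadratic form in \(\Sigma\):

import numpy as np

Sigma = np.array([[1.0, 4.0],
                  [4.0, 25.0]])
F = np.linalg.cholesky(Sigma)
G = np.linalg.inv(F)

w = np.array([0.5, -2.0])                    # arbitrary weights on the orthogonalized shocks
print(w @ w)                                 # Var(w'v) = w'w, since Cov(v) = I

a = G.T @ w                                  # the same combination written in terms of u (w'v = a'u)
print(a @ Sigma @ a)                         # full quadratic form, but the same value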

The greatest difficulty with orthogonalization is that there are many ways to accomplish it, so the choice of one particular method is not innocuous. The Bernanke-Sims style structural decompositions are designed to overcome some of the objections to the methodology by modeling the decomposition more carefully.

 

Cholesky Factorization

The standard orthogonalization method used by RATS is the Cholesky factorization. Given a positive-definite symmetric matrix \(\Sigma\), there is one and only one factorization into \({\bf{FF'}}\) such that \({\bf{F}}\) is lower triangular with positive elements on the diagonal; \({\bf{F}}\) is the Cholesky factor of \(\Sigma\).

 

We can obtain several related decompositions by reordering rows and columns of \(\Sigma\). For instance, if the \(\Sigma\) matrix is

\begin{equation} \left[ {\begin{array}{*{20}{c}} {1.0}&{4.0}\\ {4.0}&{25.0} \end{array}} \right] \end{equation}

its Cholesky factor is 

\begin{equation} \left[ {\begin{array}{*{20}{c}} {1.0}&{0.0}\\ {4.0}&{3.0} \end{array}} \right] \end{equation}

If we interchange variables 1 and 2 in the covariance matrix,

\begin{equation} \left[ {\begin{array}{*{20}{c}} {25.0}&{4.0}\\ {4.0}&{1.0} \end{array}} \right] \end{equation}

has a factor of

\begin{equation} \left[ {\begin{array}{*{20}{c}} {5.0}&{0.0}\\ {0.8}&{0.6} \end{array}} \right] \end{equation}

If we switch the rows of the latter factor, we obtain a new factor of the original \(\Sigma\):

\begin{equation} \left[ {\begin{array}{*{20}{c}} {0.8}&{0.6}\\ {5.0}&{0.0} \end{array}} \right] \end{equation}

We describe the first factorization of \(\Sigma\) as the decomposition in the order 1-2 and the second as the decomposition in the order 2-1.
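The following numpy sketch reproduces these calculations: the Cholesky factor in the order 1-2, the factor of the reordered covariance matrix, and the row-switched factor of the original \(\Sigma\) (the order 2-1):

import numpy as np

Sigma = np.array([[1.0, 4.0],
                  [4.0, 25.0]])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                   # permutation that interchanges the two variables

F12 = np.linalg.cholesky(Sigma)              # [[1.0, 0.0], [4.0, 3.0]]
Fswap = np.linalg.cholesky(P @ Sigma @ P)    # [[5.0, 0.0], [0.8, 0.6]]
F21 = P @ Fswap                              # rows switched: [[0.8, 0.6], [5.0, 0.0]]

print(F21 @ F21.T)                           # recovers the original Sigma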

 

The Cholesky factorization is closely related to least squares regression. If we are decomposing the covariance matrix of a set of variables, the diagonal element \(i\) of the factor is the standard error of the residual from a regression of variable \(i\) on variables \(1\) to \(i - 1\).
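The following numpy sketch (with a three-variable data set simulated purely for illustration) demonstrates the correspondence: the (3,3) element of the Cholesky factor of the sample covariance matrix matches the residual standard error from regressing variable 3 on variables 1 and 2:

import numpy as np

rng = np.random.default_rng(1)
n = 100000
x1 = rng.standard_normal(n)
x2 = 0.7 * x1 + 0.5 * rng.standard_normal(n)
x3 = -0.3 * x1 + 0.4 * x2 + 0.9 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

Sigma = np.cov(X, rowvar=False)              # sample covariance matrix
F = np.linalg.cholesky(Sigma)

# Regress (demeaned) variable 3 on (demeaned) variables 1 and 2; the residual
# standard error matches the (3,3) element of the Cholesky factor.
Xc = X - X.mean(axis=0)
coef, *_ = np.linalg.lstsq(Xc[:, :2], Xc[:, 2], rcond=None)
resid = Xc[:, 2] - Xc[:, :2] @ coef
print(resid.std(ddof=1), F[2, 2])            # the two agree (up to rounding)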

 

There is a different factorization for every ordering of the variables, so it will be next to impossible to examine all of them for systems with more than three variables. Usually, you will decide upon the ordering based mostly upon a “semi-structural” interpretation of the model: you might feel, for instance, that within a single time period, movements in \(y_1\) precede movements in \(y_2\), so \(y_1\) should precede \(y_2\) in the ordering.

 

Note that when the residuals are close to being uncorrelated, the order of factorization makes little difference. With low correlation, very little of the variance in a variable can be explained by the other variables.
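A small numpy sketch with a nearly diagonal (purely illustrative) \(\Sigma\) shows this: the factors for the orderings 1-2 and 2-1, with their columns aligned, are almost identical:

import numpy as np

Sigma = np.array([[1.0, 0.05],
                  [0.05, 4.0]])              # innovations that are nearly uncorrelated
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                   # permutation that swaps the two variables

F12 = np.linalg.cholesky(Sigma)              # factor in the order 1-2
F21 = P @ np.linalg.cholesky(P @ Sigma @ P)  # factor in the order 2-1, rows switched back

print(F12)
print(F21[:, ::-1])                          # columns aligned with F12: nearly identical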

 


Copyright © 2026 Thomas A. Doan