Causality Testing

(Granger) Causality

Granger (1969) proposed a concept of causality based upon prediction error: \(X\) is said to Granger-cause \(Y\) if \(Y\) can be forecast better using past \(Y\) and past \(X\) than using past \(Y\) alone. This is a seemingly minor criterion which, as stated, really only tells us whether we can use a more restricted model for forecasting \(Y\). However, Sims (1972) demonstrated that it is equivalent to a much more important criterion: \(X\) fails to Granger-cause \(Y\) if and only if \(Y\) is econometrically exogenous in a dynamic regression of \(X\) on \(Y\). With the help of this result, the "Granger-" in Granger-cause has now largely been dropped, so that "cause" on its own now means Granger-cause, and an exogeneity test typically refers to a test for absence of causality (in the proper context).

Several tests have been proposed for Granger causality in addition to the obvious test of the lags of \(X\) in a regression of \(Y\) on lags of \(X\) and \(Y\) (the "Granger" test). The "Sims test" is a direct test of the result in the Sims paper: regress \(X\) on lagged, current and leading values of \(Y\) and test the leads of \(Y\). However, this is rarely used because it was found to reject non-causality too often (that is, the size of the test was wrong). There is also the Pierce-Haugh test, which looks at cross-correlations between pre-whitened residuals. It was rather quickly eliminated from consideration for the opposite reason: it had very low power (in addition, it required an extra step of fitting ARIMA models to each of the series). A fourth test was proposed in Geweke, Meese and Dent (1982), which aimed to correct the problems with the direct Sims test by also including lags of \(X\) in the regression (thus regress \(X\) on lags of \(X\) and on lagged, current and leading values of \(Y\), and test the leads of \(Y\)). This was rarely employed as it (apparently) had no real advantage over the simpler Granger test. However, it turns out to have some real statistical value when the series have unit roots (see the discussion of unit roots below).

See causal.rpf for an example of the different forms of tests.
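
As a rough illustration of the "Granger" test described above (this is not the code in causal.rpf), the following Python sketch computes the F statistic for excluding all lags of \(X\) from a regression of \(Y\) on lags of both. The simulated data, the function names and the lag length of 2 are assumptions made for the example. The statistic would be compared with an F distribution with (lags, dof) degrees of freedom.

```python
import numpy as np

def lag_matrix(series, lags):
    """Columns are the series lagged 1..lags, aligned with series[lags:]."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def ssr(dep, X):
    """Sum of squared residuals from an OLS regression of dep on X."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return float(resid @ resid)

def granger_f(y, x, lags):
    """F statistic for the null that all lags of x are zero in the y equation."""
    dep = y[lags:]
    const = np.ones((len(dep), 1))
    restricted = np.hstack([const, lag_matrix(y, lags)])       # past y only
    unrestricted = np.hstack([restricted, lag_matrix(x, lags)])  # past y and past x
    ssr_r, ssr_u = ssr(dep, restricted), ssr(dep, unrestricted)
    dof = len(dep) - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / lags) / (ssr_u / dof), dof

# Simulated example (hypothetical data): x causes y, y does not cause x.
rng = np.random.default_rng(0)
n = 500
x, y = np.zeros(n), np.zeros(n)
ex, ey = rng.standard_normal(n), rng.standard_normal(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + ex[t]
    y[t] = 0.3 * y[t - 1] + 0.4 * x[t - 1] + ey[t]

f_xy, dof = granger_f(y, x, lags=2)   # tests x causing y
f_yx, _ = granger_f(x, y, lags=2)     # tests y causing x
print(f"x -> y: F = {f_xy:.2f}   y -> x: F = {f_yx:.2f}   (2, {dof}) dof")
```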

Causality with Three or More Variables

The Granger criterion can be (and often is) applied to one variable in a set of three or more variables. However, it turns out that this exclusion test has little actual meaning (beyond the possibility of simplifying the \(Y\) equation in the system). If \(Z\) fails to Granger-cause \(Y\) in a three variable system with \(X\), \(Y\) and \(Z\) (that is, you fail to reject when testing for zeros on lagged \(Z\)'s in a regression of \(Y\) on lagged \(Y\)'s, \(X\)'s, and \(Z\)'s), there is nothing preventing \(Z\) from helping to predict \(Y\) at steps two or later if \(Z\) "Granger-causes" \(X\) instead. In other words, the "causality" or lack of it refers to a very narrow criterion that says little about the relationship among the variables. On the other hand, if \(Z\) fails to Granger-cause both \(X\) and \(Y\), or if both \(Z\) and \(X\) fail to Granger-cause \(Y\), then there is no route for \(Z\) to help forecast \(Y\) at any step. In both of those cases, you have a multivariate extension of the Sims (1972) result: in the first case, \(X\) and \(Y\) form an exogenous block (relative to \(Z\)), and in the second, \(Y\) is exogenous relative to the block of \(X\) and \(Z\).

See varcause.rpf for an example.
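
The point about indirect routes can be seen with simple arithmetic on a VAR(1), where the effect of today's values on the h-step forecast is the h-th power of the lag coefficient matrix. The sketch below uses made-up coefficients (an assumption for illustration, not taken from varcause.rpf), with the variables ordered \(X\), \(Y\), \(Z\).

```python
import numpy as np

# Hypothetical VAR(1) coefficients; rows are equations, columns are lagged X, Y, Z.
A = np.array([
    [0.5, 0.0, 0.4],   # X equation: lagged Z enters, so Z Granger-causes X
    [0.3, 0.5, 0.0],   # Y equation: zero on lagged Z, so Z fails to Granger-cause Y
    [0.0, 0.0, 0.5],   # Z equation
])

one_step = A[1, 2]         # effect of Z(t) on the one-step forecast of Y
two_step = (A @ A)[1, 2]   # effect of Z(t) on the two-step forecast of Y
print(one_step, two_step)  # zero at one step, but 0.3 * 0.4 at two steps

# Block exclusion: Z fails to Granger-cause both X and Y.
B = A.copy()
B[0, 2] = 0.0
block_effects = [np.linalg.matrix_power(B, h)[1, 2] for h in range(1, 11)]
print(block_effects)       # zero at every horizon checked
```

With the single exclusion, \(Z\) still reaches \(Y\) through \(X\) at step two. With the block exclusion, the zero pattern is preserved in every power of the matrix.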

Causality with Series with Unit Roots

Sims, Stock and Watson (1990) analyzed the behavior of hypothesis tests in linear regressions in the presence of unit roots in some or all of the variables. Their results show that most hypotheses have "standard" asymptotics, that is, the usual use of t tests (for single coefficients) and F or chi-squared tests (for joint hypotheses) is justified asymptotically. However, one very important exception to this is the Granger test: excluding all lags of a non-stationary regressor produces a non-standard asymptotic distribution. Not only is it non-standard, but it's effectively impossible to analyze in any meaningful way because it depends upon the behavior of the entire system. In practice, however, it's not so different from the standard distribution that an overwhelming rejection of the zero restriction would be overturned if this were taken into account. See varcause.rpf for an example of a test which is way out in the tails.

One possible way to handle this is to bootstrap the Granger test. See grangerbootstrap.rpf for an example of that. You can also use the Geweke-Meese-Dent (1982) variation of the Sims test, which is included in causal.rpf. That avoids the non-standard asymptotics by testing only a subset of the lags/leads of a variable. (Note that the GMD test reverses the dependent variable from the Granger test: to test \(X\) causing \(Y\), you regress \(X\) on lagged \(X\) and on lagged, current and leading values of \(Y\), and test the leads of \(Y\).)
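
To make the bootstrap idea concrete, here is a minimal Python sketch of one possible scheme, a residual bootstrap that imposes the null of no causality when generating the artificial data. It is not necessarily the scheme used in grangerbootstrap.rpf; the function names, the two simulated random walks, the lag length and the number of draws are all assumptions for the example.

```python
import numpy as np

def lags_of(s, p):
    """Columns are s lagged 1..p, aligned with s[p:]."""
    n = len(s)
    return np.column_stack([s[p - k : n - k] for k in range(1, p + 1)])

def ols(dep, X):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return beta, dep - X @ beta

def granger_f(y, x, p):
    """F statistic for excluding all p lags of x from the y equation."""
    dep = y[p:]
    Xr = np.hstack([np.ones((len(dep), 1)), lags_of(y, p)])
    Xu = np.hstack([Xr, lags_of(x, p)])
    er, eu = ols(dep, Xr)[1], ols(dep, Xu)[1]
    return ((er @ er - eu @ eu) / p) / (eu @ eu / (len(dep) - Xu.shape[1]))

def bootstrap_pvalue(y, x, p, draws, rng):
    """Bootstrap p-value for the Granger F statistic, generating data under the null."""
    stat = granger_f(y, x, p)
    n = len(y)
    const = np.ones((n - p, 1))
    # Null model: y depends only on its own lags; x depends on lags of both.
    by, ey = ols(y[p:], np.hstack([const, lags_of(y, p)]))
    bx, ex = ols(x[p:], np.hstack([const, lags_of(x, p), lags_of(y, p)]))
    exceed = 0
    for _ in range(draws):
        # Resample residual pairs jointly to keep their contemporaneous correlation.
        idx = rng.integers(0, n - p, size=n - p)
        ys, xs = y.copy(), x.copy()   # first p observations serve as initial values
        for i, t in enumerate(range(p, n)):
            ylag = ys[t - p : t][::-1]   # y[t-1], ..., y[t-p]
            xlag = xs[t - p : t][::-1]
            y_new = by[0] + by[1:] @ ylag + ey[idx[i]]
            x_new = bx[0] + bx[1 : p + 1] @ xlag + bx[p + 1 :] @ ylag + ex[idx[i]]
            ys[t], xs[t] = y_new, x_new
        exceed += int(granger_f(ys, xs, p) >= stat)
    return stat, exceed / draws

# Two independent random walks (hypothetical data with unit roots and no causality).
rng = np.random.default_rng(1)
n = 200
x = np.cumsum(rng.standard_normal(n))
y = np.cumsum(rng.standard_normal(n))
stat, pval = bootstrap_pvalue(y, x, p=2, draws=199, rng=rng)
print(f"F = {stat:.3f}, bootstrap p-value = {pval:.3f}")
```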

Also, if you're willing to assume that each of the series has a unit root, but there is no cointegration among the variables in the system, then you can run the test on differenced data. (The VAR in differences is misspecified in the presence of cointegration, but not if each series has a separate unit root).

Note, by the way, that it's quite possible that you will do the test "carefully" and get almost the identical result—the effect of the unit roots on the test is unknown and could, in fact, be fairly minor.

One "non-fix" to this is from Toda-Yamamoto (1995). Despite there being quite a few usages of this in the literature, it is simply bad statistics. In short, it tests for causality by adding an "extra" lag to a VAR and then testing zero restrictions which don't include those added lags. This does provide a test which (under the null) will have the correct asymptotic distribution, since it doesn't test all the lags of the non-stationary variable. However, it's not a test of Granger causality (which requires testing all lags). By adding extra lags and then not testing them, the bad behavior is shifted onto the untested lags. It will suffer badly from lack of power since, if the coefficients aren't, in fact, zero, the "causality" will get shifted fairly easily to the untested lag(s) because an integrated process is so highly autocorrelated. An even worse idea which has come up in the literature is bootstrapping the TY test. The only "good" feature of the TY test relative to the actual Granger test is that it has standard asymptotics. If one is bootstrapping, one should bootstrap the actual test (as in grangerbootstrap.rpf) and not a bad alternative.

Short- and Long-Run Causality

If (\(X\),\(Y\)) is a stationary process, then the effect of any shock to the system will eventually die out. If \(X\) causes \(Y\), then \(X(t)\)'s ability to help forecast \(Y(t+h)\) for large values of \(h\) is effectively zero. (In fact, even \(Y(t)\) itself doesn't really help forecast \(Y(t+h)\)). In this case, any causality will, of necessity, be short-run causality.

By contrast, if (\(X\),\(Y\)) are (both) non-stationary, in general, (almost) any shock to the system will have permanent effects. (The Blanchard-Quah calculation comes up with a single shock shape which converges to zero in the long run; all other shocks have permanent effects). If \(X\) Granger-causes \(Y\), then \(X(t)\) will help forecast \(Y(t+h)\) at any \(h\), no matter how large. This can be considered long-run causality.

However, if (\(X\),\(Y\)) are non-stationary, but cointegrated, it's possible to have only short-run causality. If we write the equation for \(Y\) in error correction form:

\begin{equation} \Delta {Y_t} = \alpha \left( {{Y_{t - 1}} - \beta {X_{t - 1}}} \right) + {\zeta _{y1}}\Delta {Y_{t - 1}} + \ldots + {\zeta _{yp}}\Delta {Y_{t - p}} + {\zeta _{x1}}\Delta {X_{t - 1}} + \ldots + {\zeta _{xp}}\Delta {X_{t - p}} + {\varepsilon _{yt}} \end{equation}

then \(X\) enters in two places: the loading \(\alpha\) on the cointegrating vector (if \(\beta\) were zero, the two series wouldn't be cointegrated in the first place), and the \(\zeta\) coefficients on its lagged differences. A test for causality would be a joint test that all those coefficients are zero. If you reject that, you would conclude that there is causality from \(X\) to \(Y\). There's a separate test for long-run causality which looks solely at \(\alpha\). If \(\alpha=0\), the ability of \(X\) to help predict \(Y\) dies off with increasing horizon, and \(Y\) is said to be weakly exogenous. A more complete term for that would be weakly exogenous for \(\beta\): the \(Y\) equation provides no information about \(\beta\), as all the adjustment to keep \(Y\) and \(\beta X\) close to each other comes from adjusting \(X\).

Note that some papers have done a test just on the lagged differences of \(X\) and called it a test for short-run causality. That gets the null and alternative confused. "Short-run causality" is shorthand for "causality but not long-run causality." The proper procedure is to do a joint test, and then, if you conclude that there is causality, test \(\alpha=0\) to see if there's long-run causality. (You might want to do the long-run test regardless of the outcome of the overall test, since the power of the test might be diluted if you have a large number of lagged differences). You could also test \(\alpha=0\) first, and if you conclude that there's no long-run causality, test the lagged differences imposing \(\alpha=0\) (that is, run the regression without the error correction term). However, a test on the lagged differences without either jointly testing or imposing \(\alpha=0\) has no useful interpretation.

Note that, because everything in this equation is stationary, you can use standard testing techniques. An example is provided in vecmcause.rpf.
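
For reference, the hypotheses discussed in this section can be written out in terms of the coefficients of the error correction equation for \(Y\). This is only a restatement of the text in symbols; the joint null has \(p+1\) restrictions.

```latex
\begin{align*}
\text{No causality from } X \text{ to } Y:\quad
  & H_0:\ \alpha = 0,\ \zeta_{x1} = \ldots = \zeta_{xp} = 0 \\
\text{No long-run causality}:\quad
  & H_0:\ \alpha = 0 \\
\text{Lagged differences, after imposing } \alpha = 0:\quad
  & H_0:\ \zeta_{x1} = \ldots = \zeta_{xp} = 0
  \quad \text{(regression run without the error correction term)}
\end{align*}
```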

Causality in VAR-GARCH Models

If you have a GARCH model (or any other non-linear model) whose mean model takes the form of a VAR or VECM, you can test for causality (in the mean!) using a Wald test. Note that it's possible for there to be causality in the variance (that is, "other" lagged residuals affect the variance) without having causality in the mean. The issues about testing causality in systems with three or more variables apply here: you can't really conclude much from knocking out the lags of just one variable in one equation of such a system. However, if it's bivariate, or if you are excluding a full block from a larger system as described above, the same basic interpretation applies even though you have a GARCH model for the variance.

It's easiest to set this up using the Regression Tests, Exclusion Restrictions test wizard. See vecmgarch.rpf for an example (it's for short- and long-run causality, but it's the same idea even if it's a VAR rather than a VECM).

Causality in ARDL Models

Several papers have described as "Granger causality" the test on the lagged value of \(X\) in the ARDL equation

\begin{equation} {Y_t} = {\alpha _1}{Y_{t - 1}} + {\beta _0}{X_t} + {\beta _1}{X_{t - 1}} + {\varepsilon _t} \label{eq:causality_ardl} \end{equation}

While superficially similar (they both test lags in a bivariate dynamic regression), the two tests are, in fact, not similar at all in the way they restrict the dynamic properties of the series. If the bivariate VAR for \(X\) and \(Y\) has both represented as univariate autoregressions, with contemporaneously correlated innovations, the lagged \(X\) in the ARDL will have a non-zero coefficient, even though \(X\) (by definition) fails to Granger-cause \(Y\). And, if the coefficient on the lag is zero in the ARDL, it's almost a certainty that \(X\) actually does Granger-cause \(Y\) if the coefficient on current \(X\) is non-zero. The test thus has no implications one way or the other for the causal relationship. Instead, it's the test of a particular coefficient in an ARDL model.
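
Both claims can be checked with a short derivation. The symbols \(\rho_x\), \(\rho_y\), \(u_t\), \(v_t\), \(\gamma\) and \(e_t\) below are introduced only for this illustration, and first-order dynamics are assumed to keep the algebra short.

```latex
% Case 1: univariate AR(1) processes with correlated innovations (no Granger causality)
\begin{align*}
X_t &= \rho_x X_{t-1} + u_t, \qquad Y_t = \rho_y Y_{t-1} + v_t, \\
v_t &= \gamma u_t + e_t, \qquad \gamma = \frac{\operatorname{Cov}(u_t, v_t)}{\operatorname{Var}(u_t)}, \qquad e_t \perp u_t, \\
Y_t &= \rho_y Y_{t-1} + \gamma \left( X_t - \rho_x X_{t-1} \right) + e_t
     = \rho_y Y_{t-1} + \gamma X_t - \gamma \rho_x X_{t-1} + e_t .
\end{align*}
% So \alpha_1 = \rho_y, \beta_0 = \gamma, \beta_1 = -\gamma\rho_x, which is non-zero
% whenever \gamma \neq 0 and \rho_x \neq 0, even though X fails to Granger-cause Y.

% Case 2: \beta_1 = 0 in the ARDL, with \beta_0 \neq 0 and X_t = \rho_x X_{t-1} + u_t
\begin{align*}
Y_t &= \alpha_1 Y_{t-1} + \beta_0 X_t + \varepsilon_t
     = \alpha_1 Y_{t-1} + \beta_0 \rho_x X_{t-1} + \left( \beta_0 u_t + \varepsilon_t \right).
\end{align*}
% Lagged X enters the forecasting equation for Y with coefficient \beta_0\rho_x,
% so X Granger-causes Y whenever X is autocorrelated.
```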

Causality in Panel Data

A Granger causality test is, in a bivariate setting, simply a restriction on the coefficients in one equation in a VAR. In a tightly parameterized model, this is straightforward: estimate the unrestricted model and do a Wald test on the lags as described above for the GARCH model.

More interesting is the application to small-N, big-T data sets, where homogeneity is both unreasonable and unnecessary. For a VAR on macroeconomic data, everything being heterogeneous would seem to be the right choice. With heterogeneity under both the null and the alternative, the null is that there is no Granger causality in any individual, while the alternative is that there is Granger causality in at least one, so the test would (theoretically) reject if there is Granger causality in any of the individual regressions. Of course, the power of the test will be fairly low if there is causality in only a few individuals.

Allowing for full heterogeneity (plus independence across individuals), a likelihood ratio test is quite easy to compute. It is, in fact, just the sum of the individual likelihood ratios for Granger causality (with degrees of freedom summing as well). Since rejection of the null of non-causality means that causality is found in some (though not necessarily all) of the individuals, an obvious way to display the results is with both the joint test and the individual tests. Note, by the way, that it is possible for all the individual tests to be insignificant at conventional significance levels while the joint test is strongly significant. That's not unexpected, since the whole point of the joint test is to give us more and better information than is available in the individual samples.

For an example, see panelcause.rpf.
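
The summing of the individual statistics can be sketched as follows. This is an illustrative Python example, not the code in panelcause.rpf. It assumes the single-equation form of the likelihood ratio, T log(SSR restricted / SSR unrestricted), and uses simulated data for six hypothetical individuals, only two of which have any causality.

```python
import numpy as np

def lags_of(s, p):
    """Columns are s lagged 1..p, aligned with s[p:]."""
    n = len(s)
    return np.column_stack([s[p - k : n - k] for k in range(1, p + 1)])

def ssr(dep, X):
    """Sum of squared residuals from an OLS regression of dep on X."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return float(resid @ resid)

def granger_lr(y, x, p):
    """Single-equation LR statistic for excluding p lags of x, with its degrees of freedom."""
    dep = y[p:]
    Xr = np.hstack([np.ones((len(dep), 1)), lags_of(y, p)])
    Xu = np.hstack([Xr, lags_of(x, p)])
    return len(dep) * np.log(ssr(dep, Xr) / ssr(dep, Xu)), p

def simulate(effect, n, rng):
    """Hypothetical individual: x is AR(1); y depends on lagged x with the given effect."""
    x, y = np.zeros(n), np.zeros(n)
    ex, ey = rng.standard_normal(n), rng.standard_normal(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + ex[t]
        y[t] = 0.3 * y[t - 1] + effect * x[t - 1] + ey[t]
    return y, x

rng = np.random.default_rng(2)
individual = []
for i in range(6):
    y, x = simulate(effect=0.15 if i < 2 else 0.0, n=200, rng=rng)
    individual.append(granger_lr(y, x, p=2))

# Under independence across individuals, the joint statistic and its degrees
# of freedom are the sums of the individual ones.
joint_lr = sum(lr for lr, _ in individual)
joint_dof = sum(dof for _, dof in individual)
for i, (lr, dof) in enumerate(individual, start=1):
    print(f"individual {i}: LR = {lr:.2f} ({dof} dof)")
print(f"joint: LR = {joint_lr:.2f} ({joint_dof} dof)")
```

Each statistic would be compared with a chi-squared distribution with the indicated degrees of freedom.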

Rolling Sample Causality Tests

While this is certainly doable (see rollingcausality.rpf), it's not entirely clear what the point is. The causality tests will likely give quite different results from different subsamples just due to sampling error—in the example program there is no causality at all, and the model is completely stable, and yet the results range from solidly "significant" to almost zero. Note that bootstrapping the significance levels as part of this doesn't change this since bootstrapping won't change the (apparently) large causality tests that can be observed when you do a large number of test windows.
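
The spread of results across windows is easy to reproduce. The following Python sketch (an illustration in the same spirit as the example program, not a translation of rollingcausality.rpf) computes the Granger F statistic over rolling windows of two independent white noise series, so there is no causality and the model is stable by construction. The window length, step and lag length are arbitrary assumptions.

```python
import numpy as np

def lags_of(s, p):
    """Columns are s lagged 1..p, aligned with s[p:]."""
    n = len(s)
    return np.column_stack([s[p - k : n - k] for k in range(1, p + 1)])

def ssr(dep, X):
    """Sum of squared residuals from an OLS regression of dep on X."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return float(resid @ resid)

def granger_f(y, x, p):
    """F statistic for excluding all p lags of x from the y equation."""
    dep = y[p:]
    Xr = np.hstack([np.ones((len(dep), 1)), lags_of(y, p)])
    Xu = np.hstack([Xr, lags_of(x, p)])
    ssr_r, ssr_u = ssr(dep, Xr), ssr(dep, Xu)
    return ((ssr_r - ssr_u) / p) / (ssr_u / (len(dep) - Xu.shape[1]))

rng = np.random.default_rng(3)
n, window, step, p = 400, 100, 10, 2
x = rng.standard_normal(n)   # independent of y: no causality in either direction
y = rng.standard_normal(n)

starts = range(0, n - window + 1, step)
f_stats = [granger_f(y[s : s + window], x[s : s + window], p) for s in starts]
print(f"{len(f_stats)} windows: min F = {min(f_stats):.3f}, max F = {max(f_stats):.3f}")
```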