RATS 11

Toda-Yamamoto Causality Test: A Cautionary Tale

 

It has been known for quite some time that the standard-form “Granger causality” lag exclusion test has non-standard (and essentially uncomputable) asymptotics if the variables involved are \(I(1)\). This is one of the implications of Sims, Stock and Watson (1990): in a linear regression involving (possibly) non-stationary series, a test has “standard” asymptotics (that is, Wald tests have the correct distribution asymptotically) only if the test can be arranged to be on stationary variables. For the Granger test regression, if we are testing for \(x\) causing \(y\), the lag polynomial on \(x\) using \(p+1\) lags is:

\begin{equation} \sum\limits_{i = 1}^{p + 1} {\beta _i x_{t - i} } \end{equation}

and the Granger test would be a joint exclusion test on all the \(\beta\)’s. By repeatedly substituting

\begin{equation} x_{t - i} = \Delta x_{t - i} + x_{t - (i + 1)} \end{equation}

we can replace this with a sum in the form

\begin{equation} \beta _{p + 1}^* x_{t - (p + 1)} + \sum\limits_{i = 1}^p {\beta _i^* \Delta x_{t - i} } \label{eq:tycausality_subst} \end{equation}

where the \({\beta _i^* }\) are linear combinations of the original \(\beta\)’s. This is almost a linear combination of the stationary differences, but while a different sequence of substitutions can shift the single lagged level term to the beginning rather than the end (or even put it somewhere in the middle), no matter what you do, you will end up with one non-stationary term.
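The substitution above is just a reparameterization: the lagged levels and the differences-plus-one-level design span the same space, so the fitted values are identical and only the coefficients change. A minimal numerical check of this, sketched in Python with NumPy rather than RATS (the series construction is purely illustrative), for the case \(p=1\):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x = np.cumsum(rng.normal(size=T))               # x is I(1): a pure random walk
y = np.zeros(T)
y[1:] = 0.5 * x[:-1] + rng.normal(size=T - 1)   # y depends on one lag of x

# Common estimation sample t = 2, ..., T-1
y_t = y[2:]
x_1 = x[1:-1]    # x_{t-1}
x_2 = x[:-2]     # x_{t-2}

X_levels = np.column_stack([x_1, x_2])          # original lagged levels
X_repar  = np.column_stack([x_1 - x_2, x_2])    # Delta x_{t-1} plus the one level term

def fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# The two designs span the same column space, so the fits are identical;
# the coefficients in one are linear combinations of those in the other.
same_fit = np.allclose(fit(X_levels, y_t), fit(X_repar, y_t))
print(same_fit)
```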


The idea behind Toda and Yamamoto (1995) (and, independently, Dolado and Lütkepohl (1996)) is to run an “augmented” regression with at least one extra lag and test all the lags except the final one. Because this is equivalent to a test only on differences, it has standard asymptotics whether \(x\) is \(I(1)\) or \(I(0)\).


While this looks reasonable at first glance, it is in fact incorrect as a method for testing Granger causality, and examining why can help in understanding how non-stationarity can affect estimators.

 

First, note that the result depends upon the “correct” lag length \(p\) being known, which will never be true in practice. But even if \(p\) is known, the result is flawed. The connection between the TY test and Granger causality assumes that the extra, untested lag is zero (since the true model has \(p\) lags, the coefficient on lag \(p+1\) must be zero), and that assumption is where the problem lies. To see why in a simpler situation, suppose \(x\) is just a random walk:

\begin{equation} x_t = x_{t - 1} + u_t \end{equation}

For this model, \(p\) is known to be 1. However, if you run a regression on lag 50 only (that is, a 50-lag regression excluding lags 1 to 49), the coefficient on lag 50 will not be zero; in fact, it will be near one. The problem with the TY logic is that, if \(x\) is \(I(1)\), even distant lags can proxy for the omitted ones. Note that this does not really depend upon the process being \(I(1)\). If the process has, for instance, a dominant root of .9, 49 lags may indeed be enough to create an effective lack of correlation, but a more typical 4 or 5 lags will leave the augmenting lag as a fairly strong proxy.
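Both claims are easy to confirm by simulation. The following sketch (in Python with NumPy; the series and variable names are purely illustrative) regresses a simulated random walk on its 50th lag alone, and also checks a stationary AR(1) with root .9 at lag 5, where the implied coefficient is roughly \(.9^5 \approx .59\):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 5000
x = np.cumsum(rng.normal(size=T))   # random walk: true p = 1

# Regress x_t on lag 50 alone (no intercept). The omitted lags 1..49 are
# proxied almost perfectly by the distant level, so the coefficient is
# near one rather than zero.
b50 = (x[50:] @ x[:-50]) / (x[:-50] @ x[:-50])
print(b50)   # close to 1, not 0

# Stationary AR(1) with root 0.9: a lag-5 regressor is still a fairly
# strong proxy, with implied coefficient 0.9**5, about 0.59.
z = np.empty(T)
z[0] = rng.normal() / np.sqrt(1 - 0.9 ** 2)   # draw from stationary distribution
for t in range(1, T):
    z[t] = 0.9 * z[t - 1] + rng.normal()
b5 = (z[5:] @ z[:-5]) / (z[:-5] @ z[:-5])
print(b5)
```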

 

There have been Monte Carlo examinations of the TY test which seem to validate the approach. The fact that the “size” seems to be correct isn’t a surprise, since that’s what Sims, Stock and Watson would predict. And it turns out that there are alternatives against which the TY procedure has power. However, there are others against which it does not.
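The size result can be checked directly: under a null where \(x\) is a random walk with no effect on a stationary \(y\), the TY test (one augmenting lag, testing only lag one of \(x\)) should reject close to 5% of the time at the 5% level. A small Monte Carlo sketch in Python with NumPy and SciPy (the sample size and AR(1) null for \(y\) are assumptions chosen for illustration):

```python
import numpy as np
from scipy.stats import f as fdist

rng = np.random.default_rng(7)
nobs, ndraws = 500, 400
rej = 0
for _ in range(ndraws):
    x = np.cumsum(rng.normal(size=nobs))     # I(1), but irrelevant to y
    e = rng.normal(size=nobs)
    y = np.zeros(nobs)
    for t in range(1, nobs):
        y[t] = 0.5 * y[t - 1] + e[t]         # stationary AR(1), no causality
    # Regression of y_t on y_{t-1}, y_{t-2}, x_{t-1}, x_{t-2}: p + 1 = 2 lags
    Y = y[2:]
    X = np.column_stack([y[1:-1], y[:-2], x[1:-1], x[:-2]])
    n, k = X.shape
    rss_u = np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    Xr = X[:, [0, 1, 3]]                     # drop x_{t-1}; keep the augmenting lag
    rss_r = np.sum((Y - Xr @ np.linalg.lstsq(Xr, Y, rcond=None)[0]) ** 2)
    fstat = (rss_r - rss_u) / (rss_u / (n - k))
    rej += fstat > fdist.ppf(0.95, 1, n - k)
rate = rej / ndraws
print(rate)   # should be near the nominal 0.05
```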

 

To see where it “works”, consider a case where \(y\) and \(x\) are \(I(1)\) but aren’t cointegrated. In that case, the relationship between \(y\) and \(x\) can be written in differences. So, looking at the augmented polynomial \eqref{eq:tycausality_subst}, if some of the \({\beta _i^* }\) on the differences are non-zero, the augmenting lagged level will be a poor proxy for those differences—asymptotically, it has a zero correlation with what are (to it) future differences. The restricted regression will thus end up being almost the same as a full exclusion of all lagged \(x\). Where the TY test has a “blind spot” is where the series are cointegrated and there is “long-run” causality from \(x\) to \(y\), that is, the coefficient on the cointegrating vector is non-zero. Because the causality comes in through a lagged level (rather than lagged differences), the augmenting lag can do a much better job of picking it up. For example, suppose we have

\begin{equation} \begin{array}{l}x_t = x_{t - 1} + u_{xt} \\ y_t = y_{t - 1} - .025\left( {y_{t - 1} - x_{t - 1} } \right) + u_{yt} \\ \end{array} \end{equation}

where the residuals are independent \(N(0,1)\). \(p=1\) is known, so you don’t have the uncertainty about the lag length that you would have in practice. The TY test runs a regression of \(y\) on two lags of \(x\) and \(y\) and tests the exclusion of lag one of \(x\). And with 500 observations, it barely has any power, rejecting non-causality only about 10% of the time using a 5% significance level. By contrast, the standard Granger test (same regression but testing the exclusion of both lags of \(x\)) rejects non-causality 80% of the time. Now that test has a non-standard distribution, but when the distribution is simulated, it isn’t that far off from the usual F in this case, so the 80% rate is roughly correct.
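The RATS program used for this experiment is listed at the end. As a rough cross-check, the same Monte Carlo can be sketched in Python (NumPy/SciPy), evaluating both tests against the nominal F critical values (which, per the discussion above, is only approximately correct for the Granger test):

```python
import numpy as np
from scipy.stats import f as fdist

def exclusion_fstat(y, X, drop):
    """F statistic for the null that the coefficients on X[:, drop] are zero."""
    n, k = X.shape
    rss_u = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    Xr = np.delete(X, drop, axis=1)
    rss_r = np.sum((y - Xr @ np.linalg.lstsq(Xr, y, rcond=None)[0]) ** 2)
    return ((rss_r - rss_u) / len(drop)) / (rss_u / (n - k))

rng = np.random.default_rng(1)
nobs, ndraws = 500, 200
ty_rej = gc_rej = 0
for _ in range(ndraws):
    x = np.cumsum(rng.normal(size=nobs))
    uy = rng.normal(size=nobs)
    y = np.zeros(nobs)
    for t in range(1, nobs):
        y[t] = y[t - 1] - 0.025 * (y[t - 1] - x[t - 1]) + uy[t]
    # Regress y_t on y_{t-1}, y_{t-2}, x_{t-1}, x_{t-2} (p = 1, one augmenting lag)
    Y = y[2:]
    X = np.column_stack([y[1:-1], y[:-2], x[1:-1], x[:-2]])
    n, k = X.shape
    ty_rej += exclusion_fstat(Y, X, [2]) > fdist.ppf(0.95, 1, n - k)     # TY: lag 1 of x only
    gc_rej += exclusion_fstat(Y, X, [2, 3]) > fdist.ppf(0.95, 2, n - k)  # Granger: both lags
ty_rate, gc_rate = ty_rej / ndraws, gc_rej / ndraws
print(ty_rate, gc_rate)   # TY has little power; the Granger test rejects far more often
```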

 

Not only is the method fundamentally flawed, but papers have been written more recently which “bootstrap” the TY test. Since the whole (and only) point of the TY test is that it has standard asymptotics, bootstrapping it makes no sense at all from a statistical standpoint.

 

See Causality Testing for a more complete discussion.

 

The following is the code used for doing the simulations described here.
 

*
* TYCausalityProblem.rpf
* Example of failure of Toda-Yamamoto causality test
*
compute ndraws=1000
compute nobs=500
compute p=1
*
set gcstats 1 ndraws = 0.0
set tystats 1 ndraws = 0.0
*
do reps=1,ndraws
   set(first=0.0) x 1 nobs = x{1}+%ran(1.0)
   set(first=0.0) y 1 nobs = y{1}-.025*(y{1}-x{1})+%ran(1.0)
   *
   * TY test with p=2 lags. (VAR is actually just 1 lag)
   *
   linreg(noprint) y
   # y{1 to p+1} x{1 to p+1}
   compute ndfty=%ndf
   exclude(noprint)
   # x{1 to p}
   compute tystats(reps)=%cdstat
   *
   * Regular Granger test. (Has non-standard distribution)
   *
   exclude(noprint)
   # x{1 to p+1}
   compute gcstats(reps)=%cdstat
end do reps
*
* Distribution of TY statistics
*
density(grid=automatic,maxgrid=100) tystats / xty fty
spgraph
 scatter(style=line,vmin=0.0,hgrid=%invftest(.05,p,ndfty),$
  footer="Distribution of TY Test Statistics")
 # xty fty
 grtext(x=%invftest(.05,p,ndfty),align=left) "F Critical Value"
spgraph(done)
*
* Distribution of Granger causality statistics
*
density(grid=automatic,maxgrid=100) gcstats / xgc fgc
spgraph
 scatter(style=line,vmin=0.0,hgrid=%invftest(.05,p+1,ndfty),$
  footer="Distribution of Granger Test Statistics")
 # xgc fgc
 grtext(x=%invftest(.05,p+1,ndfty),align=left) "F Critical Value"
spgraph(done)
*
* Evaluation of distribution of GC test using simulations.
* Estimate model under the null.
*
linreg(define=nullxeq) x / ux
# x{1 to p+1}
linreg(define=nullyeq) y / uy
# y{1 to p+1}
vcv(noprint)
# ux uy
group(cv=%sigma) nullmodel nullxeq nullyeq
*
set bootstats 1 ndraws = 0.0
do reps=1,ndraws
   simulate(model=nullmodel,from=p+2,to=nobs,results=nullxy)
   linreg(noprint) nullxy(2)
   # nullxy(2){1 to p+1} nullxy(1){1 to p+1}
   exclude(noprint)
   # nullxy(1){1 to p+1}
   compute bootstats(reps)=%cdstat
end do reps
stats(fract) bootstats
*
sstats(mean) 1 ndraws (gcstats>%fract95)>>gcboot
sstats(mean) 1 ndraws (tystats>%invftest(.05,p,ndfty))>>typrob
disp "Probability of Rejection for TY Test @.05 level" typrob
disp "Probability of Rejection for GC Test Using Simulated .05 level" gcboot
 


Copyright © 2025 Thomas A. Doan