RATS 10.1

Structural Stability/Constancy Tests

The Hypotheses

The null hypothesis here is one of structural stability: coefficients (and, typically, the variance) are the same over the sample. The alternatives can range from the precise, with a break at a known point in the sample, to the broad, where the coefficients aren't the same, but nothing more specific is assumed. If an estimated model fails to show stable coefficients, inferences using it, or forecasts generated from it, may be suspect. In general, the precise alternatives will generate standard types of tests, while the broad alternatives generate tests with non-standard distributions. And if there is, indeed, a rather sharp break, the more precise alternatives will offer higher power. But if there is a general misspecification error which leads to a more gradual change in the coefficients, a test of a hard break might not fare as well.

 

Tests with a known change point are known generically as “Chow tests”. The test takes several forms, depending upon the assumptions made. These are usually not hard to implement (at least in linear models), since they require only estimating the model over two or three samples.

 

The tests with unknown change points usually require a fair bit of number-crunching, since every possible break in a range of entries must be examined. In most cases, you should rely upon a RATS procedure (such as @STABTEST, for the Hansen stability test) to do these. Some “tests” are not formal tests at all, but rather graphical displays of the behavior of some statistic, which are examined informally for evidence of a change in the model. For instance, while the individual tests in a series of sequential “F-tests” might have (approximately) F distributions, the maximum of them won’t. These are useful mainly if the resulting statistics leave little room for doubt that the parameters aren't constant. Even if the asymptotic distribution for the most extreme F isn't known, a statistic with a nominal p-value of .00001 makes it a safe bet that the model has a break.
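
To give a sense of the number-crunching involved, here is a minimal sketch of such a sequential F search, assuming the regression of Y on CONSTANT, X2, and X3 used in the CONSTANT.RPF example later in this section; the trimming range 1963:1 to 1970:4 is purely illustrative, and, as just noted, the maximum F does not itself have an F distribution:

* Full-sample (restricted) regression, computed once
linreg(noprint) y 1959:1 1973:3
# constant x2 x3
compute rssfull=%rss,nobs=%nobs,nreg=%nreg
* Sequential Chow F statistics over a range of candidate break dates
compute fmax=0.0
do bdate=1963:1,1970:4
   linreg(noprint) y 1959:1 bdate
   # constant x2 x3
   compute rssa=%rss
   linreg(noprint) y bdate+1 1973:3
   # constant x2 x3
   compute rssb=%rss
   * F for a break after BDATE, pooling the subsample sums of squares
   compute f=((rssfull-(rssa+rssb))/nreg)/((rssa+rssb)/(nobs-2*nreg))
   compute fmax=%max(fmax,f)
end do bdate
display "Maximum sequential F =" fmax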

 

Tests based upon recursive residuals offer an even more general alternative than the unknown change point. Many tests can be constructed from the recursive residuals because they have the property that, under the null hypothesis, they are independent with constant variance. The behavior of such a process is easy to analyze, so a deviation from that will lead to a rejection of constancy.

 

Note well that these tests are specification tests for the original model—testing whether the single specification seems to be valid. They do not provide a usable alternative model. A rejection means the original model isn't stable. Why it isn't stable (missing variable? policy shift?) is a separate issue. The location of the “breaks” might be useful in determining the source, but not much more.

 

Chow Tests

The term “Chow test” is typically applied to the test of structural breaks at known locations. Formally, in the model

\begin{equation} \begin{array}{ll} y_t = X_t \beta_1 + u_t & t \in T_1 \\ y_t = X_t \beta_2 + u_t & t \in T_2 \\ \quad\vdots & \quad\vdots \\ y_t = X_t \beta_n + u_t & t \in T_n \end{array} \end{equation}

the null is \(\beta_1 = \beta_2 = \cdots = \beta_n\). The general alternative is that this is not true, though it’s possible for the alternative to allow for some of the coefficients to be fixed across subsamples.
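
In the textbook case of i.i.d. normal errors, the test statistic is the standard F in sum-of-squares form, where \(k\) is the number of regressors, \(T\) the total number of observations, \(RSS_U\) the sum of the residual sums of squares across the \(n\) subsample regressions, and \(RSS_R\) the residual sum of squares from the full-sample regression:

\begin{equation} F = \frac{\left( RSS_R - RSS_U \right)/\left( (n-1)k \right)}{RSS_U /(T - nk)} \end{equation}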

 

There are two ways to compute the test statistic for this:

1. Run regressions over each of the subsamples and over the full sample. The subsample regressions, taken together, form the unrestricted “regression”, and the full-sample regression is the restricted regression. Compute a standard F-statistic from the regression summary statistics. With more than two categories, it will probably be simplest to use the SWEEP instruction to do the calculations.

2. Run the regression over the full sample, adding dummies times regressors for subsamples 2 through n. Test an exclusion restriction on all the dummy terms.

The first procedure is usually simpler, especially if there are quite a few regressors. However, it is applicable only if the model is estimated appropriately by ordinary least squares. You must use the second method (a sketch of it follows this list) if

- you need to correct the estimates for heteroscedasticity or autocorrelation (by using the ROBUSTERRORS option), or

- you are using some form of instrumental variables, or

- you are allowing some coefficients to be constant across subsamples.
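
As a minimal sketch of the second method, assume the Y, X2, X3 regression from CONSTANT.RPF, two subsamples, and a purely illustrative break at 1968:1:

* Dummy for the second subsample, interacted with each regressor
set d2 = t>=1968:1
set d2x2 = d2*x2
set d2x3 = d2*x3
* Full-sample regression including the shift terms
linreg y 1959:1 1973:3
# constant x2 x3 d2 d2x2 d2x3
* Joint exclusion of the shift terms is the Chow test
exclude(title="Chow Test via Dummies")
# d2 d2x2 d2x3

If ROBUSTERRORS is added to the LINREG, the same EXCLUDE should produce the corresponding Wald test based upon the robust covariance matrix.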

 

Both procedures require that you have enough data points in each partition of the data set to run a regression. An alternative (the Chow Predictive Test) can be used when a subsample (usually at the end of the data set) is too short. In effect, this estimates the model holding back part of the sample, then compares that with the fit when the remainder of the sample is added in.
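
In symbols: if the base sample has \(T_1\) observations and \(k\) regressors, and \(m\) observations are then added, the statistic is

\begin{equation} F = \frac{\left( RSS_{T_1 + m} - RSS_{T_1} \right)/m}{RSS_{T_1} /(T_1 - k)} \end{equation}

which is distributed as \(F(m,T_1 - k)\) under the null. This is exactly what the Chow Predictive Test code later in this section computes, with \(m = 8\).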

 

The example CHOWTEST.RPF uses a cross-section data set. There are far fewer stability tests for cross-section data than for time series data. As a general rule, rejection of stability for a cross-section model means that the model was somehow misspecified: an incorrect functional form, or a poor choice of proxy. With time series data, it’s quite possible that the model simply breaks down part way through the sample, due to changes in laws, technology, etc.

 

The CONSTANT.RPF example uses time series data. It's a linear model using quarterly data from 1959:1 to 1973:3. The remainder of the section describes the tests used in it.

@STABTEST procedure

@STABTEST implements Bruce Hansen’s (1992) test for general parameter stability, which is a special case of Nyblom’s (1989) stability test. This is based upon the behavior of partial sums of the regression’s normal equations for the parameters and the variance. For the full sample, those sums are zero, and (if the model is stable) the sequence of partial sums shouldn’t stray too far from zero. The procedure generates test statistics for the overall regression (testing the joint constancy of the coefficients and the variance), as well as for each coefficient and the variance individually. It also supplies approximate p-values. @STABTEST both estimates the linear regression and does the test, so it takes the full specification just like a LINREG:

 

@stabtest y 1959:1 1973:3
# constant x2 x3
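
For reference, Hansen’s statistics take the following form (this restates Hansen, 1992, in the notation used here): let \(f_{it}\) be the terms of the normal equations (one sequence per coefficient, plus one for the variance) and \(S_{it} = \sum\nolimits_{s \le t} f_{is}\) their partial sums. Then the individual and joint statistics are

\begin{equation} L_i = \frac{1}{T V_i }\sum\limits_{t = 1}^T S_{it}^2 ,\qquad L_c = \frac{1}{T}\sum\limits_{t = 1}^T S_t ' V^{-1} S_t \end{equation}

where \(V_i = \sum\nolimits_t f_{it}^2\) and \(V = \sum\nolimits_t f_t f_t '\).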

 

Chow Predictive Test

The Chow predictive test can be used when a subsample is too short to produce a sensible estimate on its own. It’s particularly useful for seeing whether a small “hold-back” sample near the end of the data seems to be consistent with the estimate from the earlier part. In CONSTANT.RPF, the regression is run over the sample through 1971:3, then again through 1973:3. The difference between the sums of squares divided by the number of added data points (8) forms (under the null) an estimate of the variance of the regression that’s independent of the one formed from the first subsample, and thus it generates an F. Note that, in this case, there is enough data to do a separate regression on the second subsample. However, it’s a very short subsample, and the standard Chow test would likely have relatively little power as a result.

 

* Fit through 1971:3 and save the base-sample statistics
linreg(noprint) y 1959:1 1971:3
# constant x2 x3
compute rss1=%rss,ndf1=%ndf
* Refit with the eight held-back quarters added in
linreg(noprint) y 1959:1 1973:3
# constant x2 x3
* Predictive F: change in RSS per added point, over the base-sample variance
compute f=((%rss-rss1)/8)/(rss1/ndf1)
cdf(title="Chow Predictive Test") ftest f 8 ndf1

 

Tests Based Upon Recursive Residuals

As mentioned in Recursive Least Squares, if the model is, in fact, correctly specified with i.i.d. \(N(0,\sigma^2)\) errors, then the recursive residuals produced by the RLS instruction are i.i.d. (Normal), while standard regression residuals have at least some in-sample correlation by construction. Quite a few tests can be constructed to test the null that the recursive residuals are i.i.d., and a failure of those tests can be read as a rejection of the underlying assumptions. The following instruction does the recursive estimation and saves quite a few of the statistics generated:

 

rls(sehist=sehist,cohist=cohist,sighist=sighist,$
  csum=cusum,csquared=cusumsq) y 1959:1 1973:3 rresids
# constant x2 x3
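
For reference, the standardized recursive residual at entry \(t\) has the textbook form (Brown, Durbin, and Evans, 1975): with \(\hat \beta_{t-1}\) the least-squares estimate using data through \(t-1\) and \(\mathbf{X}_{t-1}\) the stacked regressor rows through \(t-1\),

\begin{equation} w_t = \frac{y_t - X_t \hat \beta_{t-1} }{\sqrt {1 + X_t \left( \mathbf{X}_{t-1} '\mathbf{X}_{t-1} \right)^{-1} X_t '} } \end{equation}

Under the null, the \(w_t\) are i.i.d. \(N(0,\sigma^2)\), which is the property the tests below exploit.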

 

The @CUSUMTESTS procedure does specific tests using the recursive residuals as input: the CUSUM test (based upon the cumulated residuals) and the CUSUMQ test (based upon the cumulated squared residuals). The CUSUMQ test is mainly aimed at testing stability of the variance.
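
Using the recursive residuals saved by RLS above, the call is simply the following (a sketch assuming the procedure’s default options; check the procedure file for the full option list):

@cusumtests rresids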

 


Copyright © 2025 Thomas A. Doan