
Specification test is a very general term that can cover many other tests. The one absolute requirement for performing a specification test is this:

 

The model must incorporate more assumptions than are required to estimate its free coefficients.

 

For instance, if our model, in total, consists of

\begin{equation} y_t = X_t \beta + u_t , E\left( {X_t u_t } \right) = 0 , \beta \text{ unknown} \end{equation}

we have nothing to test, because the least squares residuals will be, by construction, uncorrelated with the X’s. We get specification tests when we go beyond these assumptions:

If we assume, in addition, that the u’s are homoscedastic or serially uncorrelated, we can test those additional assumptions, because we don’t use them in estimating \(\beta\) by least squares (though we may rely on them for computing the covariance matrix).

If we replace \(E\left( {X_t u_t } \right) = 0\) with the somewhat stronger assumption of \(E(u_t |X_t ) = 0\), we can test whether various other functions of \(X\) are uncorrelated with the residuals. The RESET test is an example of this.

If we make the still stronger assumption that \(E\left( {X_s u_t } \right) = 0\) for all \(t,s\) (strict econometric exogeneity), we get, in the distributed lag case, Sims’ exogeneity test, which tests whether or not \(u_t\) is uncorrelated with \(X_{t+j}\) for \(j > 0\).
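One common way to implement that is to add leads of \(X\) to the distributed lag regression and test them jointly. The following is only a hedged sketch: the series names Y and X, the choice of four leads and eight lags, and the sample range are all illustrative, not taken from any example file.

* Hypothetical sketch of a Sims-style exogeneity test. Leads are
* written as negative lags in RATS.
linreg y
# constant x{-4 to 8}
*
* Jointly test the coefficients on the leads of X. (In practice,
* serial correlation in U would call for a robust covariance matrix.)
exclude(title="Sims exogeneity test: leads of X")
# x{-4 to -1}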

 

Two basic strategies are available:

1. Compare estimates of \(\beta\) computed with and without the additional assumptions. If the additional assumptions are true, the estimates should be similar. This gives rise to “Hausman tests” (Hausman, 1978).

2. If the assumptions can be written as orthogonality conditions (\(u_t\) uncorrelated with a set of variables), seeing whether or not these hold in sample is a test of the overidentifying restrictions. The general result for this style of test is given by Hansen (1982).

Hausman Tests

Hausman tests operate by comparing two estimates of \(\beta\): one computed making use of all the assumptions, the other using more limited information. If the model is correctly specified, \(\hat \beta _1  - \hat \beta _2 \) should be close to zero. The difficult part of this, in general, is that the covariance matrix of \(\hat \beta _1  - \hat \beta _2 \) is not the covariance matrix of either one alone; it is

\begin{equation} {\rm{Var}}\left( {\hat \beta _1 } \right) + {\rm{Var}}\left( {\hat \beta _2 } \right) - 2\;{\rm{Cov}}\left( {\hat \beta _1 ,\hat \beta _2 } \right) \end{equation}

and computing the covariance term is quite unappealing. However, in Hausman’s (1978) settings, the covariance matrix of the difference simplifies to

\begin{equation} {\rm{Var}}\left( {\hat \beta _1 } \right) - {\rm{Var}}\left( {\hat \beta _2 } \right) \label{eq:HausmanCovMat} \end{equation}

where \(\hat \beta _1 \) is the less efficient estimator. This result is valid only when \(\hat \beta _2 \) is efficient and \(\hat \beta _1 \) is based upon the same or a smaller information set.

 

Note that it is quite difficult to do a Hausman test properly using the direct comparison of estimators. To get the correct behavior, any “nuisance parameters” (such as residual variances) need to match up in the covariance estimators in \eqref{eq:HausmanCovMat}, which is usually not what you will get from standard estimators, since each comes up with its own estimate of those. In many cases the Hausman test is identical to a test done using an auxiliary regression, and when that’s the case, it’s usually better to use the alternative method.

 

Procedure

If you would like to perform a Hausman test by the direct comparison of estimators, follow these steps:

1. Estimate the model using one of the two estimators. (The order only matters if the nuisance parameters from one are needed to do the estimates for the other.) Save its coefficient vector (%BETA) and covariance matrix (%XX) into other matrices, and save the nuisance parameters if they will be needed later.

2. Estimate the model using the second estimator. Figure out what you have to do to get comparable estimates of the covariance matrix. Compute the difference between the two properly estimated covariance matrices (the less efficient minus the more efficient).

3. Use TEST with the options ALL, VECTOR=saved coefficients, and COVMAT=difference in covariance matrices. This tests for equality between the two estimated coefficient vectors. The covariance matrix of the difference will probably not be full rank; however, TEST will determine the proper degrees of freedom.

 

To look at a simple example, suppose

          

\begin{equation} c_t = \alpha _0 + \alpha _1 y_t + u_t \label{eq:HausmanCY} \end{equation}

\begin{equation} E\left( {u_t } \right) = E\left( {c_{t - 1} u_t } \right) = E\left( {y_{t - 1} u_t } \right) = 0 \label{eq:HausmanOrthog} \end{equation}

\begin{equation} E\left( {y_t u_t } \right) = 0 \label{eq:HausmanOLS} \end{equation}

 

If we take \eqref{eq:HausmanCY} and \eqref{eq:HausmanOrthog} as maintained hypotheses, we can estimate the equation consistently by instrumental variables. If \eqref{eq:HausmanOLS} is also valid, OLS will be efficient. This test is done in the example file HAUSMAN.RPF. When you look at that, pay attention to how many details are required to do this properly: making sure that the estimation range is the same for both estimators, that a common variance scale factor is used, and that the proper form of the test statistic is used.
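A skeletal version of those three steps for this model might look like the following. This is an illustration only, not the contents of HAUSMAN.RPF: the series names and instrument list are borrowed from the Wu test example below, and taking the common variance scale factor from the instrumental variables regression is an assumption made for the sketch.

* Sketch only--not HAUSMAN.RPF itself.
instruments constant realgdp{1} realcons{1}
*
* Step 1: the less efficient (IV) estimator. Save what we need.
linreg(inst) realcons 1950:2 *
# constant realgdp
compute betaiv = %beta
compute xxiv   = %xx
compute scale  = %seesq
*
* Step 2: the more efficient (OLS) estimator over the same range,
* then the difference of comparably scaled covariance matrices
* (less efficient minus more efficient).
linreg realcons 1950:2 *
# constant realgdp
compute vdiff = scale*(xxiv-%xx)
*
* Step 3: test equality of the two coefficient vectors.
test(all,vector=betaiv,covmat=vdiff)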

 

The output from the test in HAUSMAN.RPF is:

 

Hausman Test

## X13. Redundant Restrictions. Using 1 Degrees, not 2

Chi-Squared(1)=     22.111856 with Significance Level 0.00000257

 

Note that the degrees of freedom are adjusted to 1, even though it is testing the “whole” coefficient vector. Because the CONSTANT is in both the regression and the instrument set, if the slope coefficients are equal, the intercepts have to be as well, so the test is really only on the slope. In practice, it’s rare for a Hausman test to have full rank, since some of the assumptions tend to be common between the two models.

 

In many cases (such as this one), the Hausman test can be computed more easily with an auxiliary regression. In this case, it’s the Wu (1973) test, which adds the fitted value from projecting the endogenous explanatory variable onto the instruments to the linear regression and tests that fitted value. And this can be done even more easily using the @RegWuTest procedure immediately after running the instrumental variables regression:

 

instruments constant realgdp{1} realcons{1}

linreg(inst) realcons 1950:2 *

# constant realgdp

@RegWuTest

 

By far the most common use of the Hausman test is in panel-data regressions, for testing fixed versus random effects estimators. In the PREGRESS instruction, the Hausman test is built in and can be requested simply by adding the HAUSMAN option when you are doing random effects.
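For instance, a hedged sketch with hypothetical panel series names, assuming the panel CALENDAR has already been set and that the random effects estimator is selected with METHOD=RANDOM (check the PREGRESS description for the exact options):

* Hypothetical panel example: the series names and METHOD=RANDOM are
* assumptions; HAUSMAN requests the fixed versus random effects test.
pregress(method=random,hausman) lwage
# constant exper educ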

“Hansen” Tests

By limiting ourselves to notation appropriate for models that can be estimated using single-equation methods, we can demonstrate more easily how these tests work. Assume

 

\begin{equation} y_t = f\left( {X_t ,\beta } \right) + u_t \label{eq:HansenModel} \end{equation}

\begin{equation} E\left( {Z'_t u_t } \right) = 0 \label{eq:HansenConditions} \end{equation}

 

with some required regularity conditions on differentiability and moments of the processes. For a linear regression, \(f\) will just be \(X_t \beta\). The \(Z\)’s are the instruments.

 

If \(Z\) is the same dimension as \(\beta\), the model is just identified, and we can test nothing. Hansen’s key testing result is that

\begin{equation} {\bf{u'}}\,{\bf{Z}}\,{\bf{W}}\,{\bf{Z'}}\,{\bf{u}} \sim \chi ^2 \label{eq:HansenUZWZUDistrib} \end{equation}

when the weighting matrix \(\bf{W}\) is chosen “optimally.” The degrees of freedom are the difference between the number of orthogonality conditions in \eqref{eq:HansenConditions} and the number of parameters in the \(\beta\) vector.
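Here “optimally” has its usual GMM meaning: \({\bf{W}}\) should be a consistent estimate of the inverse of the covariance matrix of the orthogonality conditions,

\begin{equation} {\bf{W}} = \left[ {{\rm{Var}}\left( {{\bf{Z'}}\,{\bf{u}}} \right)} \right]^{ - 1} \end{equation}

For serially uncorrelated, homoscedastic \(u_t\), \({\rm{Var}}\left( {{\bf{Z'}}\,{\bf{u}}} \right) = \sigma ^2 {\bf{Z'}}{\bf{Z}}\), so the two-stage least squares weighting \(\left( {{\bf{Z'}}{\bf{Z}}} \right)^{ - 1} \) is optimal up to the scale factor \(\sigma ^2\), which is estimated from the residuals.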

 

When you estimate with instrumental variables, this test is automatically included in the output from LINREG, AR1, NLLS, NLSYSTEM and SUR as the J-Specification. This generates output as shown below, with the first line showing the degrees of freedom (the difference between the number of conditions in \eqref{eq:HansenConditions} and the number of estimated coefficients) and the test statistic, and the second line showing the marginal significance level:

 

J-Specification(4)                7.100744

Significance Level of J           0.13065919

 

These results would indicate that the null hypothesis (that the overidentifying restrictions are valid) cannot be rejected.

 

The computed test statistic can be obtained as %JSTAT, its significance level as %JSIGNIF and the degrees of freedom as %JDF. \({\bf{u'}}\,{\bf{Z}}\,{\bf{W}}\,{\bf{Z'}}\,{\bf{u}}\) is available as %UZWZU.
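For example, after an overidentified instrumental variables regression (a sketch reusing the data series from the Hausman example; the instrument list is only for illustration), these can be displayed directly:

* Sketch: an IV regression with one overidentifying restriction
* (three instruments, two coefficients).
instruments constant realgdp{1} realcons{1}
linreg(inst) realcons 1950:2 *
# constant realgdp
display "J =" %jstat "with" %jdf "degree(s) of freedom, p =" %jsignif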

 

If the weighting matrix used in estimation isn’t the optimal choice, there are two ways to “robustify” the test. The first is described in Hansen (1982): it computes an adjusted covariance matrix \({\bf{A}}\) for \({\bf{Z'}}\,{\bf{u}}\) so that \({\bf{u'}}\,{\bf{Z}}\,{\bf{A}}^{ - 1}\,{\bf{Z'}}\,{\bf{u}} \sim \chi ^2 \). The other is described in Jagannathan and Wang (1996). It uses \({\bf{u'}}\,{\bf{Z}}\,{\bf{W}}\,{\bf{Z'}}\,{\bf{u}}\) as the test statistic, but they show that it has an asymptotic distribution which is that of a more general quadratic form in independent Normals. You can choose between these with the option JROBUST=STATISTIC (for the first) or JROBUST=DISTRIBUTION (for the second).
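A hedged sketch of the first of these, assuming JROBUST is accepted by LINREG in combination with a HAC covariance matrix (the lag window choice and the series names are illustrative only; check the instruction descriptions for the exact option set):

* Sketch: 2SLS with a Newey-West covariance matrix, in which case the
* (Z'Z) inverse is not the optimal weight matrix; JROBUST=STATISTIC
* asks for the Hansen (1982) adjustment to the specification test.
* The LAGS=4 choice is an illustrative assumption.
instruments constant realgdp{1} realcons{1}
linreg(inst,robusterrors,lags=4,lwindow=neweywest,jrobust=statistic) realcons 1950:2 *
# constant realgdp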

 

See HANSEN.RPF for an example.

 


Copyright © 2026 Thomas A. Doan