RATS 11

Linear Regression: A General Framework


Limitations of the Standard Normal Linear Model

Unfortunately, the assumptions for the Standard Normal Linear Model are unlikely to be met with non-experimental data, and they are particularly likely to fail with time series data. For instance, including a lag of \(y\) among the explanatory variables violates these assumptions. We need a framework which applies more broadly than this.

 

Generalized Method of Moments

Much of the modern theory of linear regression can be understood most easily by writing the model’s assumptions in the form:

 

\begin{equation} y_t = X_t \beta + u_t \label{eq:linreg_basereg} \end{equation}          

\begin{equation} EZ_t ^\prime \left( {y_t - X_t \beta } \right) = 0 \label{eq:linreg_ivcondition} \end{equation}

 

where \(Z_t\) contains the instrumental variables (if any). This is made operational by replacing the expectation by the sample average, thus solving for \(\beta\) the equation

\begin{equation} \Theta _T \frac{1}{T}\sum {Z'_t \left( {y_t - X_t \beta } \right) = 0} \label{eq:linreg_gmmcondition} \end{equation}

where \(\Theta _T \) is a weighting matrix, which comes into play if the dimensions of \(Z\) are bigger than those of \(X\). This is known as the Generalized Method of Moments (GMM for short). The unified theory of these estimators (covering non-linear models as well) was developed originally in Hansen (1982). “Generalized,” by the way, refers to the presence of the weighting matrix.
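Because \eqref{eq:linreg_gmmcondition} is linear in \(\beta\), it can be solved explicitly whenever \(\Theta _T \frac{1}{T}\sum {Z_t ^\prime X_t } \) is invertible:

\begin{equation} \beta = \left( {\Theta _T \frac{1}{T}\sum {Z_t ^\prime X_t } } \right)^{ - 1} \Theta _T \frac{1}{T}\sum {Z_t ^\prime y_t } \end{equation}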

 

Assumption \eqref{eq:linreg_ivcondition} sometimes comes directly from assumptions about the variables, or it is sometimes derived as the first order conditions of an optimization problem. For instance, least squares minimizes

\begin{equation} \sum {\left( {y_t - X_t \beta } \right)^2 } \end{equation}

which has as its first order necessary conditions

\begin{equation} - 2{\kern 1pt} \sum {X'_t \left( {y_t - X_t \beta } \right)} = 0 \end{equation}

Apart from constant multipliers, this is the same condition as would be derived from equation \eqref{eq:linreg_basereg} plus the assumption

\begin{equation} EX_t ^\prime u_t = 0 \label{eq:linreg_olsortho} \end{equation}

This is a relatively weak assumption, as it assumes nothing about the distribution of the residuals, merely requiring that the residuals be uncorrelated with the regressors. Under \eqref{eq:linreg_basereg} plus \eqref{eq:linreg_olsortho}, plus some regularity conditions which ensure good behavior of the sample averages, least squares gives consistent estimates of \(\beta\).
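In the notation of \eqref{eq:linreg_gmmcondition}, least squares is the special case with \(Z_t = X_t \) and \(\Theta _T \) equal to the identity matrix, and solving the first order conditions gives the familiar formula

\begin{equation} \beta = \left( {\sum {X_t ^\prime X_t } } \right)^{ - 1} \sum {X_t ^\prime y_t } \end{equation}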

 

Consistent, however, doesn’t mean that the output from a LINREG instruction is correct. The standard errors, t–statistics and covariance matrix of coefficients (and thus any hypothesis tests which follow) are based upon stronger assumptions than merely \eqref{eq:linreg_olsortho}. In particular, those calculations assume that \(u\) is homoscedastic (constant variance) and serially uncorrelated.

 

It’s possible to reformulate the model to deal with specific forms of heteroscedasticity or serial correlation. However, if the precise form isn’t known, applying a generalized least squares (GLS) technique may, in fact, make matters worse. Plus, particularly in the case of serial correlation, the standard GLS method may give inconsistent estimates, as it requires much stronger assumptions about the relationship between \(X\) and \(u\).

 

ROBUSTERRORS

An alternative is to apply the simpler least squares technique for estimation, and then correct the covariance matrix estimate to allow for more complex behavior of the residuals. In RATS, this is controlled by the ROBUSTERRORS option, which is available on most of the regression instructions; a usage sketch is shown below. The remainder of this section is a (somewhat) technical description of the calculation: if you’re interested in the actual conditions required for this, consult a graduate level text like Hamilton (1994) or Greene (2012).
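For instance, a least squares regression with the robust covariance calculation might look like the following, where y, x1 and x2 are placeholder series names:

* least squares; covariance matrix corrected for heteroscedasticity
linreg(robusterrors) y
# constant x1 x2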

 

If \(\beta_0\) is the true set of coefficients and \(\beta_T\) is the solution of \eqref{eq:linreg_gmmcondition}, then (assuming the estimates are consistent and \(T\) is big enough), a first order Taylor expansion of \eqref{eq:linreg_gmmcondition} gives

\begin{equation} \Theta _T \frac{1}{T}\sum {Z_t ^\prime X_t (\beta _T - \beta _0 ) \approx \,} \Theta _T \frac{1}{T}\sum {Z_t ^\prime (y_t - X_t \beta _0 )} \end{equation}

which can be rewritten:

\begin{equation} \sqrt T (\beta _T - \beta _0 ) \approx \left\{ {\Theta _T \frac{1}{T}\sum {Z_t ^\prime X_t } } \right\}^{ - 1} \left\{ {\Theta _T \frac{1}{{\sqrt T }}\sum {Z_t ^\prime (y_t - X_t \beta _0 )} } \right\} \label{eq:linreg_gmmasymptotics} \end{equation}

Assume that \({\Theta _T }\) converges to a constant (full rank) matrix \({\Theta _0 }\) (it’s just the identity matrix if we’re doing least squares). Assuming that \(X\) and \(Z\) are fairly well-behaved, it doesn’t seem to be too much of a stretch to assume (by a Law of Large Numbers) that the first factor is converging in probability to a constant matrix which can be estimated consistently by (the inverse of)

\begin{equation} {\bf{A}}_T = \Theta _T \frac{1}{T}\sum {Z_t ^\prime X_t } \end{equation}

The second factor is \(1/\sqrt T \) times a sum of objects with expected value 0. Under the correct assumptions, some Central Limit Theorem will apply, giving this term an asymptotically Normal distribution. The tricky part about the second term is that the summands, while they are assumed to be mean zero, aren’t assumed to be independent or identically distributed. Under the proper conditions, this second term is asymptotically Normal with mean vector zero and covariance matrix which can be estimated consistently by:

\begin{equation} {\bf{B}}_T = \Theta _T \frac{1}{T}\left\{ {\sum\limits_{k = - L}^L {\sum\limits_t {Z_t ^\prime u_t u_{t - k} Z_{t - k} } } } \right\}\Theta _T ^\prime \label{eq:linreg_zmcov} \end{equation}

If there is no serial correlation in the \(u\)’s (or, more accurately, in the \(Zu\)’s), \(L\) is just zero. The choice of \(L\) is governed by the LAGS option (the number of correlated lags). We will call the term in braces in \eqref{eq:linreg_zmcov} mcov(Z,u), short for matrix covariogram. MCOV is the name of the RATS instruction which can compute it, although the calculations usually are handled automatically within another instruction. The end result of this is that (in large samples) we have the following approximation for the distribution of the estimator:

\begin{equation} \sqrt T (\beta _T - \beta _0 )\sim N\left( {0,{\bf{A}}_T^{ - 1} {\bf{B}}_T {\bf{A'}}_T^{ - 1} } \right) \end{equation}

This general form for a covariance matrix occurs in many contexts in statistics and has been dubbed a “sandwich estimator.”
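To connect this with more familiar results, consider least squares (\(Z_t = X_t \), \(\Theta _T = I\)) with \(L = 0\). The sandwich is then

\begin{equation} {\bf{A}}_T^{ - 1} {\bf{B}}_T {\bf{A}}_T^{ - 1} = \left( {\frac{1}{T}\sum {X_t ^\prime X_t } } \right)^{ - 1} \left( {\frac{1}{T}\sum {X_t ^\prime u_t^2 X_t } } \right)\left( {\frac{1}{T}\sum {X_t ^\prime X_t } } \right)^{ - 1} \end{equation}

which is the standard heteroscedasticity-consistent (White) covariance calculation for least squares. If, in addition, \(u_t \) is homoscedastic with variance \(\sigma ^2 \), then \({\bf{B}}_T \approx \sigma ^2 {\bf{A}}_T \), so the sandwich collapses to \(\sigma ^2 {\bf{A}}_T^{ - 1} \). Since it is \(\sqrt T (\beta _T - \beta _0 )\) that has this covariance matrix, the covariance matrix of \(\beta _T \) itself is the sandwich divided by \(T\), which recovers the textbook \(\sigma ^2 \left( {\sum {X_t ^\prime X_t } } \right)^{ - 1} \).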

Newey–West

There is one rather serious complication with this calculation: when \(L\) is non-zero, there is no guarantee that the matrix \(\bf{B}_T\) will be positive definite. As a result, hypothesis tests may fail to execute, and standard errors may show as zeros. (Very) technically, this can happen because the formula estimates the spectral density of \(Z_t ^\prime  u_t \) at 0 with a Dirichlet lag window, which corresponds to a spectral window that is negative for some frequencies (see Koopmans, 1974, p. 306).

 

RATS provides a broader class of windows as a way of working around this problem. While the default is the window as shown above (which has its uses, particularly in panel data), other windows can be chosen using the LWINDOW option. Newey and West (1987) prove several results for the case where the lag \(k\) terms in \eqref{eq:linreg_zmcov} are multiplied by

\begin{equation} 1 - \frac{{|k|}}{{L + 1}} \end{equation}          

which is known as the Bartlett lag window. As a result, the covariance matrix computed using this is known in econometrics as Newey-West.
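Writing out the weighted version of \eqref{eq:linreg_zmcov}, the center of the Newey-West sandwich is

\begin{equation} {\bf{B}}_T = \Theta _T \frac{1}{T}\left\{ {\sum\limits_{k = - L}^L {\left( {1 - \frac{{|k|}}{{L + 1}}} \right)\sum\limits_t {Z_t ^\prime u_t u_{t - k} Z_{t - k} } } } \right\}\Theta _T ^\prime \end{equation}

which, unlike the unweighted version, is guaranteed to be positive semi-definite.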

 

The options ROBUSTERRORS, LAGS=lags and LWINDOW=NEWEYWEST will give you the Newey-West covariance matrix. See Robust Error Calculations for descriptions of the other choices for LWINDOW.
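For example, with placeholder series names and an arbitrary choice of \(L = 4\):

* least squares with a Newey-West covariance matrix (4 lags)
linreg(robusterrors,lags=4,lwindow=neweywest) y
# constant x1 x2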

HC1 Option

For models estimated by least squares, the HC1 option alters the calculation to correct for degrees of freedom in \(u\). This was the simplest of the small-sample corrections proposed in MacKinnon and White (1985). The covariance matrix computed above is multiplied by \(T/(T - K)\), where \(K\) is the number of regressors, which is the same degrees of freedom adjustment used in the standard least squares covariance calculation.
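As a sketch (placeholder series names again, and assuming the option is given literally as HC1 on the LINREG instruction):

* least squares; heteroscedasticity-consistent covariance with the T/(T-K) correction
linreg(robusterrors,hc1) y
# constant x1 x2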

 


Copyright © 2025 Thomas A. Doan