Serial Correlation Estimation
Serial correlation (correlation between errors at different time periods) is a more serious problem than heteroscedasticity. There are two basic ways to treat it: estimate by Generalized Least Squares (GLS), taking into account a specific form of serial correlation, or estimate by simple least squares and then compute a covariance matrix which is robust to the serial correlation. See Serial Correlation Testing for information on testing a regression for serially correlated errors.
For GLS, RATS offers only one “packaged” form of serial correlation correction, which is for first order. There is a well-developed (though now rather old) literature on estimation with first-order autoregressive (AR1) errors. More complicated error structures can be handled using filtered least squares (see the FILTER instruction), non-linear least squares, or a RegARIMA model. Note, however, that statistical practice is moving away from tacking the dynamics onto a static model through the error term, towards models which are designed to produce serially uncorrelated errors by incorporating the dynamics directly using lagged variables, such as VARs and ARDL models.
The AR1 Instruction
AR1 computes regressions with correction for first order autocorrelated errors by estimating a model of the form:
\begin{equation} y_t = X_t \beta + u_t \,\,\,,\,\,\,\,u_t = \rho u_{t - 1} + \varepsilon _t \label{eq:linreg_ar1model} \end{equation}
Its syntax is very similar to LINREG, except that it has some options for choosing the estimation method for the serial correlation coefficient \(\rho\). You can also input a specific value for \(\rho\).
Example AR1.RPF demonstrates several ways of handling serially correlated errors in a regression of \(Y\) on \(X\). One very simple possibility is to run a “first difference” regression, which would be correct if \(\rho=1\), and will be reasonable if \(\rho\) is close to that. That can be done most easily with
ar1(rho=1.0) y
# constant x
Note, by the way, that this zeroes out the CONSTANT, which will show a zero coefficient and a zero standard error.
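To see why, substitute \(\rho = 1\) into the quasi-differenced regression that AR1 runs (the general form appears later in this section):
\begin{equation} y_t - y_{t - 1} = \alpha (1 - \rho ) + \beta (x_t - x_{t - 1} ) + \varepsilon _t \end{equation}
With \(\rho = 1\), the transformed CONSTANT regressor \((1 - \rho )\) is identically zero, so its coefficient carries no information and is reported as zero.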
If \(\rho\) needs to be estimated, you have a choice between two optimization criteria and, within each, two ways of computing \(\rho\) (iterative or search):
•Simple least squares, skipping the first observation (METHOD=CORC and METHOD=HILU, Cochrane–Orcutt and Hildreth–Lu respectively).
•Maximum likelihood (METHOD=MAXL and METHOD=SEARCH).
For completeness, RATS also offers Prais-Winsten, which is a full-sample GLS estimator, with METHOD=PW.
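Although AR1.RPF doesn’t include it, a Prais-Winsten estimate would follow the same pattern as the examples below:
ar1(method=pw) y
# constant x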
The iterative methods (CORC and MAXL) are much faster; however, they don’t guarantee that you have found the global optimum. CORC, in particular, can converge to the wrong root if you have lagged dependent variables. Since, with modern computers, speed is no longer much of an issue in these models, the default estimation procedure is the slower but steadier HILU.
In large samples, there should be only a slight difference between the methods, unless the results show multiple optima. However, in small samples, there can be a substantial difference between maximum likelihood and the least squares estimators. Maximum likelihood includes an extra data point, which can have a major impact when there aren’t many data points to start with. Maximum likelihood also steers the estimates of \(\rho\) away from the boundaries at plus and minus one, with the difference becoming more noticeable as \(\rho\) approaches those boundary values.
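To make this concrete (with \(u_t = y_t - X_t \beta\), and treating the errors as Gaussian), the least squares methods minimize the sum of squared quasi-differenced residuals starting with \(t = 2\):
\begin{equation} \sum\limits_{t = 2}^T {\left( {u_t - \rho u_{t - 1} } \right)^2 } \end{equation}
while maximum likelihood maximizes the exact log likelihood
\begin{equation} \log L = - \frac{T}{2}\log \left( {2\pi \sigma ^2 } \right) + \frac{1}{2}\log \left( {1 - \rho ^2 } \right) - \frac{1}{{2\sigma ^2 }}\left[ {\left( {1 - \rho ^2 } \right)u_1^2 + \sum\limits_{t = 2}^T {\left( {u_t - \rho u_{t - 1} } \right)^2 } } \right] \end{equation}
The \(\left( {1 - \rho ^2 } \right)u_1^2\) term is the extra data point, and the \(\frac{1}{2}\log \left( {1 - \rho ^2 } \right)\) term goes to \( - \infty \) as \(\left| \rho \right| \to 1\), which is what pulls the maximum likelihood estimates away from the boundaries.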
AR1.RPF uses three of these: HILU, CORC, and MAXL.
ar1(method=hilu) y
# constant x
ar1(method=corc) y
# constant x
ar1(method=maxl) y
# constant x
Serial Correlation: The ROBUSTERRORS Option
On LINREG, NLLS, NLSYSTEM, SUR, GARCH, LDV, DDV and MAXIMIZE, you can use the ROBUSTERRORS option, combined with the LAGS=correlated lags option, to compute an estimate of the covariance matrix allowing for serial correlation up to a moving average of order correlated lags. This is sometimes known as the HAC (Heteroscedasticity and Autocorrelation Consistent) covariance matrix.
ROBUSTERRORS is important in situations where:
•You do not know the form of serial correlation, so a particular generalized least squares (GLS) procedure such as AR1 may be incorrect, or
•GLS is inconsistent because the regressors (or instruments) are correlated with past residuals. Brown and Maital (1981) and Hayashi and Sims (1983) are two of the earliest examples of the proper analysis of such settings.
In some situations, the proper value of correlated lags is known from theory. For instance, errors in six-step-ahead forecasts will be a moving average process of order five. If the value isn’t known, it has to be set large enough to catch most of the serial correlation.
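As a sketch of the known-from-theory case (the series name FCSTERR6 is hypothetical; the option combination follows the examples below), a regression of six-step-ahead forecast errors on a constant, allowing for the implied MA(5) errors (the LWINDOW option is discussed next), would be:
linreg(robusterrors,lags=5,lwindow=neweywest) fcsterr6
# constant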
In general, it’s a good idea to also include the LWINDOW=NEWEYWEST option to get the Newey-West covariance matrix, or one of the other non-truncated windows. Otherwise, the covariance matrix produced can be defective in a way that isn’t noticed until you try to use it to do hypothesis tests. Note that in the Newey-West window, if LAGS=1, the contribution of the lag term is cut in half. In a situation like this, even if the correlations are known to be zero after the first lag, you might be better off adding some extra lags just to make sure the Newey-West window doesn’t cut off most of the attempted correction.
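To see the effect, note that the Newey-West window applies the standard Bartlett weights to the lag \(j\) covariance term when LAGS=\(L\):
\begin{equation} w_j = 1 - \frac{j}{{L + 1}}\,\,\,,\,\,\,\,j = 0, \ldots ,L \end{equation}
so with \(L = 1\) the first lag gets weight \(1/2\), while with (for example) \(L = 3\) it gets weight \(3/4\).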
An example where AR1 would be a mistake is seen in Section 6.8 of Hayashi (2000) which analyzes the relationship between spot and forward exchange rates. A market efficiency argument implies that the regression
\(A_t = \alpha + \beta P_t + u_t \), where \(A_t\) = actual depreciation and \(P_t\) = predicted depreciation,
should have coefficients satisfying \(\alpha = 0,\beta = 1\). However, because these are weekly data on 30-day forward rates, the residuals will almost certainly be serially correlated, up to four moving average lags. An AR1 “correction” won’t work here, though. AR1, in effect, runs OLS on
\begin{equation} A_t - \rho A_{t - 1} = \alpha (1 - \rho ) + \beta (P_t - \rho P_{t - 1} ) + (u_t - \rho u_{t - 1} ) \end{equation}
The transformed residual \((u_t - \rho u_{t - 1} )\) can’t be assumed to be uncorrelated with the transformed regressor \((P_t - \rho P_{t - 1} )\), since the information which led to the surprise \(u_{t - 1} \) will get incorporated into \(P_t \). Instead, the equation is estimated by OLS, with standard errors corrected for the serial correlation:
linreg(robusterrors,lags=4,lwindow=neweywest) s30_s
# constant f_s
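The joint hypothesis \(\alpha = 0,\beta = 1\) can then be checked with a Wald test based upon the robust covariance matrix just estimated; a sketch using the TEST instruction (which lists the coefficient positions on one supplementary card and the hypothesized values on another) is:
test
# 1 2
# 0.0 1.0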
AR1.RPF includes a similar treatment:
linreg(robust,lwindow=neweywest,lags=4) y
# constant x
Copyright © 2026 Thomas A. Doan