Recursive Least Squares
The instruction RLS performs recursive least squares estimation. This is equivalent to a sequence of least squares regressions over expanding samples, but is computed much more efficiently. The main purpose of recursive estimation is to determine whether a model is stable. If a model is correctly specified as
\begin{equation} y_t = {\bf{x}}_t \beta + u_t , \quad u_t {\text{ is i.i.d. }} N\left( {0,\sigma ^2 } \right) \label{eq:linreg_rlsassumptions} \end{equation}
the series of recursive residuals is
\begin{equation} \frac{{\left( {y_t - {\bf{x}}_t \beta _{t - 1} } \right)}}{{\sqrt {1 + {\bf{x}}_t \left( {{\bf{X'}}_{t - 1} {\bf{X}}_{t - 1} } \right)^{ - 1} {\bf{x'}}_t } }} \label{eq:linreg_rresids} \end{equation}
for \(t>K\), where \(K\) is the number of regressors, \({\bf{X}}_{t - 1} \) is the matrix of explanatory variables through period \(t-1\), and \(\beta_{t-1}\) the estimate of \(\beta\) through \(t-1\).
Under these assumptions, the recursive residuals are themselves i.i.d., whereas ordinary least squares residuals are correlated in finite samples. If the recursive residuals fail to behave like an i.i.d. process, the base assumptions are rejected: either the coefficients aren't constant through the sample, or the variance isn't constant, or both. See Structural Stability Tests for more on this.
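The definition of the recursive residuals can be illustrated with a minimal sketch in Python (this is not RATS code). It re-estimates the regression from scratch at each step, which is the slow brute-force equivalent of what RLS computes by efficient updating. The function names `solve`, `normal_equations`, `dot` and `recursive_residuals` are hypothetical, and the sketch assumes that the first K observations already give a full rank regression.

```python
# Illustrative sketch only: brute-force recursive residuals, standard library only.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix (copy)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def normal_equations(X, y):
    """Return X'X and X'y for a list of regressor rows X and a list y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return xtx, xty

def recursive_residuals(X, y):
    """One standardized one-step prediction error per observation after the first K."""
    k = len(X[0])
    w = []
    for t in range(k, len(y)):
        xtx, xty = normal_equations(X[:t], y[:t])   # data through t-1
        beta = solve(xtx, xty)                      # estimate through t-1
        h = solve(xtx, X[t])                        # (X'X)^{-1} x_t'
        err = y[t] - dot(X[t], beta)                # one-step prediction error
        w.append(err / (1.0 + dot(X[t], h)) ** 0.5)
    return w

# Small made-up data set: a constant and one regressor.
X = [[1.0, float(i)] for i in range(1, 7)]
y = [1.1, 1.9, 3.2, 3.9, 5.3, 5.8]
w = recursive_residuals(X, y)
```

A useful check on any such calculation is that the squared recursive residuals sum to the residual sum of squares from least squares over the full sample.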
Note that RLS is designed for recursive estimation only. If you need to do something more than that (for instance, you want to generate forecasts at each stage), you can use the instruction KALMAN, which does the same type of sequential estimation, but only updates one entry for each instruction. If you want to allow the (true) coefficients in \eqref{eq:linreg_rlsassumptions} to be time-varying, you need to set up a state-space model (User's Guide, Chapter 10).
RLS has much the same format as LINREG, and the output will largely match that from a LINREG, since the RLS over the full sample will be the same as full sample least squares. Note, however, that RLS will not do instrumental variables estimation.
The residuals produced by RLS using either the built-in %RESIDS series or the resids parameter are the recursive residuals, not the finite sample OLS residuals.
The special options that distinguish RLS from LINREG are:
COHISTORY: saves the sequential estimates of the coefficients in a VEC[SERIES]
SEHISTORY: saves the sequential estimates of the standard errors of the coefficients in a VEC[SERIES]
SIGHISTORY: saves the sequential estimates of the standard error of the regression into a series
CSUMS: saves the cumulated sum of the recursive residuals in a series
CSQUARED: saves the cumulated sum of squared recursive residuals in a series
DFHISTORY: saves the regression degrees of freedom in a series
Example
rls(csquared=cusumsq) c / rresids
# constant y
This does a post-sample predictive test. The cumulative sum of squares is in the series CUSUMSQ. While the numerator could also be written more simply as RRESIDS(1993:1)^2, we write it this way because it generalizes easily to more than one step.
compute fstat=(cusumsq(1993:1)-cusumsq(1992:1))/$
(cusumsq(1992:1)/(%ndf-1))
cdf(title="Post-Sample Predictive Test") ftest fstat 1 %ndf-1
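In symbols, the one-step statistic computed above can be written as follows. The notation is introduced here only for illustration: \(S_t\) is the value of CUSUMSQ at entry \(t\), \(T\) is the final entry (1993:1), \(w_T\) is the last recursive residual, and \(\nu\) is the value of %NDF from the full sample.
\begin{equation} F = \frac{{S_T - S_{T - 1} }}{{S_{T - 1} /\left( {\nu - 1} \right)}} = \frac{{w_T^2 }}{{S_{T - 1} /\left( {\nu - 1} \right)}} \end{equation}
Under the assumptions in \eqref{eq:linreg_rlsassumptions}, the numerator and denominator are independent, so the statistic is compared with an \(F\) distribution with 1 and \(\nu - 1\) degrees of freedom, which matches the arguments given to CDF.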
This does the Harvey-Collier functional misspecification test, which checks whether the recursive residuals have mean zero. It can be done by computing the sample statistics on the recursive residuals and examining the t-statistic and significance level reported in the output.
stats rresids 1952:1 1993:1
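The calculation behind this test can be sketched in Python as a t-statistic for a zero mean. This is an illustration under stated assumptions rather than a description of how STATISTICS computes its output: the function name `harvey_collier_t` is hypothetical, and the input is assumed to be the list of recursive residuals over the test range.

```python
import math
import statistics

def harvey_collier_t(rresids):
    """Return (t-statistic for a zero mean, degrees of freedom)."""
    n = len(rresids)
    mean = statistics.mean(rresids)
    sd = statistics.stdev(rresids)          # sample standard deviation (n-1 divisor)
    return mean / (sd / math.sqrt(n)), n - 1

# Made-up residuals purely to show the call.
t_stat, df = harvey_collier_t([1.0, 2.0, 3.0])
```

The statistic would be compared with a t distribution with the returned degrees of freedom; a large absolute value points towards functional misspecification.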
The ORDER option
By default, RLS generates the estimates in entry sequence. It first works through the initial observations until it finds a set which generates a full rank regression. Entries are then added one by one, and the “history” series are updated.
If you think that the more likely alternative to stable coefficients and variance is an instability related to some variable other than the entry sequence, you can use the option ORDER to base the sequence on a different variable. Note that this does not rearrange the data set; keeping the data set itself in the original order is very important if the model includes lags. If you use ORDER, the entries in any of the series created as described earlier are maintained in their original order.
You can use the option INDEX to find out the order in which entries were added to the model. This is a SERIES[INTEGER], defined from the first entry in the regression to the last. If, for instance, you use INDEX=IX for a regression run over 1974:3 through 2017:4, IX(1974:3) will be the entry in the regression range with the smallest value of the order series, while IX(2017:4) will be the entry with the largest.
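The idea of sequencing on another variable without rearranging the data can be sketched in Python. This reflects one reading of the INDEX description, namely that the index lists entry numbers from the smallest to the largest value of the ordering variable. The function name `entry_order` is hypothetical, and the handling of ties (kept in entry order here) is an assumption, not something the text specifies.

```python
def entry_order(order_values, first_entry=1):
    """Entry numbers arranged by increasing value of the ordering variable.

    The data themselves are left untouched; only a list of entry numbers is built.
    """
    entries = range(first_entry, first_entry + len(order_values))
    ranked = sorted(zip(entries, order_values), key=lambda pair: pair[1])
    return [entry for entry, _ in ranked]

# Made-up ordering variable for entries 1 through 4.
thresh = [0.3, -1.2, 0.8, 0.1]
index = entry_order(thresh)
```

Here entry 2 has the smallest value of the ordering variable and would be added first, while entry 3 has the largest and would be added last.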
This is an example of the use of RLS with the ORDER option. It does an arranged autoregression test for a TAR (threshold autoregressive) model (see User's Guide, Section 11.5). The RLS does recursive estimation ordered on the first lag of DLR. The recursive residuals are then regressed on the explanatory variables. Under the null of no threshold effect, the coefficients should pass a test for zero values.
diff lr / dlr
set thresh = dlr{1}
rls(order=thresh) dlr / rresids
# constant dlr{1 2}
linreg rresids
# constant dlr{1 2}
exclude(title="Arranged Autoregression Test for TAR")
# constant dlr{1 2}
Copyright © 2025 Thomas A. Doan