RREG Instruction
RREG(options) depvar start end resids
# list of explanatory variables in Regression Format
RREG performs two types of “robust” estimation for a linear regression. The first is LAD, also called MAD (Least or Minimum Absolute Deviations). The second is the generalization of LAD known as quantile regression (Koenker and Bassett, 1978).
RREG can be applied to linear models only. The general design is similar to LINREG.
Wizard
You can use the Statistics—Linear Regressions Wizard to do robust regressions. Pick "Robust Regression" in the Method popup.
Parameters
depvar | dependent variable
start, end | range to estimate; defaults to the range permitted by all variables involved in the regression, including instruments if required.
resids | (optional) series for residuals
Options
[PRINT]/NOPRINT
VCV/[NOVCV]
TITLE="title for output" [depends upon options]
These control the printing of the regression output, the printing of the estimated covariance/correlation matrix of the coefficients, and the title used in labeling the output.
SMPL=Standard SMPL option [unused]
Omits from the estimation those observations where the SMPL series or expression is zero.
METHOD=LAD/QUANTILE
QUANTILE=quantile to use for METHOD=QUANTILE [not used]
METHOD chooses between LAD and Quantile estimation methods. If using METHOD=QUANTILE, you can use the QUANTILE option to specify the quantile to use. See the “Technical Information” for details.
ITERS=number of iterations
Allows the user to control the number of iterations used for the linear programming algorithm. The default value depends on the number of parameters, but is generally set to 100.
BANDWIDTH=bandwidth for computing scale factor
XXSCALE=direct value for the scale factor
The BANDWIDTH option allows you to provide the bandwidth to use in computing the scale factor for the covariance matrix. Use XXSCALE if you want to supply the scale factor yourself.
EQUATION=Equation to estimate [unused]
LASTREG/[NOLASTREG]
Use EQUATION to estimate a previously-defined equation. LASTREG re-estimates the most recent regression using the selected robust regression method. If you use either, omit the supplementary card.
DEFINE=equation to define [unused]
FRML=formula to define [unused]
These define an EQUATION and FRML, respectively, using the results of the estimation. You can use the equation/formula for forecasting or other purposes.
WEIGHT=standard WEIGHT option [unused]
Use this option if you want to provide different weights for each observation.
Technical Information
LAD chooses \(\beta\) to minimize
(1) \(\sum {\left| {{y_t} - {{\bf{X}}_t}\beta } \right|} \)
while the quantile regression minimizes
(2) \(\sum\limits_{{y_t} - {{\bf{X}}_t}\beta > 0} {\alpha \left| {{y_t} - {{\bf{X}}_t}\beta } \right|} + \sum\limits_{{y_t} - {{\bf{X}}_t}\beta < 0} {\left( {1 - \alpha } \right)\left| {{y_t} - {{\bf{X}}_t}\beta } \right|} \)
where \(\alpha\) is the quantile requested. LAD is a special case of the quantile regression with \(\alpha=0.5\)—the only difference is that the function value would be half as large.
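The two objective functions can be evaluated directly for a given set of residuals. The following is a minimal Python sketch (an illustration outside RATS, not RREG's internal code); the residual values are made up, and the function names `lad_loss` and `quantile_loss` are hypothetical.

```python
def lad_loss(residuals):
    """Objective (1): sum of absolute residuals."""
    return sum(abs(e) for e in residuals)


def quantile_loss(residuals, alpha):
    """Objective (2): weight alpha on positive residuals, 1 - alpha on negative ones."""
    total = 0.0
    for e in residuals:
        if e > 0:
            total += alpha * abs(e)
        elif e < 0:
            total += (1.0 - alpha) * abs(e)
    return total


# Made-up residuals y_t - X_t * beta for some trial value of beta.
residuals = [1.5, -0.5, 2.0, -1.0, 0.0]

print(lad_loss(residuals))             # objective (1)
print(quantile_loss(residuals, 0.5))   # exactly half of objective (1)
print(quantile_loss(residuals, 0.25))  # positive residuals get the smaller weight
```

With \(\alpha=0.5\) every residual is weighted by one half, which is why the quantile objective is half the LAD objective while the minimizing \(\beta\) is the same.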
The functions being minimized aren’t differentiable, so \(\beta\) can’t be estimated using standard "hill-climbing" methods. Instead, a variant of linear programming is used.
For either estimator, the optimum is at the value of \(\beta\) which gives an exact fit for at least K data points, where K is the size of \(\beta\)—a specialized linear programming algorithm is used which identifies the best set of those zeroed data points.
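The exact-fit property can be seen with a brute-force search. The Python sketch below (an illustration only; RREG uses a specialized linear programming algorithm, not enumeration) handles a constant plus one regressor, so K = 2: it tries every line through two data points and keeps the one with the smallest sum of absolute residuals. The data and function names are hypothetical.

```python
from itertools import combinations


def lad_loss(x, y, intercept, slope):
    """Sum of absolute residuals for the line intercept + slope * x."""
    return sum(abs(yi - intercept - slope * xi) for xi, yi in zip(x, y))


def lad_line_by_enumeration(x, y):
    """Try every line through two data points (K = 2) and keep the best one."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with the same x do not determine a slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = lad_loss(x, y, intercept, slope)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best


# Made-up data: four points on the line y = x plus one large outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 100.0]

loss, intercept, slope = lad_line_by_enumeration(x, y)
print(loss, intercept, slope)
```

Enumeration needs one fit per subset of K points, so it is practical only for tiny problems, which is why a linear programming method is used in practice.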
LAD is a direct substitute for least squares. Because it uses the absolute value rather than the square of the residuals, it is less sensitive to extreme values. It will be less efficient than least squares when the residuals are well-behaved (for normal residuals, the efficiency is about 60% that of least squares), but will be more efficient than least squares for fat-tailed residuals.
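The sensitivity difference is easiest to see in the simplest case, a regression on a constant only, where least squares gives the sample mean and LAD gives the sample median. A minimal Python sketch with made-up data:

```python
import statistics

# Made-up sample: four ordinary values and one extreme value.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

ls_estimate = statistics.mean(y)     # least squares on a constant = mean
lad_estimate = statistics.median(y)  # LAD on a constant = median

# Make the extreme value ten times larger and re-estimate.
y_more_extreme = [1.0, 2.0, 3.0, 4.0, 1000.0]
ls_shifted = statistics.mean(y_more_extreme)
lad_shifted = statistics.median(y_more_extreme)

print(ls_estimate, ls_shifted)    # the mean moves with the outlier
print(lad_estimate, lad_shifted)  # the median does not move
```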
The quantile regression isn’t centered at the same point as LAD or least squares, so you should not expect estimates to be necessarily similar. The estimator for a single quantile is usually used in combination with other quantile regressions to provide higher efficiency than could be achieved by using LAD alone. Koenker and Bassett, for instance, suggest weighted symmetric (about 0.5) combinations of quantile regressions, such as weights of 1/4, 1/2 and 1/4 on quantiles of 1/4, 1/2 and 3/4.
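The weighted combination is simple arithmetic on the coefficient vectors from the separate quantile regressions. The Python sketch below assumes you have already obtained three coefficient vectors (for instance, by saving %BETA after each RREG); the numbers shown are made up purely for illustration, and `combine` is a hypothetical helper.

```python
def combine(estimates, weights):
    """Weighted average of several coefficient vectors of equal length."""
    k = len(estimates[0])
    return [sum(w * b[i] for w, b in zip(weights, estimates)) for i in range(k)]


# Made-up coefficient vectors from quantile regressions at 1/4, 1/2 and 3/4.
beta_q25 = [1.0, 0.40]
beta_q50 = [1.2, 0.50]
beta_q75 = [1.6, 0.60]

# Weights of 1/4, 1/2 and 1/4, symmetric about the median.
weights = [0.25, 0.5, 0.25]

combined = combine([beta_q25, beta_q50, beta_q75], weights)
print(combined)
```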
The covariance matrices are estimated as
(3) \(\eta \times {\left( {{\bf{X'}}{\kern 1pt} {\bf{X}}} \right)^{ - 1}}\)
where \({\left( {{\bf{X'}}{\kern 1pt} {\bf{X}}} \right)^{ - 1}}\) is the standard inverse cross product of the regressors. With f as an estimate of the density function, the scale factor \(\eta\) is
(4) \(\frac{{0.25}}{{f{{\left( 0 \right)}^2}}}\) for LAD and
(5) \(\frac{{\alpha \left( {1 - \alpha } \right)}}{{f{{\left( {{x_\alpha }} \right)}^2}}}\) where \(x_\alpha\) is (an) \(\alpha\) quantile of the residuals.
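As a check on formulas (4) and (5), the scale factor can be computed for a case where the density is known. The Python sketch below (an illustration, not RREG's code) uses the true standard normal density in place of an estimate, so f(0) is 1/sqrt(2*pi); the function name `scale_factor` is hypothetical.

```python
import math


def scale_factor(alpha, density_at_quantile):
    """Formula (5); with alpha = 0.5 it reduces to formula (4)."""
    return alpha * (1.0 - alpha) / density_at_quantile ** 2


# Standard normal residuals: density at the median (zero) is 1 / sqrt(2 * pi).
f0 = 1.0 / math.sqrt(2.0 * math.pi)

eta_lad = scale_factor(0.5, f0)  # equals pi / 2

# Least squares would use the residual variance (1 here) in place of eta,
# so the efficiency of LAD relative to least squares is 1 / eta = 2 / pi.
relative_efficiency = 1.0 / eta_lad

print(eta_lad, relative_efficiency)
```

The ratio 2/pi is roughly 0.64, in line with the "about 60%" efficiency figure for normal residuals.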
While the parameter estimates are robust to non-normality, the estimates of the covariance matrix are not robust against heteroscedasticity and similar problems. More complex quantile regression techniques exist that can produce robust covariance estimates in these circumstances—these are currently not supported in RREG.
RREG computes f using a Gaussian kernel (see DENSITY), and bandwidth
(6) \(\frac{{0.79{\kern 1pt} {\kern 1pt} IQR}}{{{N^{{1 / 5}}}}}\)
where IQR is the interquartile range, and N is the number of observations. This choice has certain general optimality properties (see discussion in Pagan and Ullah, 1999), but can be too narrow in some circumstances. If you want to override this choice of bandwidth, you can use the BANDWIDTH option. And if you want to choose your own scale factor, use the XXSCALE option.
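The bandwidth rule (6) and the resulting LAD scale factor (4) can be sketched in Python as follows. This is an illustration under stated assumptions, not RREG's implementation: the IQR is assumed to be taken over the residuals, the quartiles use Python's "inclusive" interpolation rule (RATS may compute them differently), and the residuals and function names are made up.

```python
import math
import statistics


def default_bandwidth(residuals):
    """Formula (6): 0.79 * IQR / N^(1/5), with quartiles by Python's inclusive rule."""
    q1, _, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    return 0.79 * iqr / len(residuals) ** 0.2


def gaussian_kernel_density(residuals, x, bandwidth):
    """Gaussian kernel estimate of the residual density at the point x."""
    n = len(residuals)
    total = sum(math.exp(-0.5 * ((x - e) / bandwidth) ** 2) for e in residuals)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


# Made-up residuals, symmetric about zero.
residuals = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

h = default_bandwidth(residuals)
f0 = gaussian_kernel_density(residuals, 0.0, h)
eta_lad = 0.25 / f0 ** 2  # formula (4)

print(h, f0, eta_lad)
```

A wider bandwidth smooths the density estimate, which changes f and therefore the scale factor; that is the effect the BANDWIDTH option controls.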
Example
This is part of an example from Greene (2012). It estimates a Cobb-Douglas production function. There are two very large outliers, so the second LINREG estimates with those omitted. As an alternative to dropping the outliers entirely, RREG estimates by LAD to reduce their effect on the estimates.
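* Least squares on the full sample, saving the residuals in RESOLS
linreg logy / resols
# constant logk logl
* Least squares with the two outliers (entries 4 and 10) omitted
linreg(smpl=t<>4.and.t<>10) logy
# constant logk logl
* LAD on the full sample, saving the residuals in RESLAD
rreg logy / reslad
# constant logk logl
* LAD with the two outliers omitted (residuals again saved in RESLAD)
rreg(smpl=t<>4.and.t<>10) logy / reslad
# constant logk logl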
Variables Defined
%BETA | Coefficient vector (VECTOR)
%XX | Covariance matrix of coefficients, or \({\left( {{\bf{X'}}{\kern 1pt} {\bf{X}}} \right)^{ - 1}}\) (SYMMETRIC)
%TSTATS | Vector containing the t-stats for the coefficients (VECTOR)
%STDERRS | Vector of coefficient standard errors (VECTOR)
%NOBS | Number of observations (INTEGER)
%NREG | Number of regressors (INTEGER)
%NFREE | Number of free parameters (INTEGER)
%NDF | Degrees of freedom (INTEGER)
%FUNCVAL | Final value of the function being minimized (REAL)
%MEAN | Mean of dependent variable (REAL)
%RESIDS | Series containing the residuals (SERIES)
%DURBIN | Durbin-Watson statistic (REAL)
%RHO | First lag correlation coefficient (REAL)
%VARIANCE | Variance of dependent variable (REAL)
%EBW | Bandwidth used in estimate of the density (REAL)
Copyright © 2025 Thomas A. Doan