RREG Instruction

RREG(options) depvar start end resids

# list of explanatory variables in Regression Format

RREG performs two types of “robust” estimation for a linear regression. The first is LAD or MAD (Least or Minimum Absolute Deviations); the second is the generalization of LAD known as quantile regression (Koenker and Bassett, 1978).

RREG can be applied to linear models only. The general design is similar to LINREG.

Wizard

You can use the Statistics—Linear Regressions Wizard to do robust regressions. Pick "Robust Regression" in the Method popup.

Parameters

depvar	dependent variable
start, end	range to estimate, defaults to range permitted by all variables involved in the regression, including instruments if required.
resids	(optional) series for residuals

Options

[PRINT]/NOPRINT

VCV/[NOVCV]

TITLE="title for output" [depends upon options]

These control the printing of the regression output, the printing of the estimated covariance/correlation matrix of the coefficients, and the title used in labeling the output.

SMPL=Standard SMPL option [unused]

Omits from the estimation those observations where the SMPL series or expression is zero.

METHOD=LAD/QUANTILE

QUANTILE=quantile to use for METHOD=QUANTILE [not used]

METHOD chooses between the LAD and quantile estimation methods. With METHOD=QUANTILE, use the QUANTILE option to specify which quantile to estimate. See “Technical Information” for details.

ITERS=number of iterations

Allows the user to control the number of iterations used for the linear programming algorithm. The default value depends on the number of parameters, but is generally set to 100.

BANDWIDTH=bandwidth for computing scale factor

XXSCALE=direct value for the scale factor

The BANDWIDTH option allows you to provide the bandwidth to use in computing the scale factor for the covariance matrix. Use XXSCALE if you want to supply the scale factor yourself.

EQUATION=Equation to estimate [unused]

LASTREG/[NOLASTREG]

Use EQUATION to estimate a previously defined equation. LASTREG re-estimates the most recent regression using the selected robust regression method. If you use either, omit the supplementary card.

DEFINE=equation to define [unused]

FRML=formula to define [unused]

These define an EQUATION and FRML, respectively, using the results of the estimation. You can use the equation/formula for forecasting or other purposes.

WEIGHT=standard WEIGHT option [unused]

Use this option if you want to provide different weights for each observation.

Technical Information

LAD chooses \(\beta\) to minimize

(1) \(\sum {\left| {{y_t} - {{\bf{X}}_t}\beta } \right|} \)

while the quantile regression minimizes

(2) \(\sum\limits_{{y_t} - {{\bf{X}}_t}\beta > 0} {\alpha \left| {{y_t} - {{\bf{X}}_t}\beta } \right|} + \sum\limits_{{y_t} - {{\bf{X}}_t}\beta < 0} {\left( {1 - \alpha } \right)\left| {{y_t} - {{\bf{X}}_t}\beta } \right|} \)

where \(\alpha\) is the quantile requested. LAD is a special case of the quantile regression with \(\alpha=0.5\)—the only difference is that the function value would be half as large.
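As an illustration outside RATS, the two objective functions can be written out directly. This is a minimal Python sketch, not RREG's code; the function names and the small data set are hypothetical. It evaluates (1) and (2) at a fixed \(\beta\) and shows that (2) with \(\alpha=0.5\) is half of (1).

```python
def lad_loss(y, X, beta):
    """Objective (1): sum of absolute residuals."""
    return sum(abs(y_t - sum(x * b for x, b in zip(x_t, beta)))
               for y_t, x_t in zip(y, X))


def quantile_loss(y, X, beta, alpha):
    """Objective (2): alpha*|u| for positive residuals, (1-alpha)*|u| for negative ones."""
    total = 0.0
    for y_t, x_t in zip(y, X):
        u = y_t - sum(x * b for x, b in zip(x_t, beta))
        # (alpha - 1) * u equals (1 - alpha) * |u| when u is negative
        total += alpha * u if u > 0 else (alpha - 1.0) * u
    return total


# Hypothetical data: a constant plus one regressor
y = [1.0, 2.0, 4.0, 3.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta = [1.0, 0.5]

lad_value = lad_loss(y, X, beta)               # residuals 0, 0.5, 2, 0.5
median_value = quantile_loss(y, X, beta, 0.5)  # half of lad_value
```

Because the two functions differ only by a constant factor at \(\alpha=0.5\), they are minimized by the same \(\beta\).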

The functions being minimized aren’t differentiable, so \(\beta\) can’t be estimated using standard "hill-climbing" methods. Instead, a variant of linear programming is used.

For either estimator, the optimum is at the value of \(\beta\) which gives an exact fit for at least K data points, where K is the size of \(\beta\)—a specialized linear programming algorithm is used which identifies the best set of those zeroed data points.
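The exact-fit property can be seen with a brute-force search. The sketch below is in Python, uses hypothetical data, and is not the specialized linear programming algorithm that RREG uses. It handles only a constant plus one regressor (K = 2) by trying every line through a pair of data points and keeping the one with the smallest objective value.

```python
from itertools import combinations


def best_line_by_enumeration(x, y, alpha=0.5):
    """Return (loss, a, b) for y = a + b*x, searching only lines
    that fit two data points exactly."""
    def loss(a, b):
        total = 0.0
        for xi, yi in zip(x, y):
            u = yi - a - b * xi
            total += alpha * u if u > 0 else (alpha - 1.0) * u
        return total

    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # no unique line through two points with the same x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        candidate = (loss(a, b), a, b)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# Hypothetical data: four points on the line y = x, plus one large outlier
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 40.0]
loss_value, a, b = best_line_by_enumeration(x, y)
```

In this data set the selected line passes through the four well-behaved points and ignores the outlier. Enumeration grows combinatorially with the number of observations and regressors, which is why a linear programming approach is preferred in practice.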

LAD is a direct substitute for least squares. Because it uses the absolute value rather than the square of the residuals, it is less sensitive to extreme values. It will be less efficient than least squares when the residuals are well-behaved (for normal residuals, the asymptotic efficiency is \(2/\pi\), or about 64%, that of least squares), but will be more efficient than least squares for fat-tailed residuals.

The quantile regression isn’t centered at the same point as LAD or least squares, so you should not expect estimates to be necessarily similar. The estimator for a single quantile is usually used in combination with other quantile regressions to provide higher efficiency than could be achieved by using LAD alone. Koenker and Bassett, for instance, suggest weighted symmetric (about 0.5) combinations of quantile regressions, such as weights of 1/4, 1/2 and 1/4 on quantiles of 1/4, 1/2 and 3/4.
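One reading of the weighted combination is a weighted sum of the coefficient vectors from the separate quantile regressions. The Python sketch below assumes that reading; the function name and the coefficient values are hypothetical, not output from RREG.

```python
def combine_quantile_estimates(betas, weights):
    """Weighted sum of coefficient vectors, one vector per quantile."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights should sum to 1")
    k = len(betas[0])
    return [sum(w * b[i] for w, b in zip(weights, betas)) for i in range(k)]


# Hypothetical coefficient vectors from quantile regressions at 1/4, 1/2 and 3/4
beta_q25 = [0.8, 0.30, 0.60]
beta_q50 = [1.0, 0.35, 0.62]
beta_q75 = [1.2, 0.40, 0.64]

combined = combine_quantile_estimates(
    [beta_q25, beta_q50, beta_q75],
    [0.25, 0.5, 0.25],
)
```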

The covariance matrices are estimated as

(3) \(\eta \times {\left( {{\bf{X'}}{\kern 1pt} {\bf{X}}} \right)^{ - 1}}\)

where \({\left( {{\bf{X'}}{\kern 1pt} {\bf{X}}} \right)^{ - 1}}\) is the standard inverse cross product of the regressors. With f as an estimate of the density function, the scale factor \(\eta\) is

(4) \(\frac{{0.25}}{{f{{\left( 0 \right)}^2}}}\) for LAD and

(5) \(\frac{{\alpha \left( {1 - \alpha } \right)}}{{f{{\left( {{x_\alpha }} \right)}^2}}}\) where \(x_\alpha\) is (an) \(\alpha\) quantile of the residuals.
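As a worked check of (4) and (5), the Python sketch below computes the scale factor from a given density value. The function name is hypothetical, and the standard normal density is used only as a convenient known case, not as something RREG assumes.

```python
import math


def scale_factor(alpha, density_at_quantile):
    """Formula (5); with alpha = 0.5 it reduces to formula (4)."""
    return alpha * (1.0 - alpha) / density_at_quantile ** 2


# Standard normal residuals: f(0) = 1 / sqrt(2*pi)
f0 = 1.0 / math.sqrt(2.0 * math.pi)
eta_lad = scale_factor(0.5, f0)  # 0.25 / f(0)^2, which equals pi/2
```

For unit-variance normal residuals the LAD scale factor is \(\pi/2\), about 1.57, compared with a residual variance of 1 for least squares.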

While the parameter estimates are robust to non-normality, the estimates of the covariance matrix are not robust against heteroscedasticity and similar problems. More complex quantile regression techniques exist that can produce robust covariance estimates in these circumstances—these are currently not supported in RREG.

RREG computes f using a Gaussian kernel (see DENSITY), and bandwidth

(6) \(\frac{{0.79{\kern 1pt} {\kern 1pt} IQR}}{{{N^{{1 / 5}}}}}\)

where IQR is the interquartile range, and N is the number of observations. This choice has certain general optimality properties (see discussion in Pagan and Ullah, 1999), but can be too narrow in some circumstances. If you want to override this choice of bandwidth, you can use the BANDWIDTH option. And if you want to choose your own scale factor, use the XXSCALE option.
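The bandwidth rule (6) and the resulting LAD scale factor (4) can be sketched in Python as follows. This is an illustration under stated assumptions rather than RREG's implementation: the quartiles come from Python's `statistics.quantiles` with its default method, which may not match the quantile definition RATS uses, and the residuals are hypothetical.

```python
import math
import statistics


def default_bandwidth(resids):
    """Formula (6): 0.79 * IQR / N^(1/5)."""
    q1, _, q3 = statistics.quantiles(resids, n=4)  # quartile method is an assumption
    return 0.79 * (q3 - q1) / len(resids) ** 0.2


def gaussian_kernel_density(resids, x, h):
    """Gaussian kernel estimate of the residual density at x with bandwidth h."""
    n = len(resids)
    kernel_sum = sum(math.exp(-0.5 * ((x - r) / h) ** 2) for r in resids)
    return kernel_sum / (n * h * math.sqrt(2.0 * math.pi))


# Hypothetical residuals
resids = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

h = default_bandwidth(resids)
f0 = gaussian_kernel_density(resids, 0.0, h)
eta = 0.25 / f0 ** 2  # scale factor (4) for LAD
```

A wider bandwidth flattens the density estimate near zero, which lowers f(0) and raises the scale factor, so the reported standard errors are sensitive to this choice.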

Example

This is part of an example from Greene (2012). It estimates a Cobb-Douglas production function. There are two very large outliers, so the second LINREG estimates with those omitted. As an alternative to dropping the outliers entirely, the first RREG estimates by LAD to reduce their effect on the estimates. The final RREG applies LAD with the outliers omitted as well.

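* Least squares on the full sample
linreg logy / resols
# constant logk logl

* Least squares with the two outliers (entries 4 and 10) omitted
linreg(smpl=t<>4.and.t<>10) logy
# constant logk logl

* LAD on the full sample
rreg logy / reslad
# constant logk logl

* LAD with the two outliers omitted
rreg(smpl=t<>4.and.t<>10) logy / reslad
# constant logk logl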

Variables Defined

%BETA	Coefficient vector (VECTOR)
%XX	Covariance matrix of coefficients, or \({\left( {{\bf{X'}}{\kern 1pt} {\bf{X}}} \right)^{ - 1}}\) (SYMMETRIC)
%TSTATS	Vector containing the t-stats for the coefficients (VECTOR)
%STDERRS	Vector of coefficient standard errors (VECTOR)
%NOBS	Number of observations (INTEGER)
%NREG	Number of regressors (INTEGER)
%NFREE	Number of free parameters (INTEGER)
%NDF	Degrees of freedom (INTEGER)
%FUNCVAL	Final value of the function being minimized (REAL)
%MEAN	Mean of dependent variable (REAL)
%RESIDS	Series containing the residuals (SERIES)
%DURBIN	Durbin-Watson statistic (REAL)
%RHO	First lag correlation coefficient (REAL)
%VARIANCE	Variance of dependent variable (REAL)
%EBW	Bandwidth used in estimate of the density (REAL)