Examples / ROBUST.RPF |
ROBUST.RPF is an example of robust estimation of a linear model. It is adapted from Greene(2012), Example 7.9.
This first estimates a regression (a production function of log output on log capital and log labor input) by least squares, computes the standardized (more precisely the internally studentized—PRJ with the XVX option is used for adjusting the variance of in-sample fitted values) residuals and reruns the regression dropping from the sample any data point where the standardized residual is greater than 2.5 in absolute value (which ends up removing two observations).
linreg logy / resids
# constant logk logl
prj(xvx=px)
set stdresids = resids/sqrt(%seesq*(1-px))
*
linreg(smpl=abs(stdresids)<=2.5) logy
# constant logk logl
The second estimator is to use RREG to do Least Absolute Deviations (LAD), which is the default for RREG.
rreg logy
# constant logk logl
which here gives results quite similar to the outlier-adjusted LINREG, though with somewhat higher standard errors.
Some alternative robust estimators can be generated which have behavior in the tails similar to LAD, but near zero are more like least squares. For instance, the following objective function:
\begin{equation} \sum\limits_t {\frac{{u_t^2 }}{{(c^2 + u_t^2 )^{1/2} }}} \end{equation}
will look like the absolute value in the tails, when \(u\) is much larger than the constant \(c\), but will look like the square when \(c\) is larger than \(u\) and thus the constant dominates the denominator. By keeping the minimand differentiable, this can be estimated more simply: either by iterated weighted least squares (IWLS) or by a standard nonlinear estimation routine.
IWLS starts with the least squares estimator, then computes the denominator given the least squares residuals. This is used as the SPREAD option in a LINREG. Because LINREG with SPREAD produces unweighted residuals (that is, uses the original data and regressors, not weighted ones), the spread series can be recomputed with the adjusted estimates, and this can be repeated until convergence (measured with the %TESTDIFF function applied to the new estimates compared with the previous ones). To simplify the programming, the least squares estimates are actually done using a SPREAD series which is constant at 1.0.
The C value needs to be something on the order of the standard deviation to give the right behavior. (Too small and you basically get LAD; too large and you effectively get least squares). So this chooses it as the standard error of the least squares regression.
compute c = sqrt(%seesq)
dec vector beta0
*
* Start with least squares (spread = constant). Do 10 iterations or until
* convergence.
*
set spread = 1.0
do iters=1,10
compute beta0=%beta
set spread = sqrt(c^2+resids^2)
linreg(noprint,spread=spread) logy / resids
# constant logk logl
if %testdiff(beta0,%beta)<.0001
break
end do iters
?"Iterations Taken" iters
The corrected covariance matrix is obtained by noting that the IWLS estimator has a first order condition of
\begin{equation} \sum {X_t } \frac{{u_t }}{{\sqrt {c^2 + u_t^2 } }} = 0 \end{equation}
The \({f'}\) of the \(f(u)\) function in that can be verified to be
\begin{equation} \frac{{c^2 }}{{\left( {c^2 + u_t^2 } \right)^{3/2} }} \end{equation}
So the covariance matrix is computed and the revised regression displayed with:
set f = resids/spread
set fprime = c^2/spread^3
mcov(matrix=b,lastreg) / f
mcov(matrix=a,lastreg,nosquare) / fprime
linreg(create,lastreg,form=chisquared,covmat=%mqform(b,inv(a)),$
title="Iterated Weighted Least Squares")
In this case, all four estimators (least squared, trimmed least squares, LAD and M) give fairly similar point estimates and generally very similar standard errors, other than trimmed least squares.
Full Program
open data zellner.prn
data(format=prn,org=columns) 1 25 valueadd capital labor nfirm
*
set logy = log(valueadd)
set logk = log(capital)
set logl = log(labor)
*
* Compute linear regression, and the standardized residuals.
*
linreg logy / resids
# constant logk logl
prj(xvx=px)
set stdresids = resids/sqrt(%seesq*(1-px))
*
* Rerun the regression, omitting the outliers (defined here as
* observations with standardized residuals greater than 2.5 in absolute
* value.
*
linreg(smpl=abs(stdresids)<=2.5) logy
# constant logk logl
*
* Now do LAD estimator.
*
rreg logy
# constant logk logl
*
* Iterated WLS M-estimator
*
compute c = sqrt(%seesq)
dec vector beta0
*
* Start with least squares (spread = constant). Do 10 iterations or until
* convergence.
*
set spread = 1.0
do iters=1,10
compute beta0=%beta
set spread = sqrt(c^2+resids^2)
linreg(noprint,spread=spread) logy / resids
# constant logk logl
if %testdiff(beta0,%beta)<.0001
break
end do iters
?"Iterations Taken" iters
*
* Compute the sandwich estimator for the covariance matrix.
*
set f = resids/spread
set fprime = c^2/spread^3
mcov(matrix=b,lastreg) / f
mcov(matrix=a,lastreg,nosquare) / fprime
linreg(create,lastreg,form=chisquared,covmat=%mqform(b,inv(a)),$
title="Iterated Weighted Least Squares")
Output
Linear Regression - Estimation by Least Squares
Dependent Variable LOGY
Usable Observations 25
Degrees of Freedom 22
Centered R^2 0.9730750
R-Bar^2 0.9706273
Uncentered R^2 0.9986265
Mean of Dependent Variable 5.8120922044
Std Error of Dependent Variable 1.3753035142
Standard Error of Estimate 0.2357058986
Sum of Squared Residuals 1.2222599535
Regression F(2,22) 397.5427
Significance Level of F 0.0000000
Log Likelihood 2.2537
Durbin-Watson Statistic 1.9575
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 1.8444157136 0.2335928490 7.89586 0.00000007
2. LOGK 0.2454280713 0.1068574320 2.29678 0.03152246
3. LOGL 0.8051829551 0.1263336077 6.37347 0.00000206
Linear Regression - Estimation by Least Squares
Dependent Variable LOGY
Usable Observations 23
Degrees of Freedom 20
Skipped/Missing (from 25) 2
Centered R^2 0.9888884
R-Bar^2 0.9877772
Uncentered R^2 0.9994653
Mean of Dependent Variable 5.9323384337
Std Error of Dependent Variable 1.3638215667
Standard Error of Estimate 0.1507796992
Sum of Squared Residuals 0.4546903536
Regression F(2,20) 889.9576
Significance Level of F 0.0000000
Log Likelihood 12.4862
Durbin-Watson Statistic 1.8668
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 1.7642052396 0.1637986013 10.77058 0.00000000
2. LOGK 0.2094788550 0.0700712951 2.98951 0.00724489
3. LOGL 0.8519478678 0.0847732478 10.04973 0.00000000
Linear Model - Estimation by Least Absolute Deviations
Convergence in 4 Iterations. Final criterion was -0.0195988 <= 0.0000000
Dependent Variable LOGY
Usable Observations 25
Degrees of Freedom 22
Function Value 3.92268583
Durbin-Watson Statistic 1.9396
Variable Coeff Std Error T-Stat Signif
***************************************************************************************
1. Constant 1.8064184130 0.2353437723 7.67566 0.00000012
2. LOGK 0.2048726092 0.1076583948 1.90299 0.07020985
3. LOGL 0.8494661424 0.1272805564 6.67397 0.00000104
Iterations Taken 7
Linear Regression - Estimation by Iterated Weighted Least Squares
Dependent Variable LOGY
Usable Observations 25
Degrees of Freedom 22
Mean of Dependent Variable 5.8120922044
Std Error of Dependent Variable 1.3753035142
Standard Error of Estimate 0.2359127900
Sum of Squared Residuals 1.2244065787
Durbin-Watson Statistic 1.9410
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 1.8059654814 0.2226559755 8.11101 0.00000000
2. LOGK 0.2299975476 0.0999027810 2.30221 0.02132313
3. LOGL 0.8269030465 0.1229271312 6.72677 0.00000000
Copyright © 2025 Thomas A. Doan