LISTEXAMPLE.RPF
LISTEXAMPLE.RPF runs a sequence of regressions, each time removing from the sample only the most extreme outlier, and continues as long as some data point has a standardized residual of 2.5 or more in absolute value. The LIST[INTEGER] called OUTLIERS keeps track of the entries as they are removed. The program is provided as an example of how to use a LIST, which is a variable-length vector of objects.
Before the loop, the program creates a series that is initially all 1's (it serves as the SMPL series for the regression, with dropped entries set to zero) and declares an empty LIST[INTEGER] to record the observations that are dropped.
set working = 1.0
dec list[integer] outliers
Inside the loop, after the LINREG has saved the residuals into RESIDS, the sequence
prj(xvx=px)
set stdresids = abs(resids/sqrt(%seesq*(1-px)))
ext(noprint) stdresids
computes the series of absolute values of the standardized (internally studentized) residuals and uses EXTREMUM to locate the maximum value (%MAXIMUM) and the entry where it occurs (%MAXENT). The IF then checks whether that largest value is at least 2.5. If it is, the first COMPUTE appends the entry number (%MAXENT) to the end of the OUTLIERS list (that is what the + operator does with a LIST). The second COMPUTE zeroes out that entry in the series used for the SMPL option on LINREG, so the observation is skipped in the next and all later regressions. If no standardized residual reaches the 2.5 limit, the BREAK exits the loop.
if %maximum>=2.5 {
compute outliers=outliers+%maxent
compute working(%maxent)=0
}
else
break
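For reference, the standardization performed by the SET instruction earlier in the loop can be written as below. The symbols are notation introduced here, not names from the program: e_t is the residual (RESIDS), s^2 is the estimated residual variance (%SEESQ), and h_t is the value stored in PX, on the assumption that the XVX option of PRJ returns the leverage x_t (X'X)^{-1} x_t' for each entry.

```latex
r_t = \frac{\lvert e_t \rvert}{\sqrt{s^2 \, (1 - h_t)}},
\qquad
h_t = x_t \, (X'X)^{-1} \, x_t'
```

Dividing by 1 - h_t scales up the residuals of high-leverage observations, whose raw residuals tend to be small because the fit is pulled toward them. An entry is dropped when the largest r_t is 2.5 or more.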
This shows the list of outliers (%size returns the number of elements, which could be zero) and re-runs the final regression with the PRINT option on.
do i=1,%size(outliers)
?"Outlier at" outliers(i)
end do i
linreg(print,smpl=working) logy / resids
# constant logk logl
Full Program
open data zellner.prn
data(format=prn,org=columns) 1 25 valueadd capital labor nfirm
*
set logy = log(valueadd)
set logk = log(capital)
set logl = log(labor)
*
* Repeatedly run the regression, each time dropping the entry with the
* largest standardized residual, until none is 2.5 or larger.
*
set working = 1.0
dec list[integer] outliers
loop
linreg(noprint,smpl=working) logy / resids
# constant logk logl
prj(xvx=px)
set stdresids = abs(resids/sqrt(%seesq*(1-px)))
ext(noprint) stdresids
if %maximum>=2.5 {
compute outliers=outliers+%maxent
compute working(%maxent)=0
}
else
break
end loop
*
* List the dropped entries, then show the final regression.
*
do i=1,%size(outliers)
?"Outlier at" outliers(i)
end do i
linreg(print,smpl=working) logy
# constant logk logl
Output
Outlier at 4
Outlier at 10
Linear Regression - Estimation by Least Squares
Dependent Variable LOGY
Usable Observations 23
Degrees of Freedom 20
Skipped/Missing (from 25) 2
Centered R^2 0.9888884
R-Bar^2 0.9877772
Uncentered R^2 0.9994653
Mean of Dependent Variable 5.9323384337
Std Error of Dependent Variable 1.3638215667
Standard Error of Estimate 0.1507796992
Sum of Squared Residuals 0.4546903536
Regression F(2,20) 889.9576
Significance Level of F 0.0000000
Log Likelihood 12.4862
Durbin-Watson Statistic 1.8668
    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                     1.7642052396 0.1637986013     10.77058  0.00000000
2.  LOGK                         0.2094788550 0.0700712951      2.98951  0.00724489
3.  LOGL                         0.8519478678 0.0847732478     10.04973  0.00000000
Copyright © 2025 Thomas A. Doan