HETERO.RPF
HETERO.RPF is an example of estimating a linear model with heteroscedasticity. It's adapted from Hill, Griffiths, & Lim (2008).
It analyzes a linear regression of food expenditures on income for a cross section of 40 individuals. (Income is in units of $100, while food expenditures are in $1, so the slope coefficient can be interpreted as the percentage of each additional dollar of income that is spent on food.) One would expect not only the mean of spending to go up with income, but also the variance of the residuals. That the variance increases with income is fairly clear from a scatter plot, where the LINES option on SCATTER is used to put the regression line on the x-y scatter:
linreg food
# constant income
scatter(style=dots,lines=%beta,vmin=0.0,$
hlabel="x = weekly income in $100",$
vlabel="y = weekly food expenditures in $")
# income food
The form of the variance is less clear. If we assume it’s directly proportional to income, we can compute the weighted least squares estimator with:
linreg(spread=income) food
# constant income
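As a rough illustration of what a variance proportional to income implies, the sketch below fits a line by weighted least squares in plain Python, giving each observation the weight 1/spread. It is not RATS code and not a description of how LINREG is implemented; the function name wls_line and the six data points are made up for the example and are not the food.dat sample.

```python
def wls_line(x, y, spread):
    """Fit y = a + b*x by weighted least squares.

    spread[i] is assumed proportional to Var(u_i), so each
    observation gets weight 1/spread[i]. Hypothetical helper.
    """
    w = [1.0 / s for s in spread]
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross products
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Made-up illustrative data (NOT the food.dat sample).
income = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
food = [130.0, 190.0, 215.0, 320.0, 290.0, 430.0]

# Equal weights reproduce ordinary least squares.
a_ols, b_ols = wls_line(income, food, [1.0] * len(income))
# Variance assumed proportional to income, as with SPREAD=INCOME.
a_wls, b_wls = wls_line(income, food, income)
print("OLS:", a_ols, b_ols)
print("WLS:", a_wls, b_wls)
```

Low-income observations, which are assumed to be less noisy, get more weight in the second fit than in the first.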
The next estimation redoes least squares, but uses the ROBUSTERRORS option (abbreviated here to ROBUST) to correct the covariance matrix for heteroscedasticity.
linreg(robust) food
# constant income
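To show where a heteroscedasticity-consistent standard error comes from, here is a minimal Python sketch for the slope of a simple regression. It uses the basic Eicker-White form with no small-sample adjustment; whether that matches the exact scaling RATS applies is an assumption here, not something taken from the documentation. The function name and the four data points are invented for illustration.

```python
import math

def ols_slope_se(x, y):
    """Return (slope, conventional SE, White SE) for y = a + b*x.

    Hypothetical helper; the White SE uses the unadjusted form
    sqrt(sum(dx_i^2 * e_i^2)) / sum(dx_i^2).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    a = ybar - b * xbar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]   # OLS residuals
    # Conventional: one common variance estimate for every observation
    s2 = sum(ei * ei for ei in e) / (n - 2)
    se_conv = math.sqrt(s2 / sxx)
    # White: each squared residual stands in for its own variance
    se_white = math.sqrt(sum((d * ei) ** 2 for d, ei in zip(dx, e))) / sxx
    return b, se_conv, se_white

# Made-up data, for illustration only.
slope, se_conv, se_white = ols_slope_se([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 6.0])
print(slope, se_conv, se_white)
```

The coefficient is the same in both cases; only the standard error changes, which is also what the LINREG output with and without ROBUST shows.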
If you know the form of heteroscedasticity, this will not be as efficient as weighted least squares; the advantage is that you do not need to know the form. It’s possible to combine both of these ideas: for instance, if you think that the variance is related to income, but are unsure of the form of that relationship, you might estimate using a “best guess” form for the SPREAD option, then include ROBUSTERRORS to correct for any remaining problems:
linreg(spread=income,robust) food
# constant income
To do feasible GLS, you have to pick a functional form for the spread function. Regressing the log of the variance on log income means the variance itself will be a power of income. You start with a basic LINREG to get the residuals, then run a regression of the log squared residuals on log income:
linreg food
# constant income
*
set esq = log(%resids^2)
set z = log(income)
*
linreg esq
# constant z
If SPREAD=INCOME used above were correct, we would expect the slope coefficient to be near 1. Instead, it's 2.33, suggesting that a better model would have the standard deviation, rather than variance, proportional to income.
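The reasoning behind that reading of the slope can be written out as follows. The symbols (x for income, sigma for the residual standard deviation, alpha and gamma for the intercept and slope of the auxiliary regression) are notation introduced here, not taken from the program.

```latex
\log \sigma_i^2 = \alpha + \gamma \log x_i
\quad\Longrightarrow\quad
\sigma_i^2 = e^{\alpha}\, x_i^{\gamma},
\qquad
\sigma_i = e^{\alpha/2}\, x_i^{\gamma/2}
```

With gamma = 1 the variance is proportional to income, which is the SPREAD=INCOME case. With gamma = 2 the standard deviation is proportional to income. The estimate of 2.33 corresponds to a standard deviation that grows roughly like income to the power 1.16. The factor involving alpha only rescales the variances, which does not matter for SPREAD since it is defined up to proportionality.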
The fitted values from the auxiliary regression can be computed with PRJ. Those are then "exp'ed" to give the estimated variance. You can do that directly on the SPREAD option, which accepts a formula.
prj vhat
linreg(spread=exp(vhat)) food
# constant income
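The whole feasible GLS sequence (least squares, auxiliary regression of log squared residuals on log income, exponentiated fitted values, weighted least squares) can be sketched in plain Python as below. This only mirrors the steps described above; it is not how RATS is implemented, the helper names fit_line and fgls are hypothetical, and the data are made up rather than taken from food.dat.

```python
import math

def fit_line(x, y, spread=None):
    """Fit y = a + b*x with weights 1/spread (equal weights if None)."""
    w = [1.0] * len(x) if spread is None else [1.0 / s for s in spread]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def fgls(x, y):
    """Feasible GLS assuming the variance is a power of x."""
    a0, b0 = fit_line(x, y)                          # step 1: least squares
    esq = [math.log((yi - a0 - b0 * xi) ** 2)        # log squared residuals
           for xi, yi in zip(x, y)]
    z = [math.log(xi) for xi in x]
    c, gamma = fit_line(z, esq)                      # step 2: auxiliary regression
    spread = [math.exp(c + gamma * zi) for zi in z]  # step 3: exp of fitted values
    a1, b1 = fit_line(x, y, spread)                  # step 4: weighted least squares
    return gamma, a1, b1, spread

# Made-up illustrative data (NOT the food.dat sample).
income = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
food = [130.0, 190.0, 215.0, 320.0, 290.0, 430.0]

gamma, a_fgls, b_fgls, spread = fgls(income, food)
print("estimated power of income in the variance:", gamma)
print("FGLS intercept and slope:", a_fgls, b_fgls)
```

Because the fitted spread is an exponential, it is positive for every observation, which is one practical reason for modeling the log of the variance rather than the variance itself.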
The slope estimates actually don’t change much with the feasible GLS compared with OLS, but the standard error on the income coefficient drops from 2.09 for conventional least squares (1.76 with White standard errors) to .97 for GLS.
Full Program
open data food.dat
data(format=free,org=columns) 1 40 food income
*
linreg food
# constant income
*
* The LINES option on SCATTER allows you to add one or more y=a+bx lines
* to a scatter plot. In this case, it takes the 2-vector with the
* intercept and slope from the previous LINREG.
*
scatter(style=dots,lines=%beta,vmin=0.0,$
hlabel="x = weekly income in $100",$
vlabel="y = weekly food expenditures in $")
# income food
*
* LINREG with the SPREAD option does weighted least squares with a
* variance series proportional to the series given by the SPREAD option.
*
linreg(spread=income) food
# constant income
@regconfidence(conf=.95)
*
* LINREG with ROBUST does a standard LS estimation, but with a
* heteroscedasticity-robust covariance matrix.
*
linreg(robust) food
# constant income
@regconfidence(conf=.95)
*
* Combining SPREAD and ROBUST does weighted least squares, but with a
* heteroscedasticity-robust covariance matrix.
*
linreg(spread=income,robust) food
# constant income
*
linreg food
# constant income
*
* This sequence does feasible GLS. This first runs a regression of
* log(e^2) on the log of income.
*
set esq = log(%resids^2)
set z = log(income)
*
linreg esq
# constant z
*
* PRJ then computes the fitted values from the above regression, which
* are then "exp"ed to give the estimated variances. That constructed
* series is fed into LINREG with SPREAD to correct for
* heteroscedasticity.
*
prj vhat
linreg(spread=exp(vhat)) food
# constant income
Output
Linear Regression - Estimation by Least Squares
Dependent Variable FOOD
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.3850022
R-Bar^2 0.3688181
Uncentered R^2 0.9179605
Mean of Dependent Variable 283.57350000
Std Error of Dependent Variable 112.67518102
Standard Error of Estimate 89.51700453
Sum of Squared Residuals 304505.17583
Regression F(1,38) 23.7888
Significance Level of F 0.0000195
Log Likelihood -235.5088
Durbin-Watson Statistic 1.8939
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 83.416002021 43.410163135 1.92158 0.06218242
2. INCOME 10.209642968 2.093263531 4.87738 0.00001946
Linear Regression - Estimation by Weighted Least Squares
Dependent Variable FOOD
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.0289926
R-Bar^2 0.0034398
Uncentered R^2 0.9263032
Mean of Dependent Variable 64.714305114
Std Error of Dependent Variable 18.782387030
Standard Error of Estimate 18.750055676
Sum of Squared Residuals 13359.454339
Log Likelihood -230.5980
Durbin-Watson Statistic 1.8924
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 78.684080183 23.788721650 3.30762 0.00206413
2. INCOME 10.451009057 1.385891228 7.54100 0.00000000
Label Coefficient Lower Upper
Constant 78.68408 30.52633 126.84183
INCOME 10.45101 7.64542 13.25660
Linear Regression - Estimation by Least Squares
With Heteroscedasticity-Consistent (Eicker-White) Standard Errors
Dependent Variable FOOD
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.3850022
R-Bar^2 0.3688181
Uncentered R^2 0.9179605
Mean of Dependent Variable 283.57350000
Std Error of Dependent Variable 112.67518102
Standard Error of Estimate 89.51700453
Sum of Squared Residuals 304505.17583
Log Likelihood -235.5088
Durbin-Watson Statistic 1.8939
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 83.416002021 26.768349932 3.11622 0.00183187
2. INCOME 10.209642968 1.763269880 5.79018 0.00000001
Label Coefficient Lower Upper
Constant 83.41600 29.22631 137.60569
INCOME 10.20964 6.64009 13.77920
Linear Regression - Estimation by Weighted Least Squares
With Heteroscedasticity-Consistent (Eicker-White) Standard Errors
Dependent Variable FOOD
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.0289926
R-Bar^2 0.0034398
Uncentered R^2 0.9263032
Mean of Dependent Variable 64.714305114
Std Error of Dependent Variable 18.782387030
Standard Error of Estimate 18.750055676
Sum of Squared Residuals 13359.454339
Log Likelihood -230.5980
Durbin-Watson Statistic 1.8924
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 78.684080183 11.730482780 6.70766 0.00000000
2. INCOME 10.451009057 1.144327006 9.13289 0.00000000
Linear Regression - Estimation by Least Squares
Dependent Variable FOOD
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.3850022
R-Bar^2 0.3688181
Uncentered R^2 0.9179605
Mean of Dependent Variable 283.57350000
Std Error of Dependent Variable 112.67518102
Standard Error of Estimate 89.51700453
Sum of Squared Residuals 304505.17583
Regression F(1,38) 23.7888
Significance Level of F 0.0000195
Log Likelihood -235.5088
Durbin-Watson Statistic 1.8939
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 83.416002021 43.410163135 1.92158 0.06218242
2. INCOME 10.209642968 2.093263531 4.87738 0.00001946
Linear Regression - Estimation by Least Squares
Dependent Variable ESQ
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.3275973
R-Bar^2 0.3099025
Uncentered R^2 0.9551156
Mean of Dependent Variable 7.6481586460
Std Error of Dependent Variable 2.0715193896
Standard Error of Estimate 1.7208547878
Sum of Squared Residuals 112.53096563
Regression F(1,38) 18.5138
Significance Level of F 0.0001139
Log Likelihood -77.4445
Durbin-Watson Statistic 2.1756
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 0.9377959635 1.5831056238 0.59238 0.55710664
2. Z 2.3292387221 0.5413357972 4.30276 0.00011387
Linear Regression - Estimation by Weighted Least Squares
Dependent Variable FOOD
Usable Observations 40
Degrees of Freedom 38
Centered R^2 0.7285117
R-Bar^2 0.7213673
Uncentered R^2 0.9523432
Mean of Dependent Variable 6.2704901438
Std Error of Dependent Variable 2.9302264973
Standard Error of Estimate 1.5467398143
Sum of Squared Residuals 90.911354018
Log Likelihood -226.1408
Durbin-Watson Statistic 1.8902
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 76.053791769 9.713489015 7.82971 0.00000000
2. INCOME 10.633491511 0.971514247 10.94528 0.00000000
Copyright © 2025 Thomas A. Doan