HETERO.RPF

HETERO.RPF is an example of estimating a linear model with heteroscedasticity. It's adapted from Hill, Griffiths, & Lim (2008).

It analyzes a linear regression of food expenditures on income for a cross section of 40 individuals. (Income is in units of $100, while food expenditures are in $1, so the slope coefficient can be read as the percentage of each additional dollar of income that is spent on food.) One would expect not only the mean of spending to rise with income, but also the variance of the residuals. That the variance increases with income is fairly clear from a scatter plot, where the LINES option on SCATTER is used to add the regression line to the x-y scatter:

linreg food
# constant income
scatter(style=dots,lines=%beta,vmin=0.0,$
hlabel="x = weekly income in $100",$
vlabel="y = weekly food expenditures in $")
# income food
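As a small worked example of what the fitted line represents, the sketch below evaluates y = a + b x using the least squares coefficients reported in the Output section. The function name `fitted_food` and the sample income level are illustrative choices made here; they are not part of the RATS program.

```python
# Least squares estimates copied from the Output section of this example.
INTERCEPT = 83.416002021   # dollars of weekly food spending at zero income
SLOPE = 10.209642968       # dollars of food spending per $100 of weekly income

def fitted_food(income_hundreds):
    """Fitted weekly food expenditure (in $) for income measured in $100 units."""
    return INTERCEPT + SLOPE * income_hundreds

# Hypothetical household with weekly income of $2,000 (20 units of $100).
example = fitted_food(20.0)

# Because income is in $100 and food is in $1, the slope doubles as a
# percentage: roughly 10.2 cents of each additional dollar goes to food.
marginal_share_percent = SLOPE
print(round(example, 2), round(marginal_share_percent, 2))
```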

The form of the variance is less clear. If we assume it's directly proportional to income, we compute the weighted least squares estimator by

linreg(spread=income) food
# constant income
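To make the role of the spread series concrete, here is a minimal sketch of weighted least squares for a single regressor, assuming each observation is weighted by 1/spread. It also shows that the same estimates come from ordinary least squares after dividing every term by the square root of the spread. The helper names (`solve2`, `wls`, `ols_on_scaled`) and the six data points are made up for illustration; this is not how RATS implements SPREAD internally.

```python
import math

def solve2(a11, a12, a22, b1, b2):
    """Solve the symmetric 2x2 system [[a11, a12], [a12, a22]] c = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def wls(x, y, spread):
    """Intercept and slope minimizing sum((y - a - b*x)**2 / spread)."""
    w = [1.0 / s for s in spread]
    return solve2(
        sum(w),
        sum(wi * xi for wi, xi in zip(w, x)),
        sum(wi * xi * xi for wi, xi in zip(w, x)),
        sum(wi * yi for wi, yi in zip(w, y)),
        sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)),
    )

def ols_on_scaled(x, y, spread):
    """Same estimator via OLS after dividing every term by sqrt(spread).

    The constant becomes the regressor 1/sqrt(s), so the scaled regression
    has no separate intercept column.
    """
    root = [math.sqrt(s) for s in spread]
    c = [1.0 / r for r in root]                  # scaled constant
    xs = [xi / r for xi, r in zip(x, root)]      # scaled income
    ys = [yi / r for yi, r in zip(y, root)]      # scaled food
    return solve2(
        sum(ci * ci for ci in c),
        sum(ci * xi for ci, xi in zip(c, xs)),
        sum(xi * xi for xi in xs),
        sum(ci * yi for ci, yi in zip(c, ys)),
        sum(xi * yi for xi, yi in zip(xs, ys)),
    )

# Made-up data for illustration only (not the food.dat sample).
income = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
food = [120.0, 190.0, 210.0, 330.0, 290.0, 450.0]

a_w, b_w = wls(income, food, income)             # variance proportional to income
a_s, b_s = ols_on_scaled(income, food, income)
print(a_w, b_w)
print(a_s, b_s)
```

The two printed pairs agree up to rounding, which is why weighted least squares is often described as least squares on rescaled data.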

The next estimation redoes least squares, but uses the ROBUSTERRORS option (written as ROBUST in the code) to correct the covariance matrix for heteroscedasticity.

linreg(robust) food
# constant income
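The correction leaves the coefficients alone and changes only the covariance matrix. A minimal sketch of the plain Eicker-White "sandwich" calculation for a regression on a constant and one regressor follows. It assumes the basic (HC0) form with no small-sample scaling; whether RATS applies any additional scaling is not stated in this example, and the helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def white_cov(x, resid):
    """HC0 covariance of (intercept, slope) for a regression on (1, x)."""
    xtx = [[float(len(x)), sum(x)], [sum(x), sum(xi * xi for xi in x)]]
    e2 = [e * e for e in resid]
    s_ex = sum(ei * xi for ei, xi in zip(e2, x))
    # "Meat" of the sandwich: sum of e_i^2 * x_i x_i'
    meat = [[sum(e2), s_ex],
            [s_ex, sum(ei * xi * xi for ei, xi in zip(e2, x))]]
    bread = inv2(xtx)                            # (X'X)^-1
    return matmul2(matmul2(bread, meat), bread)

# Sanity check on toy inputs: if every squared residual equals 1, the
# sandwich collapses to (X'X)^-1, the classical form with sigma^2 = 1.
cov = white_cov([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0])
print(cov)
```

The square roots of the diagonal entries are the robust standard errors for the intercept and slope.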

If you know the form of the heteroscedasticity, this will not be as efficient as weighted least squares; the advantage is that you do not need to know the form. It's possible to combine both ideas: for instance, if you think that the variance is related to income but are unsure of the form of that relationship, you might estimate using a “best guess” form for the SPREAD option, then include ROBUSTERRORS to correct the standard errors for any remaining heteroscedasticity:

linreg(spread=income,robust) food
# constant income
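For reference, the combined estimator can be written out as below. This is the textbook sandwich form applied to the weighted regression, not a statement of RATS internals. The notation is introduced here for illustration: $s_i$ is the SPREAD series, $x_i = (1, \text{income}_i)'$, and $\hat e_i$ is the weighted least squares residual in the original units. Any small-sample scaling is omitted.

```latex
\hat\beta_{WLS} = \Bigl(\sum_i \frac{x_i x_i'}{s_i}\Bigr)^{-1} \sum_i \frac{x_i y_i}{s_i},
\qquad
\widehat{\operatorname{Var}}\bigl(\hat\beta_{WLS}\bigr) =
\Bigl(\sum_i \frac{x_i x_i'}{s_i}\Bigr)^{-1}
\Bigl(\sum_i \frac{\hat e_i^{2}\, x_i x_i'}{s_i^{2}}\Bigr)
\Bigl(\sum_i \frac{x_i x_i'}{s_i}\Bigr)^{-1}
```

If the guessed spread is correct, so that $E[e_i^2] = \sigma^2 s_i$, the middle term is approximately $\sigma^2 \sum_i x_i x_i' / s_i$ and the expression reduces to the usual weighted least squares covariance matrix. If the guess is wrong, the outer terms no longer cancel it and the sandwich remains valid.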

To do feasible GLS, you have to pick a functional form for the variance (spread) function. Regressing log V, the log of the variance, on log income means the variance itself will be a power of income. You start with a basic LINREG to get the residuals, then run a regression of the log squared residuals on log income:

linreg food
# constant income
set esq = log(%resids^2)
set z = log(income)
linreg esq
# constant z
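Written out, the auxiliary regression corresponds to the variance function below. The symbols $\alpha$, $\gamma$ and $\sigma_i^2$ are notation introduced here for illustration; in the program, the dependent variable is ESQ and the regressor is Z.

```latex
\log \sigma_i^{2} = \alpha + \gamma \,\log(\text{income}_i)
\quad\Longrightarrow\quad
\sigma_i^{2} = e^{\alpha}\,\text{income}_i^{\gamma}
```

With $\gamma = 1$ the variance is proportional to income, and with $\gamma = 2$ the standard deviation is proportional to income. The factor $e^{\alpha}$ only rescales all the variances by the same amount, so it has no effect on the weighted least squares coefficients.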

If the SPREAD=INCOME specification used above were correct, we would expect the slope coefficient to be near 1. Instead, it's 2.33, which is closer to 2, suggesting that a better model would have the standard deviation, rather than the variance, proportional to income.
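A rough way to gauge these two candidate values is to compute t-statistics from the coefficient and standard error on Z reported in the Output section. This is only an informal check, under the assumption that the conventional standard error from the auxiliary regression is a usable guide; the function name `t_stat` is made up for this sketch.

```python
# Estimates for the coefficient on Z, copied from the Output section.
GAMMA_HAT = 2.3292387221
STD_ERROR = 0.5413357972

def t_stat(hypothesized):
    """t-statistic for H0: coefficient on log income equals `hypothesized`."""
    return (GAMMA_HAT - hypothesized) / STD_ERROR

t_variance_prop = t_stat(1.0)   # variance proportional to income
t_stddev_prop = t_stat(2.0)     # standard deviation proportional to income
print(round(t_variance_prop, 3), round(t_stddev_prop, 3))
```

With 38 degrees of freedom the two-sided 5% critical value is about 2.02, so a slope of 1 sits outside the conventional range (t of about 2.46) while a slope of 2 sits well inside it (t of about 0.61).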

The fitted values from the auxiliary regression can be computed with PRJ. Those are then exponentiated to give the estimated variances. You can do that directly in the SPREAD option, which accepts a formula.

prj vhat
linreg(spread=exp(vhat)) food
# constant income
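The whole feasible GLS sequence can be sketched outside RATS as follows. This is an illustrative re-implementation on simulated data, under the assumption that SPREAD weights each observation by the inverse of the supplied variance series. The function names, the simulated sample, and its true parameter values (intercept 80, slope 10, standard deviation proportional to x) are all made up for the example.

```python
import math
import random

def fit_line(x, y, spread=None):
    """Intercept and slope minimizing sum((y - a - b*x)**2 / spread).

    With spread=None every observation gets equal weight (plain OLS).
    """
    w = [1.0] * len(x) if spread is None else [1.0 / s for s in spread]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return (swxx * swy - swx * swxy) / det, (sw * swxy - swx * swy) / det

def feasible_gls(x, y):
    """Return ((intercept, slope), gamma) from the three-step procedure."""
    # Step 1: OLS to get residuals (the first LINREG).
    a, b = fit_line(x, y)
    # Step 2: regress log squared residuals on log x (ESQ on Z).
    log_e2 = [math.log((yi - a - b * xi) ** 2) for xi, yi in zip(x, y)]
    z = [math.log(xi) for xi in x]
    c, gamma = fit_line(z, log_e2)
    # Step 3: exponentiate the fitted values and use them as the spread.
    spread = [math.exp(c + gamma * zi) for zi in z]
    return fit_line(x, y, spread), gamma

# Simulated sample: the error standard deviation is proportional to x,
# so the true power in the variance function is 2.
rng = random.Random(0)
x = [5.0 + 30.0 * i / 399 for i in range(400)]
y = [80.0 + 10.0 * xi + 0.5 * xi * rng.gauss(0.0, 1.0) for xi in x]

(a_g, b_g), gamma = feasible_gls(x, y)
print(a_g, b_g, gamma)
```

The estimated power `gamma` should land near 2 and the slope near 10 in this simulated sample, mirroring the pattern seen with the food expenditure data.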

The slope estimate doesn't change much with feasible GLS compared with OLS (10.63 versus 10.21), but its standard error drops from 2.09 for least squares (1.76 with White standard errors) to .97 for GLS.

Full Program

open data food.dat
data(format=free,org=columns) 1 40 food income
*
linreg food
# constant income
*
* The LINES option on SCATTER allows you to add one or more y=a+bx lines
* to a scatter plot. In this case, it takes the 2-vector with the
* intercept and slope from the previous LINREG.
*
scatter(style=dots,lines=%beta,vmin=0.0,$
hlabel="x = weekly income in $100",$
vlabel="y = weekly food expenditures in $")
# income food
*
* LINREG with the SPREAD option does weighted least squares with a
* variance series proportional to the series given by the SPREAD option.
*
linreg(spread=income) food
# constant income
@regconfidence(conf=.95)
*
* LINREG with ROBUST does a standard LS estimation, but with a
* heteroscedasticity-robust covariance matrix.
*
linreg(robust) food
# constant income
@regconfidence(conf=.95)
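*
* Combining SPREAD with ROBUST does weighted least squares with a
* heteroscedasticity-robust covariance matrix.
*
linreg(spread=income,robust) food
# constant income
*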
linreg food
# constant income
*
* This sequence does feasible GLS. This first runs a regression of
* log(e^2) on the log of income.
*
set esq = log(%resids^2)
set z = log(income)
*
linreg esq
# constant z
*
* PRJ then computes the fitted values from the above regression, which
* are then "exp"ed to give the estimated variances. That constructed
* series is fed into LINREG with SPREAD to correct for
* heteroscedasticity.
*
prj vhat
linreg(spread=exp(vhat)) food
# constant income

Output

Linear Regression - Estimation by Least Squares
Dependent Variable FOOD
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.3850022
R-Bar^2                             0.3688181
Uncentered R^2                      0.9179605
Mean of Dependent Variable       283.57350000
Std Error of Dependent Variable  112.67518102
Standard Error of Estimate        89.51700453
Sum of Squared Residuals         304505.17583
Regression F(1,38)                    23.7888
Significance Level of F             0.0000195
Log Likelihood                      -235.5088
Durbin-Watson Statistic                1.8939

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           83.416002021    43.410163135     1.92158  0.06218242
2.  INCOME             10.209642968     2.093263531     4.87738  0.00001946

Linear Regression - Estimation by Weighted Least Squares
Dependent Variable FOOD
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.0289926
R-Bar^2                             0.0034398
Uncentered R^2                      0.9263032
Mean of Dependent Variable       64.714305114
Std Error of Dependent Variable  18.782387030
Standard Error of Estimate       18.750055676
Sum of Squared Residuals         13359.454339
Log Likelihood                      -230.5980
Durbin-Watson Statistic                1.8924

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           78.684080183    23.788721650     3.30762  0.00206413
2.  INCOME             10.451009057     1.385891228     7.54100  0.00000000

Label        Coefficient      Lower      Upper
Constant        78.68408   30.52633  126.84183
INCOME          10.45101    7.64542   13.25660

Linear Regression - Estimation by Least Squares
With Heteroscedasticity-Consistent (Eicker-White) Standard Errors
Dependent Variable FOOD
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.3850022
R-Bar^2                             0.3688181
Uncentered R^2                      0.9179605
Mean of Dependent Variable       283.57350000
Std Error of Dependent Variable  112.67518102
Standard Error of Estimate        89.51700453
Sum of Squared Residuals         304505.17583
Log Likelihood                      -235.5088
Durbin-Watson Statistic                1.8939

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           83.416002021    26.768349932     3.11622  0.00183187
2.  INCOME             10.209642968     1.763269880     5.79018  0.00000001

Label        Coefficient      Lower      Upper
Constant        83.41600   29.22631  137.60569
INCOME          10.20964    6.64009   13.77920

Linear Regression - Estimation by Weighted Least Squares
With Heteroscedasticity-Consistent (Eicker-White) Standard Errors
Dependent Variable FOOD
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.0289926
R-Bar^2                             0.0034398
Uncentered R^2                      0.9263032
Mean of Dependent Variable       64.714305114
Std Error of Dependent Variable  18.782387030
Standard Error of Estimate       18.750055676
Sum of Squared Residuals         13359.454339
Log Likelihood                      -230.5980
Durbin-Watson Statistic                1.8924

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           78.684080183    11.730482780     6.70766  0.00000000
2.  INCOME             10.451009057     1.144327006     9.13289  0.00000000

Linear Regression - Estimation by Least Squares
Dependent Variable FOOD
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.3850022
R-Bar^2                             0.3688181
Uncentered R^2                      0.9179605
Mean of Dependent Variable       283.57350000
Std Error of Dependent Variable  112.67518102
Standard Error of Estimate        89.51700453
Sum of Squared Residuals         304505.17583
Regression F(1,38)                    23.7888
Significance Level of F             0.0000195
Log Likelihood                      -235.5088
Durbin-Watson Statistic                1.8939

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           83.416002021    43.410163135     1.92158  0.06218242
2.  INCOME             10.209642968     2.093263531     4.87738  0.00001946

Linear Regression - Estimation by Least Squares
Dependent Variable ESQ
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.3275973
R-Bar^2                             0.3099025
Uncentered R^2                      0.9551156
Mean of Dependent Variable       7.6481586460
Std Error of Dependent Variable  2.0715193896
Standard Error of Estimate       1.7208547878
Sum of Squared Residuals         112.53096563
Regression F(1,38)                    18.5138
Significance Level of F             0.0001139
Log Likelihood                       -77.4445
Durbin-Watson Statistic                2.1756

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           0.9377959635    1.5831056238     0.59238  0.55710664
2.  Z                  2.3292387221    0.5413357972     4.30276  0.00011387

Linear Regression - Estimation by Weighted Least Squares
Dependent Variable FOOD
Usable Observations                        40
Degrees of Freedom                         38
Centered R^2                        0.7285117
R-Bar^2                             0.7213673
Uncentered R^2                      0.9523432
Mean of Dependent Variable       6.2704901438
Std Error of Dependent Variable  2.9302264973
Standard Error of Estimate       1.5467398143
Sum of Squared Residuals         90.911354018
Log Likelihood                      -226.1408
Durbin-Watson Statistic                1.8902

    Variable                  Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant           76.053791769     9.713489015     7.82971  0.00000000
2.  INCOME             10.633491511     0.971514247    10.94528  0.00000000

Graph