RATS 10.1

HETERO.RPF is an example of estimating a linear model with heteroscedasticity. It's adapted from Hill, Griffiths, & Lim (2008).

 

It analyzes a linear regression of food expenditures on income for a cross section of 40 individuals. (Income is in units of $100, while food expenditures are in $1, so the slope coefficient can be interpreted as the percentage of each additional dollar of income that is spent on food.) One would expect not only the mean of spending, but also the variance of the residuals, to go up with income. That the variance increases with income is fairly clear from a scatter plot, where the LINES option on SCATTER is used to put the regression line on the x-y scatter:

 

linreg food
# constant income
scatter(style=dots,lines=%beta,vmin=0.0,$
  hlabel="x = weekly income in $100",$
  vlabel="y = weekly food expenditures in $")
# income food
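
For reference, the regression above can be written out as follows. The symbols are notation introduced here rather than taken from the original text, and the slope value is the least squares estimate reported in the Output section.

```latex
\begin{align*}
\text{FOOD}_i &= \beta_1 + \beta_2\,\text{INCOME}_i + e_i,
  \qquad \operatorname{Var}(e_i) = \sigma_i^2 \\
\Delta\text{FOOD} &\approx \hat\beta_2\,\Delta\text{INCOME}
  = 10.21 \times 1 = \$10.21 \text{ per extra } \$100 \text{ of income}
\end{align*}
```

Heteroscedasticity here means that the variances differ across observations, which is what the remaining steps try to model or correct for.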

 

The form of the variance is less clear. If we assume it’s directly proportional to income, we can compute the weighted least squares estimator with the SPREAD option:

 

linreg(spread=income) food
# constant income
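
To see what SPREAD=INCOME amounts to, the same weighted least squares estimates can be obtained by dividing every variable, including the constant, by the square root of income and running ordinary least squares. The sketch below is not part of the original program; the series names FSTAR, CSTAR and ISTAR are hypothetical.

```rats
* Hypothetical series: each variable divided by sqrt(income),
* the square root of the assumed variance pattern.
set fstar = food/sqrt(income)
set cstar = 1.0/sqrt(income)
set istar = income/sqrt(income)
*
* CONSTANT is left out on purpose: CSTAR is the transformed constant.
linreg fstar
# cstar istar
```

The coefficients and standard errors on CSTAR and ISTAR should match those on Constant and INCOME from the SPREAD regression. The mean of the dependent variable reported in the weighted output (about 64.7) is consistent with food divided by the square root of income.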

 

The next regression redoes least squares, but uses the ROBUSTERRORS option (abbreviated here as ROBUST) to correct the covariance matrix for heteroscedasticity.

 

linreg(robust) food
# constant income
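
As a rough guide to what ROBUSTERRORS changes: the coefficient estimates stay at their least squares values (compare the two sets of output), and only the covariance matrix is replaced by a "sandwich" form. The formula below is the textbook Eicker-White estimator in notation introduced here, where X is the 40 x 2 matrix of regressors and the e-hats are the least squares residuals. Any finite-sample scaling applied by the software is not shown.

```latex
\[
\widehat{\operatorname{Var}}(\hat\beta)
  = (X'X)^{-1}\Bigl(\sum_{i=1}^{40} \hat e_i^{\,2}\, x_i x_i'\Bigr)(X'X)^{-1},
\qquad x_i = (1,\ \text{INCOME}_i)'
\]
```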

 

If you know the form of the heteroscedasticity, this will not be as efficient as weighted least squares; the advantage is that you do not need to know the form. It’s possible to combine the two ideas: for instance, if you think that the variance is related to income, but are unsure of the form of that relationship, you might estimate using a “best guess” form for the SPREAD option, then include ROBUSTERRORS to correct for any remaining problems:

 

linreg(spread=income,robust) food
# constant income


To do Feasible GLS, you have to pick a functional form for the spread function. Regressing the log of the variance on log income means that the variance itself will be a power of income. You start with a basic LINREG to get the residuals, then run a regression of the log squared residuals on log income:

 

linreg food
# constant income
*
set esq = log(%resids^2)
set z   = log(income)
*
linreg esq
# constant z
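
Written out, the auxiliary regression corresponds to the variance model below, with the log squared residuals standing in for the unobserved log variances. The symbols alpha and gamma are notation introduced here for the intercept and slope of that regression.

```latex
\begin{align*}
\log \sigma_i^2 &= \alpha + \gamma \log(\text{INCOME}_i) \\
\sigma_i^2 &= e^{\alpha}\,\text{INCOME}_i^{\gamma}
\end{align*}
```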

 

If the SPREAD=INCOME used above were correct, we would expect the slope coefficient to be near 1. Instead, it's 2.33, which is closer to 2, suggesting that a better model would have the standard deviation, rather than the variance, proportional to income.
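
One way to check this more formally is to follow the auxiliary regression with TEST instructions on the slope. This is a sketch that is not part of the original program; it assumes ESQ and Z have been created as above, and the hypothesized values 1 and 2 are chosen here for illustration.

```rats
* Assumes ESQ and Z were created by the SET instructions above.
linreg esq
# constant z
*
* Slope (coefficient 2) equal to 1: variance proportional to income
test
# 2
# 1.0
*
* Slope equal to 2: standard deviation proportional to income
test
# 2
# 2.0
```

Using the coefficient and standard error from the Output section, the corresponding t-ratios work out to roughly (2.329 - 1)/0.541 = 2.46 and (2.329 - 2)/0.541 = 0.61, so the estimate is much closer to 2 than to 1 relative to its sampling error.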

 

The fitted values from the auxiliary regression can be computed with PRJ. Those are then "exp'ed" to give the estimated variance. You can do that directly on the SPREAD option, which accepts a formula.

 

prj vhat
linreg(spread=exp(vhat)) food
# constant income
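
Because SPREAD only needs the variances up to a constant of proportionality, the intercept of the auxiliary regression should not matter for the final estimates. A sketch of an alternative that skips PRJ and uses the estimated power of income directly is shown below; it is not part of the original program, and VPOWER is a hypothetical variable name.

```rats
* Assumes ESQ and Z were created by the SET instructions above.
linreg esq
# constant z
*
* Save the estimated slope on log income (hypothetical name VPOWER).
compute vpower = %beta(2)
*
* The variance is taken as proportional to income raised to that power.
linreg(spread=income^vpower) food
# constant income
```

Under that assumption, the coefficients and standard errors should agree with the SPREAD=EXP(VHAT) regression, since the two spread series differ only by a constant factor.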

 

The slope estimates actually don’t change much with feasible GLS compared with OLS, but the standard error on INCOME drops from 2.09 for least squares (1.76 with White standard errors) to .97 for GLS.
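
The Full Program uses the @REGCONFIDENCE procedure after the earlier weighted and robust regressions to display 95% confidence intervals. A sketch of doing the same after the feasible GLS step, which the original program does not do, assuming VHAT has already been computed by PRJ:

```rats
* Assumes VHAT holds the fitted values from the auxiliary regression.
linreg(spread=exp(vhat)) food
# constant income
*
* Display 95% confidence intervals for the coefficients just estimated.
@regconfidence(conf=.95)
```

This makes the gain in precision easier to compare with the intervals shown for the other estimators in the Output section.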

 

Full Program

 

open data food.dat
data(format=free,org=columns) 1 40 food income
*
linreg food
# constant income
*
* The LINES option on SCATTER allows you to add one or more y=a+bx lines
* to a scatter plot. In this case, it takes the 2-vector with the
* intercept and slope from the previous LINREG.
*
scatter(style=dots,lines=%beta,vmin=0.0,$
  hlabel="x = weekly income in $100",$
  vlabel="y = weekly food expenditures in $")
# income food
*
* LINREG with the SPREAD option does weighted least squares with a
* variance series proportional to the series given by the SPREAD option.
*
linreg(spread=income) food
# constant income
@regconfidence(conf=.95)
*
* LINREG with ROBUST does a standard LS estimation, but with a
* heteroscedasticity-robust covariance matrix.
*
linreg(robust) food
# constant income
@regconfidence(conf=.95)
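*
* Combining SPREAD with ROBUST does weighted least squares, but with a
* heteroscedasticity-robust covariance matrix.
*
linreg(spread=income,robust) food
# constant income
*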
linreg food
# constant income
*
* This sequence does feasible GLS. This first runs a regression of
* log(e^2) on the log of income.
*
set esq = log(%resids^2)
set z   = log(income)
*
linreg esq
# constant z
*
* PRJ then computes the fitted values from the above regression, which
* are then "exp"ed to give the estimated variances. That constructed
* series is fed into LINREG with SPREAD to correct for
* heteroscedasticity.
*
prj vhat
linreg(spread=exp(vhat)) food
# constant income

Output

 

Linear Regression - Estimation by Least Squares

Dependent Variable FOOD

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.3850022

R-Bar^2                             0.3688181

Uncentered R^2                      0.9179605

Mean of Dependent Variable       283.57350000

Std Error of Dependent Variable  112.67518102

Standard Error of Estimate        89.51700453

Sum of Squared Residuals         304505.17583

Regression F(1,38)                    23.7888

Significance Level of F             0.0000195

Log Likelihood                      -235.5088

Durbin-Watson Statistic                1.8939

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     83.416002021 43.410163135      1.92158  0.06218242

2.  INCOME                       10.209642968  2.093263531      4.87738  0.00001946

 

 

Linear Regression - Estimation by Weighted Least Squares

Dependent Variable FOOD

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.0289926

R-Bar^2                             0.0034398

Uncentered R^2                      0.9263032

Mean of Dependent Variable       64.714305114

Std Error of Dependent Variable  18.782387030

Standard Error of Estimate       18.750055676

Sum of Squared Residuals         13359.454339

Log Likelihood                      -230.5980

Durbin-Watson Statistic                1.8924

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     78.684080183 23.788721650      3.30762  0.00206413

2.  INCOME                       10.451009057  1.385891228      7.54100  0.00000000

 

 

Label    Coefficient   Lower     Upper

Constant    78.68408  30.52633 126.84183

INCOME      10.45101   7.64542  13.25660

 

 

Linear Regression - Estimation by Least Squares

With Heteroscedasticity-Consistent (Eicker-White) Standard Errors

Dependent Variable FOOD

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.3850022

R-Bar^2                             0.3688181

Uncentered R^2                      0.9179605

Mean of Dependent Variable       283.57350000

Std Error of Dependent Variable  112.67518102

Standard Error of Estimate        89.51700453

Sum of Squared Residuals         304505.17583

Log Likelihood                      -235.5088

Durbin-Watson Statistic                1.8939

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     83.416002021 26.768349932      3.11622  0.00183187

2.  INCOME                       10.209642968  1.763269880      5.79018  0.00000001

 

 

Label    Coefficient   Lower     Upper

Constant    83.41600  29.22631 137.60569

INCOME      10.20964   6.64009  13.77920

 

 

Linear Regression - Estimation by Weighted Least Squares

With Heteroscedasticity-Consistent (Eicker-White) Standard Errors

Dependent Variable FOOD

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.0289926

R-Bar^2                             0.0034398

Uncentered R^2                      0.9263032

Mean of Dependent Variable       64.714305114

Std Error of Dependent Variable  18.782387030

Standard Error of Estimate       18.750055676

Sum of Squared Residuals         13359.454339

Log Likelihood                      -230.5980

Durbin-Watson Statistic                1.8924

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     78.684080183 11.730482780      6.70766  0.00000000

2.  INCOME                       10.451009057  1.144327006      9.13289  0.00000000

 

 

Linear Regression - Estimation by Least Squares

Dependent Variable FOOD

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.3850022

R-Bar^2                             0.3688181

Uncentered R^2                      0.9179605

Mean of Dependent Variable       283.57350000

Std Error of Dependent Variable  112.67518102

Standard Error of Estimate        89.51700453

Sum of Squared Residuals         304505.17583

Regression F(1,38)                    23.7888

Significance Level of F             0.0000195

Log Likelihood                      -235.5088

Durbin-Watson Statistic                1.8939

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     83.416002021 43.410163135      1.92158  0.06218242

2.  INCOME                       10.209642968  2.093263531      4.87738  0.00001946

 

Linear Regression - Estimation by Least Squares

Dependent Variable ESQ

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.3275973

R-Bar^2                             0.3099025

Uncentered R^2                      0.9551156

Mean of Dependent Variable       7.6481586460

Std Error of Dependent Variable  2.0715193896

Standard Error of Estimate       1.7208547878

Sum of Squared Residuals         112.53096563

Regression F(1,38)                    18.5138

Significance Level of F             0.0001139

Log Likelihood                       -77.4445

Durbin-Watson Statistic                2.1756

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     0.9377959635 1.5831056238      0.59238  0.55710664

2.  Z                            2.3292387221 0.5413357972      4.30276  0.00011387

 

 

Linear Regression - Estimation by Weighted Least Squares

Dependent Variable FOOD

Usable Observations                        40

Degrees of Freedom                         38

Centered R^2                        0.7285117

R-Bar^2                             0.7213673

Uncentered R^2                      0.9523432

Mean of Dependent Variable       6.2704901438

Std Error of Dependent Variable  2.9302264973

Standard Error of Estimate       1.5467398143

Sum of Squared Residuals         90.911354018

Log Likelihood                      -226.1408

Durbin-Watson Statistic                1.8902

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     76.053791769  9.713489015      7.82971  0.00000000

2.  INCOME                       10.633491511  0.971514247     10.94528  0.00000000


 

Graph


 


Copyright © 2025 Thomas A. Doan