Examples / HETEROTEST.RPF
HETEROTEST.RPF is an example of tests for heteroscedasticity. It's adapted from Wooldridge (2009), Example 8.4, pages 273-274.
The model is a hedonic price index for homes using lot size, square footage and number of bedrooms as the explanatory variables. Because this is done (for illustration) as a linear model, there's a good chance that we would see heteroscedasticity, as the variance in actual dollars for larger, more expensive homes is likely to be higher than for less expensive homes.
Because we estimate the base regression many times, we define an EQUATION so we can just use the EQUATION option on all the LINREG instructions.
equation linff price
# constant lotsize sqrft bdrms
The first test is the Breusch-Pagan test, which can be done by regressing the squared residuals on the original regressors. The test has three degrees of freedom, one for each of the three (non-constant) explanatory variables. The following computes both the F form (using EXCLUDE) and the LM form (using CDF with %TRSQUARED):
linreg(equation=linff)
set usq = %resids^2
linreg usq
# constant lotsize sqrft bdrms
exclude(title="Breusch-Pagan Test for Heteroscedasticity")
# lotsize sqrft bdrms
cdf(title="Breusch-Pagan Test, LM Form") chisqr %trsquared 3
The Breusch-Pagan LM test can also be done using the procedure @RegWhiteTest with the option TYPE=BP. You need to do this right after the regression that you want to test. So you can replace the last block of code with simply:
linreg(equation=linff)
@RegWhiteTest(type=bp)
The results would have us reject homoscedasticity rather strongly.
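As a rough sketch of the arithmetic behind the two forms (the notation is ours, not taken from the program): write \hat u_i for the residuals from the price regression, R^2 for the centered R^2 of the auxiliary regression of the squared residuals, N for the number of observations and k for the number of non-constant regressors. Plugging in the values shown in the Output section below (N = 88, k = 3, R^2 = 0.1601407) reproduces the reported F and chi-squared statistics up to rounding.

```latex
\begin{align*}
\hat u_i^2 &= \delta_0 + \delta_1\,\text{lotsize}_i + \delta_2\,\text{sqrft}_i
              + \delta_3\,\text{bdrms}_i + v_i \\
H_0 &: \delta_1 = \delta_2 = \delta_3 = 0 \\
F &= \frac{R^2/k}{(1-R^2)/(N-k-1)}
   = \frac{0.1601407/3}{0.8398593/84} \approx 5.339 \\
LM &= N \cdot R^2 = 88 \times 0.1601407 \approx 14.092
\end{align*}
```

The F form is compared with an F(3,84) distribution and the LM form with a chi-squared with 3 degrees of freedom.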
The Harvey test regresses the log of the squared residuals on the log of (in this case) lot size. The LM version of this is:
linreg(equation=linff)
set logusq = log(%resids^2)
set llotsize = log(lotsize)
linreg logusq
# constant llotsize
cdf(title="Harvey Test") chisqr %trsquared 1
With a significance level of about 0.17, the results suggest that lot size alone isn't very good at explaining the variance.
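For reference, the regression behind the Harvey test and its LM statistic can be sketched as follows. The coefficient names are our own notation, and the R^2 is the centered R^2 of the LOGUSQ regression shown in the Output section below.

```latex
\begin{align*}
\log \hat u_i^2 &= \alpha_0 + \alpha_1 \log(\text{lotsize}_i) + v_i \\
H_0 &: \alpha_1 = 0 \\
LM &= N \cdot R^2 = 88 \times 0.0215285 \approx 1.8945
\end{align*}
```

With a single restriction, the statistic is compared with a chi-squared with 1 degree of freedom, whose 5% critical value is about 3.84.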
The next block of code does the White test manually, by creating the squares and cross products of the regressors and doing a Breusch-Pagan test using those and the original regressors as the potential explanatory variables for the variance. We don't recommend this; use the @RegWhiteTest procedure instead.
linreg(equation=linff)
set usq = %resids^2
set lotsq = lotsize^2
set sqrftsq = sqrft^2
set bdrmssq = bdrms^2
set lotxsqrft = lotsize*sqrft
set lotxbdrms = lotsize*bdrms
set sqftxbdrms = sqrft*bdrms
*
linreg usq
# constant lotsize sqrft bdrms $
lotsq sqrftsq bdrmssq lotxsqrft lotxbdrms sqftxbdrms
cdf(title="White Heteroscedasticity Test") chisqr $
%trsquared %nobs-%ndf-1
The following is equivalent to the previous code block:
linreg(equation=linff)
@RegWhiteTest
This gives an even stronger rejection than the original Breusch-Pagan test.
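The degrees of freedom used in the manual White test can be checked two ways. This is a sketch in our own notation, where k is the number of non-constant regressors in the original model and q is the number of non-constant regressors in the auxiliary regression. The values for %NOBS, %NDF and R^2 are those of the ten-regressor USQ regression in the Output section below.

```latex
\begin{align*}
q &= \underbrace{k}_{\text{levels}} + \underbrace{k}_{\text{squares}}
     + \underbrace{\tfrac{k(k-1)}{2}}_{\text{cross products}}
   = \frac{k(k+3)}{2} = 9 \quad (k = 3) \\
q &= \texttt{\%nobs} - \texttt{\%ndf} - 1 = 88 - 78 - 1 = 9 \\
LM &= N \cdot R^2 = 88 \times 0.3833143 \approx 33.732
\end{align*}
```

The expression %NOBS-%NDF-1 counts the estimated coefficients less the constant, so it adapts automatically if the list of auxiliary regressors changes.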
Finally, this does a Goldfeld-Quandt test (on lot size). ORDER with the RANKS option keeps the LOTSIZE series intact, but creates a series named LOTRANKS which has the ranks (from 1 to 88) of the corresponding LOTSIZE values. The test compares the residual variance for the 36 observations with the smallest lot sizes (ranks 1 to 36) with that for the 36 with the largest (ranks 53 to 88), leaving out the middle 16 to try to improve the power. As with the Harvey test, the results indicate that lot size is not a significant factor in explaining the variance.
order(ranks=lotranks) lotsize
linreg(equation=linff,smpl=lotranks<=36)
compute rss1=%rss,ndf1=%ndf
linreg(equation=linff,smpl=lotranks>=53)
compute rss2=%rss,ndf2=%ndf
cdf(title="Goldfeld-Quandt Test") ftest $
(rss2/ndf2)/(rss1/ndf1) ndf2 ndf1
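The statistic passed to CDF is the ratio of the two subsample variance estimates. As a quick check in our own notation (subscript 1 for the low-rank subsample, 2 for the high-rank subsample), using the sums of squared residuals from the two 36-observation regressions in the Output section below:

```latex
\begin{align*}
\mathit{ndf}_1 &= \mathit{ndf}_2 = 36 - 4 = 32 \\
F &= \frac{RSS_2/\mathit{ndf}_2}{RSS_1/\mathit{ndf}_1}
   = \frac{142403.405/32}{92678.335/32} \approx 1.5365
\end{align*}
```

The ratio puts the subsample with the larger lot sizes in the numerator, so a large value would point to variance increasing with lot size. It is compared with an F(32,32) distribution.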
Full Program
open data hprice1.raw
data(format=free,org=columns) 1 88 price assess bdrms lotsize sqrft $
colonial lprice lassess llotsize lsqrft
*
* Because we keep re-estimating this (possibly over different samples),
* we define an EQUATION.
*
equation linff price
# constant lotsize sqrft bdrms
*
linreg(equation=linff)
*
set usq = %resids^2
linreg usq
# constant lotsize sqrft bdrms
exclude(title="Breusch-Pagan Test for Heteroscedasticity")
# lotsize sqrft bdrms
cdf(title="Breusch-Pagan Test, LM Form") chisqr %trsquared 3
*
* The LM test can also be done using the procedure @RegWhiteTest with
* the option type=bp. You need to do this right after the regression
* that you want to test.
*
linreg(equation=linff)
@RegWhiteTest(type=bp)
*
* Harvey test
*
linreg(equation=linff)
set logusq = log(%resids^2)
set llotsize = log(lotsize)
linreg logusq
# constant llotsize
cdf(title="Harvey Test") chisqr %trsquared 1
*
* White's test (the hard way, not recommended)
*
linreg(equation=linff)
set usq = %resids^2
set lotsq = lotsize^2
set sqrftsq = sqrft^2
set bdrmssq = bdrms^2
set lotxsqrft = lotsize*sqrft
set lotxbdrms = lotsize*bdrms
set sqftxbdrms = sqrft*bdrms
*
linreg usq
# constant lotsize sqrft bdrms $
lotsq sqrftsq bdrmssq lotxsqrft lotxbdrms sqftxbdrms
cdf(title="White Heteroscedasticity Test") chisqr $
%trsquared %nobs-%ndf-1
*
* Using @RegWhiteTest (recommended)
*
linreg(equation=linff)
@RegWhiteTest
*
* Goldfeld-Quandt test (on lot size)
*
order(ranks=lotranks) lotsize
linreg(equation=linff,smpl=lotranks<=36)
compute rss1=%rss,ndf1=%ndf
linreg(equation=linff,smpl=lotranks>=53)
compute rss2=%rss,ndf2=%ndf
cdf(title="Goldfeld-Quandt Test") ftest $
(rss2/ndf2)/(rss1/ndf1) ndf2 ndf1
Output
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
Linear Regression - Estimation by Least Squares
Dependent Variable USQ
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.1601407
R-Bar^2 0.1301458
Uncentered R^2 0.3197842
Mean of Dependent Variable 3417.3159824
Std Error of Dependent Variable 7094.3837812
Standard Error of Estimate 6616.6462785
Sum of Squared Residuals 3677520669.9
Regression F(3,84) 5.3389
Significance Level of F 0.0020477
Log Likelihood -896.9860
Durbin-Watson Statistic 2.3511
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -5522.794789 3259.478257 -1.69438 0.09389782
2. LOTSIZE 0.201521 0.071009 2.83796 0.00569096
3. SQRFT 1.691037 1.463850 1.15520 0.25128478
4. BDRMS 1041.760223 996.381047 1.04554 0.29877103
Breusch-Pagan Test for Heteroscedasticity
Null Hypothesis : The Following Coefficients Are Zero
LOTSIZE
SQRFT
BDRMS
F(3,84)= 5.33892 with Significance Level 0.00204774
Breusch-Pagan Test, LM Form
Chi-Squared(3)= 14.092386 with Significance Level 0.00278206
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
Breusch-Pagan Heteroscedasticity Test
Chi-Squared(3)= 14.092386 with Significance Level 0.00278206
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
Linear Regression - Estimation by Least Squares
Dependent Variable LOGUSQ
Usable Observations 88
Degrees of Freedom 86
Centered R^2 0.0215285
R-Bar^2 0.0101509
Uncentered R^2 0.8979108
Mean of Dependent Variable 6.6147892576
Std Error of Dependent Variable 2.2706012440
Standard Error of Estimate 2.2590475263
Sum of Squared Residuals 438.88343242
Regression F(1,86) 1.8922
Significance Level of F 0.1725280
Log Likelihood -195.5701
Durbin-Watson Statistic 2.4417
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 1.1617367352 3.9715291766 0.29252 0.77059652
2. LLOTSIZE 0.6123513270 0.4451628292 1.37557 0.17252802
Harvey Test
Chi-Squared(1)= 1.894506 with Significance Level 0.16869458
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
Linear Regression - Estimation by Least Squares
Dependent Variable USQ
Usable Observations 88
Degrees of Freedom 78
Centered R^2 0.3833143
R-Bar^2 0.3121582
Uncentered R^2 0.5005361
Mean of Dependent Variable 3417.3159824
Std Error of Dependent Variable 7094.3837812
Standard Error of Estimate 5883.8141474
Sum of Squared Residuals 2700302975.9
Regression F(9,78) 5.3870
Significance Level of F 0.0000101
Log Likelihood -883.3955
Durbin-Watson Statistic 2.0527
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 15626.243719 11369.411858 1.37441 0.17325028
2. LOTSIZE -1.859507 0.637097 -2.91872 0.00459119
3. SQRFT -2.673918 8.662183 -0.30869 0.75838135
4. BDRMS -1982.841114 5438.482750 -0.36459 0.71640071
5. LOTSQ -0.000000 0.000005 -0.10750 0.91467008
6. SQRFTSQ 0.000352 0.001840 0.19148 0.84864369
7. BDRMSSQ 289.754063 758.830273 0.38184 0.70361609
8. LOTXSQRFT 0.000457 0.000277 1.64967 0.10303162
9. LOTXBDRMS 0.314647 0.252094 1.24813 0.21571537
10. SQFTXBDRMS -1.020860 1.667154 -0.61234 0.54209582
White Heteroscedasticity Test
Chi-Squared(9)= 33.731657 with Significance Level 0.00009953
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
White Heteroscedasticity Test
Chi-Squared(9)= 33.731657 with Significance Level 0.00009953
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 36
Degrees of Freedom 32
Skipped/Missing (from 88) 52
Centered R^2 0.2049898
R-Bar^2 0.1304576
Uncentered R^2 0.9602660
Mean of Dependent Variable 248.09791667
Std Error of Dependent Variable 57.71234839
Standard Error of Estimate 53.81633555
Sum of Squared Residuals 92678.335090
Regression F(3,32) 2.7504
Significance Level of F 0.0588086
Log Likelihood -192.4425
Durbin-Watson Statistic 2.0486
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant 44.496857844 87.408803284 0.50907 0.61419837
2. LOTSIZE 0.011696813 0.008643992 1.35317 0.18548346
3. SQRFT 0.077404477 0.028108424 2.75378 0.00963109
4. BDRMS 1.532607749 13.921042823 0.11009 0.91302327
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 36
Degrees of Freedom 32
Skipped/Missing (from 88) 52
Centered R^2 0.7377785
R-Bar^2 0.7131952
Uncentered R^2 0.9717906
Mean of Dependent Variable 353.75069444
Std Error of Dependent Variable 124.56383912
Standard Error of Estimate 66.70911791
Sum of Squared Residuals 142403.40518
Regression F(3,32) 30.0114
Significance Level of F 0.0000000
Log Likelihood -200.1740
Durbin-Watson Statistic 2.3022
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -31.62793264 44.92983273 -0.70394 0.48656105
2. LOTSIZE 0.00144117 0.00078054 1.84639 0.07410319
3. SQRFT 0.12342990 0.02128261 5.79956 0.00000194
4. BDRMS 21.89272051 15.14338108 1.44570 0.15798597
Goldfeld-Quandt Test
F(32,32)= 1.53653 with Significance Level 0.11489700
Copyright © 2025 Thomas A. Doan