Examples / RANDOMIZE.RPF
RANDOMIZE.RPF is an example of approximate randomization, applied to the heteroscedasticity tests done in the HETEROTEST.RPF example. The null hypothesis is that the variance is unrelated to lot size. Two different choices of test statistic are examined:
1. The ratio of variances between the two subsamples (which tests against the alternative that the variance for small lots is smaller than that for large ones)
2. The rank correlation between lot size and the squared residuals (which tests more generally that the variance increases with lot size)
This generates the ranks (smallest to largest) for the LOTSIZE series.
order(ranks=lotranks) lotsize
This does the regression of interest.
linreg price
# constant lotsize sqrft bdrms
The first test statistic (REFER1) is the ratio of the sum of squared residuals over the final 36 observations (large lot sizes) to that over the first 36 (small lots), skipping the middle 16. The second (REFER2) is the correlation between the ranks of the lot sizes and the ranks of the squared residuals.
set ressqr = %resids^2
sstats / ressqr*(lotranks<=36)>>sum1 ressqr*(lotranks>=53)>>sum2
compute refer1=sum2/sum1
order(rank=vrank) ressqr
compute refer2=%corr(lotranks,vrank)
COUNT1 and COUNT2 count the number of times we get a more extreme value after reshuffling. We do 999 shuffles.
compute count1=count2=0.0
compute ns=999
Inside the loop, we use BOOT with NOREPLACE to draw a new permutation of the RESSQR variable, then recompute the test statistics from the shuffled data. (Note that the permuted residuals no longer have any relation to the lot size, which is the whole point of the randomization exercise.) The count variables are incremented whenever the permuted data produce a more extreme test statistic than we got with the actual data.
boot(noreplace) entries 1 88
set shuffle = ressqr(entries(t))
sstats / shuffle*(lotranks<=36)>>sum1 shuffle*(lotranks>=53)>>sum2
compute teststat=sum2/sum1
compute count1=count1+(teststat>refer1)
order(rank=vrank) shuffle
compute teststat=%corr(lotranks,vrank)
compute count2=count2+(teststat>refer2)
The p-values for the tests are computed by taking (count+1)/(draws+1). The +1’s are needed because we are, in effect, adding the actual sample in with the draws and seeing where it ends up.
?"Test 1, p-value" (count1+1)/(ns+1)
?"Test 2, p-value" (count2+1)/(ns+1)
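The same procedure can be sketched outside RATS. The following Python snippet (using NumPy, with invented stand-in data rather than the hprice1.raw series, so the printed p-values will not match the output below) mirrors the structure of the program: compute the two reference statistics, permute the squared residuals 999 times, count more extreme values, and form (count+1)/(draws+1).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in data (NOT the hprice1.raw values): squared
# residuals whose scale grows with the lot-size rank, so some
# heteroscedasticity is built in by construction.
n = 88
lotranks = rng.permutation(n) + 1                  # ranks 1..88
ressqr = rng.chisquare(df=1, size=n) * (1.0 + 0.05 * lotranks)

def ratio_stat(sq, ranks):
    # Sum of squared residuals over the 36 largest lots divided by the
    # sum over the 36 smallest, skipping the middle 16 (ranks 37..52).
    return sq[ranks >= 53].sum() / sq[ranks <= 36].sum()

def rank_corr(sq, ranks):
    # Correlation between the lot-size ranks and the ranks of sq
    # (double argsort gives 0-based ranks; +1 makes them 1-based).
    vrank = sq.argsort().argsort() + 1
    return np.corrcoef(ranks, vrank)[0, 1]

refer1 = ratio_stat(ressqr, lotranks)
refer2 = rank_corr(ressqr, lotranks)

ns = 999
count1 = count2 = 0
for _ in range(ns):
    # A permutation without replacement, like BOOT(NOREPLACE).
    shuffle = rng.permutation(ressqr)
    count1 += ratio_stat(shuffle, lotranks) > refer1
    count2 += rank_corr(shuffle, lotranks) > refer2

print("Test 1, p-value", (count1 + 1) / (ns + 1))
print("Test 2, p-value", (count2 + 1) / (ns + 1))
```

Because the permuted residuals are exchangeable under the null, the actual sample's statistic has a uniform rank among the ns+1 values, which is what justifies the (count+1)/(ns+1) formula.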
Full Program
open data hprice1.raw
data(format=free,org=columns) 1 88 price assess bdrms lotsize sqrft $
colonial lprice lassess llotsize lsqrft
*
* This examines heteroscedasticity in a hedonic price regression based
* upon lot size.
*
order(ranks=lotranks) lotsize
linreg price
# constant lotsize sqrft bdrms
*
* The first test statistic is the ratio of the sum of squared
* residuals over the final 36 observations (large lot sizes) to that
* over the first 36 (small lots), skipping the middle 16. The second is
* the correlation between the ranks of the lot sizes and the ranks of
* the squared residuals.
*
set ressqr = %resids^2
sstats / ressqr*(lotranks<=36)>>sum1 ressqr*(lotranks>=53)>>sum2
compute refer1=sum2/sum1
order(rank=vrank) ressqr
compute refer2=%corr(lotranks,vrank)
*
* count1 and count2 are the number of times we get a more extreme value
* after reshuffling. We do 999 shuffles.
*
compute count1=count2=0.0
compute ns=999
do draw=1,ns
*
* Use BOOT with NOREPLACE to come up with a new permutation of the
* ressqr variable. Recompute the test statistic for these.
*
boot(noreplace) entries 1 88
set shuffle = ressqr(entries(t))
*
sstats / shuffle*(lotranks<=36)>>sum1 $
shuffle*(lotranks>=53)>>sum2
compute teststat=sum2/sum1
compute count1=count1+(teststat>refer1)
*
order(rank=vrank) shuffle
compute teststat=%corr(lotranks,vrank)
compute count2=count2+(teststat>refer2)
end do draw
*
* The p-values for the tests are computed by taking (count+1)/(draws+1).
* The +1's are needed because we are, in effect, adding the actual
* sample in with the draws and seeing where it ends up.
*
?"Test 1, p-value" (count1+1)/(ns+1)
?"Test 2, p-value" (count2+1)/(ns+1)
Output
The p-values at the end depend upon random numbers, and so will not be exactly reproducible.
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
Test 1, p-value 0.28700
Test 2, p-value 0.04500
Copyright © 2025 Thomas A. Doan