Examples / RANDOMIZE.RPF
RANDOMIZE.RPF is an example of approximate randomization, applied to the heteroscedasticity tests done in the HETEROTEST.RPF example. The null hypothesis is that the variance is unrelated to lot size. Two different choices of test statistic are examined:
1. The ratio of variances between the two subsamples (which tests against the alternative that the variance for small lots is smaller than that for large ones)
2. The rank correlation between lot size and the squared residuals (which tests more generally that the variance increases with lot size)
This generates the ranks (smallest to largest) for the LOTSIZE series.
order(ranks=lotranks) lotsize
This does the regression of interest.
linreg price
# constant lotsize sqrft bdrms
The first test statistic (REFER1) is the ratio of the sum of squared residuals over the final 36 observations (large lot sizes) to that over the first 36 (small lots), skipping the middle 16. The second (REFER2) is the correlation between the ranks of the lot sizes and the ranks of the squared residuals.
set ressqr = %resids^2
sstats / ressqr*(lotranks<=36)>>sum1 ressqr*(lotranks>=53)>>sum2
compute refer1=sum2/sum1
order(rank=vrank) ressqr
compute refer2=%corr(lotranks,vrank)
COUNT1 and COUNT2 count the number of times we get a more extreme value after reshuffling. We do 999 shuffles.
compute count1=count2=0.0
compute ns=999
Inside the loop, we use BOOT with NOREPLACE to draw a new permutation of the RESSQR variable, then recompute the test statistics from the shuffled data. (Note that the permuted residuals no longer have any relation to the lot size, which is the whole point of the randomization exercise.) The count variables are incremented whenever the permuted data produce a more extreme test statistic than we got with the actual data.
boot(noreplace) entries 1 88
set shuffle = ressqr(entries(t))
sstats / shuffle*(lotranks<=36)>>sum1 shuffle*(lotranks>=53)>>sum2
compute teststat=sum2/sum1
compute count1=count1+(teststat>refer1)
order(rank=vrank) shuffle
compute teststat=%corr(lotranks,vrank)
compute count2=count2+(teststat>refer2)
The p-values for the tests are computed by taking (count+1)/(draws+1). The +1’s are needed because we are, in effect, adding the actual sample in with the draws and seeing where it ends up.
?"Test 1, p-value" (count1+1)/(ns+1)
?"Test 2, p-value" (count2+1)/(ns+1)
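The same procedure can be sketched outside RATS. The following Python snippet (using NumPy, with invented stand-in data rather than the hprice1.raw series, so the printed p-values will not match the output below) mirrors the structure of the program: compute the two reference statistics, permute the squared residuals 999 times, count more extreme values, and form (count+1)/(draws+1).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in data (NOT the hprice1.raw values): squared
# residuals whose scale grows with the lot-size rank, so some
# heteroscedasticity is built in by construction.
n = 88
lotranks = rng.permutation(n) + 1                  # ranks 1..88
ressqr = rng.chisquare(df=1, size=n) * (1.0 + 0.05 * lotranks)

def ratio_stat(sq, ranks):
    # Sum of squared residuals over the 36 largest lots divided by the
    # sum over the 36 smallest, skipping the middle 16 (ranks 37..52).
    return sq[ranks >= 53].sum() / sq[ranks <= 36].sum()

def rank_corr(sq, ranks):
    # Correlation between the lot-size ranks and the ranks of sq
    # (double argsort gives 0-based ranks; +1 makes them 1-based).
    vrank = sq.argsort().argsort() + 1
    return np.corrcoef(ranks, vrank)[0, 1]

refer1 = ratio_stat(ressqr, lotranks)
refer2 = rank_corr(ressqr, lotranks)

ns = 999
count1 = count2 = 0
for _ in range(ns):
    # A permutation without replacement, like BOOT(NOREPLACE).
    shuffle = rng.permutation(ressqr)
    count1 += ratio_stat(shuffle, lotranks) > refer1
    count2 += rank_corr(shuffle, lotranks) > refer2

print("Test 1, p-value", (count1 + 1) / (ns + 1))
print("Test 2, p-value", (count2 + 1) / (ns + 1))
```

Because the permuted residuals are exchangeable under the null, the actual sample's statistic has a uniform rank among the ns+1 values, which is what justifies the (count+1)/(ns+1) formula.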
Full Program
open data hprice1.raw
data(format=free,org=columns) 1 88 price assess bdrms lotsize sqrft $
colonial lprice lassess llotsize lsqrft
*
* This examines heteroscedasticity in a hedonic price regression based
* upon lot size.
*
order(ranks=lotranks) lotsize
linreg price
# constant lotsize sqrft bdrms
*
* The first test statistic is the ratio of the sum of squared
* residuals over the final 36 observations (large lot sizes) to that
* over the first 36 (small lots), skipping the middle 16. The second is
* the correlation between the ranks of the lot sizes and the ranks of
* the squared residuals.
*
set ressqr = %resids^2
sstats / ressqr*(lotranks<=36)>>sum1 ressqr*(lotranks>=53)>>sum2
compute refer1=sum2/sum1
order(rank=vrank) ressqr
compute refer2=%corr(lotranks,vrank)
*
* count1 and count2 are the number of times we get a more extreme value
* after reshuffling. We do 999 shuffles.
*
compute count1=count2=0.0
compute ns=999
do draw=1,ns
*
* Use BOOT with NOREPLACE to come up with a new permutation of the
* ressqr variable. Recompute the test statistic for these.
*
boot(noreplace) entries 1 88
set shuffle = ressqr(entries(t))
*
sstats / shuffle*(lotranks<=36)>>sum1 $
shuffle*(lotranks>=53)>>sum2
compute teststat=sum2/sum1
compute count1=count1+(teststat>refer1)
*
order(rank=vrank) shuffle
compute teststat=%corr(lotranks,vrank)
compute count2=count2+(teststat>refer2)
end do draw
*
* The p-values for the tests are computed by taking (count+1)/(draws+1).
* The +1's are needed because we are, in effect, adding the actual
* sample in with the draws and seeing where it ends up.
*
?"Test 1, p-value" (count1+1)/(ns+1)
?"Test 2, p-value" (count2+1)/(ns+1)
Output
The p-values at the end depend upon random numbers, and so will not be exactly reproducible.
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 88
Degrees of Freedom 84
Centered R^2 0.6723622
R-Bar^2 0.6606609
Uncentered R^2 0.9646239
Mean of Dependent Variable 293.54603409
Std Error of Dependent Variable 102.71344517
Standard Error of Estimate 59.83347988
Sum of Squared Residuals 300723.80646
Regression F(3,84) 57.4602
Significance Level of F 0.0000000
Log Likelihood -482.8775
Durbin-Watson Statistic 2.1098
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -21.77030860 29.47504196 -0.73860 0.46220778
2. LOTSIZE 0.00206771 0.00064213 3.22010 0.00182293
3. SQRFT 0.12277819 0.01323741 9.27509 0.00000000
4. BDRMS 13.85252186 9.01014545 1.53744 0.12794506
Test 1, p-value 0.28700
Test 2, p-value 0.04500
Copyright © 2025 Thomas A. Doan