BOOTARMODEL.RPF

BOOTARMODEL.RPF does a parametric bootstrap for an AR(1) model. This is one of the simpler types of parametric bootstrap, since it's not unreasonable to assume that the residuals are interchangeable and can be resampled as is.

The AR(1) model being estimated is for the first difference of log real GDP (that is, the period-to-period log growth rate). It's simplest to work with a copy of DY so we don't have to worry about overwriting the original:

set dycopy = dy

This does the base linear regression on the copy of the data and saves the equation form as DYEQ. That will be used later to generate bootstrapped data.

linreg(define=dyeq) dycopy

# constant dycopy{1}

@regconfidence(confidence=.90)
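In equation form, the regression above is the AR(1) shown below. The symbols c, \phi and u_t are notation introduced here for illustration and do not appear in the program. The second line assumes that @REGCONFIDENCE reports the usual symmetric t-based interval; that is an assumption, though it is consistent with the bounds shown in the Output section.

```latex
\Delta y_t = c + \phi\,\Delta y_{t-1} + u_t, \qquad \Delta y_t = \log \mathrm{RGDP}_t - \log \mathrm{RGDP}_{t-1}

\hat\phi \pm t_{0.95,\,260}\,\mathrm{se}(\hat\phi) \approx 0.3705749 \pm 1.6507 \times 0.0575612 \approx (0.2756,\ 0.4656)
```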

This makes a copy of the residuals, since later LINREG instructions would overwrite %RESIDS.

set u = %resids

This is set to do 10000 draws. We'll save the AR coefficient from each draw (into the series ARDRAWS). This uses CLEAR to create that series over the required length.

compute ndraws=10000

clear(length=ndraws) ardraws
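Because the draws are random, the results change from run to run. If repeatable results are wanted, one option is to fix the random number seed with the SEED instruction before the loop. This is a sketch of that idea only: the seed value is arbitrary and the line is not part of the original program.

```
* Hypothetical addition: fix the seed so BOOT gives the same draws on each run
seed 53531
```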

Inside the loop, the BOOT instruction is used to draw random entries from the regression range into the SERIES[INTEGER] named SHUFFLE. This is used to generate a bootstrapped set of residuals over the same range by taking the randomly chosen entries out of the saved residuals series (U).

boot shuffle %regstart() %regend()

set ushuffle %regstart() %regend() = u(shuffle(t))

Some people recommend giving this a random sign to (in effect) symmetrize the source distribution of residuals. This is a form of wild bootstrap. You can do that by replacing u(shuffle(t)) with u(shuffle(t))*%ransign().
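Written out, the random-sign variant described above would replace the SET instruction as follows. This simply applies the substitution given in the text and is not used in the program as distributed.

```
* Variant: apply a random sign to each resampled residual
set ushuffle %regstart() %regend() = u(shuffle(t))*%ransign()
```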

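The bootstrapped data are generated by using FORECAST with the PATHS option, feeding in the USHUFFLE series as the shocks. Note that this uses the original pre-sample data: the lagged value needed at %REGSTART() comes from the entry just before the regression range, which is never overwritten.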

forecast(model=dyeq,paths,from=%regstart(),to=%regend(),results=bootdata)

# ushuffle
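What FORECAST with PATHS generates here can be written as a recursion. The notation is introduced for illustration only: hats denote the LINREG estimates stored in DYEQ, u^*_t is the USHUFFLE series, and t_0 and t_1 stand for %REGSTART() and %REGEND().

```latex
\Delta y^*_t = \hat c + \hat\phi\,\Delta y^*_{t-1} + u^*_t, \quad t = t_0, \dots, t_1, \qquad \Delta y^*_{t_0-1} = \Delta y_{t_0-1}
```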

This pulls the generated data out of the BOOTDATA(1) series (even though there is only one equation, the RESULTS option returns a VECT[SERIES] so BOOTDATA(1) is that one generated series). In this case, we are re-running the regression on the bootstrapped data and saving the AR coefficient (%BETA(2)).

set dycopy %regstart() %regend() = bootdata(1)

linreg(noprint) dycopy

# constant dycopy{1}

compute ardraws(draw)=%beta(2)

If we were using the bootstrap to do out-of-sample analysis, the BOOT instruction would be something like

boot shuffle fstart fend %regstart() %regend()

where FSTART and FEND are the forecast range, and the FORECAST would have FROM=FSTART and TO=FEND.
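Putting those pieces together, the out-of-sample version of the draw might look like the sketch below. It assumes FSTART and FEND have already been set to the forecast range, and it changes only the ranges in the instructions used above.

```
* Sketch only: FSTART and FEND are assumed to be defined before this point
boot shuffle fstart fend %regstart() %regend()
set ushuffle fstart fend = u(shuffle(t))
forecast(model=dyeq,paths,from=fstart,to=fend,results=bootdata)
# ushuffle
```

Here the shocks for the forecast range are still drawn from the residuals of the estimation range, which is what the last two parameters on BOOT indicate.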

Outside the loop, we use STATISTICS(FRACTILES) to analyze the AR draws and output a bootstrapped version of a 90% confidence interval by displaying the 5th and 95th percentiles.

stats(fractiles) ardraws 1 ndraws

?"Bootstrapped 90% confidence interval" %fract05 "to" %fract95
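The same approach gives other coverage levels from the fractiles that STATISTICS reports. As a sketch, an 80% interval would use the 10th and 90th percentiles; this assumes that %FRACT10 and %FRACT90 are defined by STATISTICS(FRACTILES) in the same way as %FRACT05 and %FRACT95.

```
* Assumes %FRACT10 and %FRACT90 are set by STATISTICS(FRACTILES)
?"Bootstrapped 80% confidence interval" %fract10 "to" %fract90
```

With the draws shown in the Output section, this would correspond to the 10-%ile and 90-%ile lines of the STATISTICS table.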

Full Program

open data rgdp.xls

calendar(q) 1947:1

data(format=xls,org=columns) 1947:01 2012:04 rgdp gdp potential rcons rgovt rinv

set lrgdp = log(rgdp)

set dy = lrgdp-lrgdp{1}

* Make a copy of the data, which we will use in rebuilding the data

set dycopy = dy

linreg(define=dyeq) dycopy

# constant dycopy{1}

@regconfidence(confidence=.90)

set u = %resids

compute ndraws=10000

clear(length=ndraws) ardraws

do draw=1,ndraws

boot shuffle %regstart() %regend()

set ushuffle %regstart() %regend() = u(shuffle(t))

forecast(model=dyeq,paths,from=%regstart(),to=%regend(),results=bootdata)

# ushuffle

set dycopy %regstart() %regend() = bootdata(1)

linreg(noprint) dycopy

# constant dycopy{1}

compute ardraws(draw)=%beta(2)

end do draw

stats(fractiles) ardraws 1 ndraws

?"Bootstrapped 90% confidence interval" %fract05 "to" %fract95

Output

The STATISTICS output and the bootstrapped confidence interval depend upon random numbers and so will not match exactly from run to run. However, the bootstrapped interval is consistently shifted to the left of the standard symmetric confidence interval from the original regression (in the run below, roughly 0.266 to 0.454 versus 0.276 to 0.466).

Linear Regression - Estimation by Least Squares

Dependent Variable DYCOPY

Quarterly Data From 1947:03 To 2012:04

Usable Observations 262

Degrees of Freedom 260

Centered R^2 0.1374933

R-Bar^2 0.1341759

Uncentered R^2 0.4704527

Mean of Dependent Variable 0.0078030099

Std Error of Dependent Variable 0.0098593715

Standard Error of Estimate 0.0091741124

Sum of Squared Residuals 0.0218827281

Regression F(1,260) 41.4469

Significance Level of F 0.0000000

Log Likelihood 858.3808

Durbin-Watson Statistic 2.0672

Variable Coeff Std Error T-Stat Signif

************************************************************************************

1. Constant 0.0049140140 0.0007229185 6.79747 0.00000000

2. DYCOPY{1} 0.3705749089 0.0575612050 6.43793 0.00000000

Label Coefficient Lower Upper

Constant 0.0049140 0.0037207 0.0061074

DYCOPY{1} 0.3705749 0.2755566 0.4655932

Statistics on Series ARDRAWS

Quarterly Data From 1947:01 To 4446:04

Observations 10000

Sample Mean 0.362002 Variance 0.003244

Standard Error 0.056955 SE of Sample Mean 0.000570

t-Statistic (Mean=0) 635.595791 Signif Level (Mean=0) 0.000000

Skewness -0.113550 Signif Level (Sk=0) 0.000004

Kurtosis (excess) -0.011269 Signif Level (Ku=0) 0.818126

Jarque-Bera 21.542102 Signif Level (JB=0) 0.000021

Minimum 0.136768 Maximum 0.557543

01-%ile 0.227220 99-%ile 0.488046

05-%ile 0.266149 95-%ile 0.453517

10-%ile 0.287614 90-%ile 0.434861

25-%ile 0.323859 75-%ile 0.400902

Median 0.363589

Bootstrapped 90% confidence interval 0.26615 to 0.45352
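The leftward shift can be checked against the numbers above. The arithmetic below compares the mean of the bootstrap draws with the original estimate, and sets that beside the standard first-order approximation to the small-sample bias of least squares in an AR(1) with a constant, -(1+3\phi)/T. That approximation is brought in here as a plausible explanation and is not part of the original program or its documentation; T = 262 is the number of usable observations.

```latex
\bar\phi^{*} - \hat\phi = 0.362002 - 0.370575 = -0.008573

-\frac{1 + 3\hat\phi}{T} = -\frac{1 + 3(0.370575)}{262} = -\frac{2.111725}{262} \approx -0.008060
```

The two figures are of similar size, which is at least consistent with the shift reflecting the usual downward bias of the estimated AR coefficient rather than simulation noise.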