BOOTARMODEL.RPF
BOOTARMODEL.RPF does a parametric bootstrap for an AR(1) model. This is one of the simpler types of parametric bootstraps since it's not unreasonable to assume the residuals can be shuffled as is.
The AR(1) model is estimated on the first difference of log real GDP (that is, the period-to-period log growth rate). It's simplest to work with a copy of DY so we don't have to worry about overwriting the original:
set dycopy = dy
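This does the base linear regression on the copy of the data and saves the equation form as DYEQ. That will be used later to generate bootstrapped data. The @REGCONFIDENCE procedure then displays standard 90% confidence intervals for the coefficients, which serve as the comparison for the bootstrapped interval.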
linreg(define=dyeq) dycopy
# constant dycopy{1}
@regconfidence(confidence=.90)
This makes a copy of the residuals, since any later LINREG would overwrite %RESIDS.
set u = %resids
This is set to do 10000 draws. We'll save the AR coefficient from each draw (into the series ARDRAWS). This uses CLEAR to create that series over the required length.
compute ndraws=10000
clear(length=ndraws) ardraws
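Because the loop draws fresh random numbers on every run, the results change from run to run. If repeatable draws are wanted, one option is to seed the random number generator with SEED before the loop starts. This is a minimal sketch; the seed value 53531 is an arbitrary choice for illustration and is not part of the original program.

```rats
* Optional: fix the random number seed so repeated runs give the same draws.
* The value 53531 is arbitrary (an assumption for this sketch).
seed 53531
```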
Inside the loop, the BOOT instruction is used to draw random entries from the regression range into the SERIES[INTEGER] named SHUFFLE. This is used to generate a bootstrapped set of residuals over the same range by taking the random entries out of the saved residuals series (U).
boot shuffle %regstart() %regend()
set ushuffle %regstart() %regend() = u(shuffle(t))
Some people recommend giving this a random sign to (in effect) symmetrize the source distribution of residuals. This is a form of wild bootstrap. You can do that by replacing u(shuffle(t)) with u(shuffle(t))*%ransign().
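Written out in full, the sign-flipped variant described above would replace the SET instruction inside the loop with something like the following. Only one of the two SET instructions should be used.

```rats
* Variant of the SET instruction above: each resampled residual
* is multiplied by a random sign from %RANSIGN()
set ushuffle %regstart() %regend() = u(shuffle(t))*%ransign()
```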
The bootstrapped data are generated by using FORECAST with the PATHS option, feeding in the USHUFFLE series as the shocks. Note that this uses the original pre-sample data.
forecast(model=dyeq,paths,from=%regstart(),to=%regend(),results=bootdata)
# ushuffle
This pulls the generated data out of the BOOTDATA(1) series (even though there is only one equation, the RESULTS option returns a VECT[SERIES] so BOOTDATA(1) is that one generated series). In this case, we are re-running the regression on the bootstrapped data and saving the AR coefficient (%BETA(2)).
set dycopy %regstart() %regend() = bootdata(1)
linreg(noprint) dycopy
# constant dycopy{1}
compute ardraws(draw)=%beta(2)
If we were using the bootstrap to do out-of-sample analysis, the BOOT instruction would be something like
boot shuffle fstart fend %regstart() %regend()
where FSTART and FEND are the forecast range, and the FORECAST would have FROM=FSTART and TO=FEND.
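As a rough sketch of how the out-of-sample variant might look, the following sets up a forecast range and generates one bootstrapped path over it. The 8-quarter horizon and the way FSTART and FEND are computed are assumptions for illustration only. The COMPUTE instructions would go right after the original LINREG, and the remaining instructions inside the loop.

```rats
* FSTART and FEND are hypothetical: an 8-quarter horizon
* starting right after the end of the estimation sample
compute fstart=%regend()+1
compute fend=fstart+7
*
* Draw entry numbers for the forecast range from the regression range
boot shuffle fstart fend %regstart() %regend()
set ushuffle fstart fend = u(shuffle(t))
*
* Simulate one path over the forecast range using the resampled shocks
forecast(model=dyeq,paths,from=fstart,to=fend,results=bootdata)
# ushuffle
```

In this use, the items saved from each draw would typically come from BOOTDATA(1) over FSTART to FEND rather than from a re-estimated regression.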
Outside the loop, we use STATISTICS(FRACTILES) to analyze the AR draws and output a bootstrapped version of a 90% confidence interval by displaying the 5th and 95th percentiles.
stats(fractiles) ardraws 1 ndraws
?"Bootstrapped 90% confidence interval" %fract05 "to" %fract95
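One way to see how the bootstrap distribution sits relative to the original estimate is to compare the mean of the draws with the regression coefficient. The sketch below is an optional addition rather than part of the original program. The names ARHAT and BIAS are chosen for illustration, and the regression is re-run on the untouched DY series because DYCOPY has been overwritten inside the loop.

```rats
* Re-estimate on the original data to recover the point estimate
linreg(noprint) dy
# constant dy{1}
compute arhat=%beta(2)
*
* STATISTICS sets %MEAN to the sample mean of the draws
stats(noprint) ardraws 1 ndraws
compute bias=%mean-arhat
? "Bootstrap mean" %mean "original estimate" arhat "estimated bias" bias
```

Place this after the STATISTICS(FRACTILES) and display lines so that it does not interfere with the percentile output.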
Full Program
open data rgdp.xls
calendar(q) 1947:1
data(format=xls,org=columns) 1947:01 2012:04 rgdp gdp potential rcons rgovt rinv
*
set lrgdp = log(rgdp)
set dy = lrgdp-lrgdp{1}
*
* Make a copy of the data, which we will use in rebuilding the data
*
set dycopy = dy
linreg(define=dyeq) dycopy
# constant dycopy{1}
@regconfidence(confidence=.90)
set u = %resids
*
compute ndraws=10000
clear(length=ndraws) ardraws
*
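do draw=1,ndraws
boot shuffle %regstart() %regend()
set ushuffle %regstart() %regend() = u(shuffle(t))
forecast(model=dyeq,paths,from=%regstart(),to=%regend(),results=bootdata)
# ushuffle
*
* Copy the generated data over DYCOPY and re-estimate the AR(1)
*
set dycopy %regstart() %regend() = bootdata(1)
linreg(noprint) dycopy
# constant dycopy{1}
compute ardraws(draw)=%beta(2)
end do draw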
*
stats(fractiles) ardraws 1 ndraws
?"Bootstrapped 90% confidence interval" %fract05 "to" %fract95
Output
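The STATISTICS output and the bootstrapped confidence interval depend upon random numbers and so will not match exactly. However, the bootstrapped interval is consistently shifted left relative to the standard symmetric confidence interval from the original regression. In the run shown here, the bootstrapped interval for the AR coefficient is 0.26615 to 0.45352, compared with 0.27556 to 0.46559 from the regression.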
Linear Regression - Estimation by Least Squares
Dependent Variable DYCOPY
Quarterly Data From 1947:03 To 2012:04
Usable Observations 262
Degrees of Freedom 260
Centered R^2 0.1374933
R-Bar^2 0.1341759
Uncentered R^2 0.4704527
Mean of Dependent Variable 0.0078030099
Std Error of Dependent Variable 0.0098593715
Standard Error of Estimate 0.0091741124
Sum of Squared Residuals 0.0218827281
Regression F(1,260) 41.4469
Significance Level of F 0.0000000
Log Likelihood 858.3808
Durbin-Watson Statistic 2.0672
    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                     0.0049140140 0.0007229185      6.79747  0.00000000
2.  DYCOPY{1}                    0.3705749089 0.0575612050      6.43793  0.00000000
Label          Coefficient      Lower      Upper
Constant         0.0049140  0.0037207  0.0061074
DYCOPY{1}        0.3705749  0.2755566  0.4655932
Statistics on Series ARDRAWS
Quarterly Data From 1947:01 To 4446:04
Observations 10000
Sample Mean                  0.362002      Variance                   0.003244
Standard Error               0.056955      SE of Sample Mean          0.000570
t-Statistic (Mean=0)       635.595791      Signif Level (Mean=0)      0.000000
Skewness                    -0.113550      Signif Level (Sk=0)        0.000004
Kurtosis (excess)           -0.011269      Signif Level (Ku=0)        0.818126
Jarque-Bera                 21.542102      Signif Level (JB=0)        0.000021
Minimum                      0.136768      Maximum                    0.557543
01-%ile                      0.227220      99-%ile                    0.488046
05-%ile                      0.266149      95-%ile                    0.453517
10-%ile                      0.287614      90-%ile                    0.434861
25-%ile                      0.323859      75-%ile                    0.400902
Median                       0.363589
Bootstrapped 90% confidence interval 0.26615 to 0.45352
Copyright © 2025 Thomas A. Doan