SHUFFLE option

SHUFFLE=SERIES[INTEGER]/FRML[INTEGER] with entry remapping

The SHUFFLE option is available on some instructions to simplify bootstrapping when the data set allows the observations to be resampled directly (for instance, when the observations are assumed to be independent). It does not help when a parametric bootstrap is required, as in a vector autoregression.

Before using SHUFFLE, you need to execute a BOOT instruction to generate a SERIES[INTEGER] of random remapping entries. If you have a five-entry data set and BOOT generates the sequence 5, 1, 4, 4, 2, then a STATISTICS instruction that is given that sequence through the SHUFFLE option will compute the sample statistics of the input series at those five entries (with entry 4 used twice and entry 3 left out).
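
As a minimal sketch of the five-entry case, assuming a hypothetical data series named X defined over entries 1 to 5 (ENTRIES and XDRAW are likewise only illustrative names), the two steps might look like this:

* X, ENTRIES and XDRAW are hypothetical names
boot entries 1 5
statistics(shuffle=entries) x 1 5

Under that reading of the remapping, the result should match what you get by building the resampled series explicitly and computing its statistics:

set xdraw 1 5 = x(entries(t))
statistics xdraw 1 5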

As with any bootstrapping, you will typically need to embed this within a loop: a BOOT instruction generates the remapping, the instruction with SHUFFLE computes the statistics for that draw, and some form of bookkeeping saves the results across draws.
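
A loop of this kind might be sketched as follows. The series name X, the number of draws NDRAWS and the bookkeeping series BOOTMEANS are all illustrative assumptions, and the sketch assumes that the default range covers the data you want to resample:

compute ndraws = 1000
* BOOTMEANS (hypothetical) holds one bootstrapped mean per draw
set bootmeans 1 ndraws = 0.0
do draw=1,ndraws
   * New random remapping on each pass
   boot entries
   statistics(shuffle=entries,noprint) x
   * Bookkeeping: save the mean for this draw
   compute bootmeans(draw) = %mean
end do draw
* Summarize the bootstrap distribution of the mean
statistics(fractiles) bootmeans 1 ndraws

The same pattern applies to any instruction that accepts SHUFFLE; only the saved quantity (here %MEAN) changes.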

Interaction with the SMPL option

If you use SHUFFLE combined with the SMPL option, the SMPL is processed first. For instance, in their textbook, Stock and Watson do a series of analyses which require systematically removing a random set of 10% of the data points. One way to do this is

set group = %clock(t,10)
boot(noreplace) shuffle
do leftout=1,10

and then use the options shuffle=shuffle,smpl=group<>leftout on the instruction inside the loop to choose the entries to use. Because the SMPL is processed first, SMPL=GROUP<>LEFTOUT restricts the analysis to the 90% of the positions in the data range whose GROUP value does not match LEFTOUT (for instance, when LEFTOUT is 1, it drops the positions where GROUP is 1, that is, every tenth position beginning with the first), and SHUFFLE=SHUFFLE then maps the remaining positions to randomly chosen entries. (The randomization is done only once, and without replacement, so each entry falls in one and only one of the ten groups.)
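
Put together, the leave-out loop might look like the sketch below. The regression of a hypothetical dependent variable Y on CONSTANT and X is only a placeholder for whatever analysis is being repeated, and the DISPLAY line is just one possible form of bookkeeping:

set group = %clock(t,10)
* One random ordering without replacement, generated once outside the loop
boot(noreplace) shuffle
do leftout=1,10
   * Drop the positions in group LEFTOUT, then remap the rest through SHUFFLE
   linreg(shuffle=shuffle,smpl=group<>leftout,noprint) y
   # constant x
   display "Group left out" leftout "slope" %beta(2)
end do leftout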

Example

This is part of the BOOTFGLS.RPF program, which bootstraps a feasible GLS (FGLS) estimator that corrects for heteroscedasticity. Note that SHUFFLE needs to be used on all three LINREG instructions that are part of the calculation.

boot shuffle
* Run the original regression using the bootstrapped sample
linreg(shuffle=shuffle,noprint) food
# constant income
* Generate ESQ using the original sample data (so it's
* compatible with the original data)
set esq = log(%resids^2)
set z = log(income)
linreg(shuffle=shuffle,noprint) esq
# constant z
* Get the fitted values (again, this uses the original sample)
* and run the FGLS estimates.
prj vhat
linreg(spread=exp(vhat),shuffle=shuffle,noprint) food
# constant income