SAMPLE Instruction

SAMPLE( options ) series start end newseries newstart

SAMPLE creates a series by selecting entries from another series, either at regular intervals or under the control of a dummy variable (or expression).

Parameters

series	The series providing the values.
start, end	The sampled range of series. If you have not set a SMPL, this defaults to the defined length of series.
newseries	Resulting series, which will contain the extracted values. By default, newseries=series.
newstart	Start entry for newseries. By default, newstart=start.

Options

The two options are mutually exclusive. If you don’t specify an option, SAMPLE uses INTERVAL=1.

INTERVAL=sampling interval [1]

Use this option for a regular sampling interval. SAMPLE will copy the entries:

start, start+interval, start+2*interval, start+3*interval, ...

into consecutive entries of newseries, beginning at entry newstart.
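The entry arithmetic can be sketched outside RATS. The Python function below is purely illustrative: it is not RATS code, and sample_interval is a hypothetical name. It treats a series as a dictionary keyed by 1-based entry number and copies entries start, start+interval, ... into consecutive entries beginning at newstart.

```python
def sample_interval(series, start, end, interval=1, newstart=None):
    """Mimic the entry mapping described for SAMPLE with INTERVAL.

    `series` maps 1-based entry numbers to values. Illustrative only.
    """
    if newstart is None:
        newstart = start  # documented default: newstart = start
    picked = range(start, end + 1, interval)
    # The k-th picked entry lands in entry newstart + k of the new series.
    return {newstart + k: series[entry] for k, entry in enumerate(picked)}


# Entries 1..12 hold the made-up values 101..112.
x = {t: 100 + t for t in range(1, 13)}
y = sample_interval(x, start=2, end=12, interval=4, newstart=1)
print(y)
```

Here entries 2, 6 and 10 of the source end up in entries 1, 2 and 3 of the result. With the default interval of 1 and the default newstart, the sketch simply copies the range unchanged.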

You can use this to change the frequency of a data series after it has been read into RATS. It is, however, rather clumsy compared to the options available on the DATA instruction, which has many choices for method of compaction and does the translation automatically.

SMPL=standard SMPL option [not used]

With this option, SAMPLE extracts only those entries of series which correspond to entries where the SMPL series or formula is non-zero or “true”.

SAMPLE with the SMPL option is the simplest way to filter observations out of a data set, if you actually need a compressed data set without gaps.

For instance, if you have a daily data set with some missing data and want RATS to skip back to the most recent valid data point when it needs a lag, you need to use SAMPLE to remove the missing values.

If you just want to skip certain entries when executing a particular instruction, you can use the SMPL option available on many instructions.
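The compression that SAMPLE performs with the SMPL option can be illustrated with a small Python sketch. This is not RATS code: sample_smpl is a hypothetical helper, the values are made up, and missing values are represented here by NaN as an assumption for the illustration.

```python
import math


def sample_smpl(series, start, end, keep, newstart=None):
    """Keep entries start..end where keep(t) is true, packed without gaps."""
    if newstart is None:
        newstart = start  # documented default: newstart = start
    kept = [series[t] for t in range(start, end + 1) if keep(t)]
    return {newstart + k: value for k, value in enumerate(kept)}


nan = float("nan")
# Made-up daily data with two missing entries (2 and 5).
sp500 = {1: 10.0, 2: nan, 3: 11.0, 4: 12.0, 5: nan, 6: 13.0}
c_sp500 = sample_smpl(sp500, 1, 6, keep=lambda t: not math.isnan(sp500[t]))
print(c_sp500)
```

The result has four consecutive entries. Entry 2 of the compressed series is the value from entry 3 of the original, so a one-period lag now reaches back over the gap.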

Variables Defined

%NOBS

Number of observations (INTEGER)

Examples

set filter_missing = %valid(sp500)

sample(smpl=filter_missing) sp500 / c_sp500

cal(irregular)

C_SP500 has the same set of values as SP500, except that the missing values (for non-trading days) have been removed. Note that, while SP500 is regular daily data, C_SP500 is not, so the CALENDAR is changed to IRREGULAR.

calendar(q) 1980:1

all 2010:4

open data quarters.rat

data(format=rats) / qseries

sample(interval=4) qseries 1980:1 * q1 1

sample(interval=4) qseries 1980:2 * q2 1

sample(interval=4) qseries 1980:3 * q3 1

sample(interval=4) qseries 1980:4 * q4 1

calendar(a) 1980:1

This reads a quarterly series, breaks out separate series (q1, q2, q3 and q4) for each quarter, and then resets the calendar to annual. Note that it is very important for the start and newstart parameters to be correct: the start parameter determines which period within the year you get.
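Why the start parameter picks the quarter can be seen from a short Python sketch. It is illustrative only, not RATS code: the labels are generated for three years rather than the full range, and every_fourth is a hypothetical helper that lists which quarterly entries an interval of 4 would visit from a given start entry.

```python
# Quarterly entries 1..12 labelled 1980:1 through 1982:4 (labels only, no data).
labels = [f"{1980 + i // 4}:{i % 4 + 1}" for i in range(12)]


def every_fourth(start_entry):
    """Return the labels of entries start, start+4, start+8, ... (1-based)."""
    return labels[start_entry - 1::4]


for q in range(1, 5):
    print(f"q{q}:", every_fourth(q))
```

Starting at entry 1 visits only first quarters, starting at entry 3 visits only third quarters, and so on. Setting newstart to 1 in each case places the 1980 value in entry 1, which is 1980 once the calendar is reset to annual.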

Here, the data set has missing values for some data points. The SAMPLE instruction extracts the non-missing observations from the series ENGLAND and creates the (shorter) series SCORES from them.

open data goals.tsm

data(format=free,org=columns) 1 57 england year

* This extracts the subsample where there is valid (non-missing) data

* for England.

sample(smpl=%valid(england)) england / scores

stats scores

This uses MVSTATS to get the maximum and minimum over a 21-period span, then extracts every 21st entry to create MINCOMP and MAXCOMP ("COMP" meaning compressed).

open data d-ibmln98.dat

data(format=free,org=columns) 1 9190 ibmlog

mvstats(min=mins,max=maxs,span=21) ibmlog

sample(smpl=%clock(t,21)==21) mins / mincomp

sample(smpl=%clock(t,21)==21) maxs / maxcomp

spgraph(footer="Figure 7.3 Maximum and minimum daily log returns (subperiod=21 days)",vfields=2)

graph(header="(a) Monthly maximum log returns")

# maxcomp

graph(header="(b) Monthly minimum log returns")

# mincomp

spgraph(done)
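Which entries the SMPL expression in this example selects can be checked with a short Python sketch. It is not RATS code, and it assumes that %CLOCK(t,21) cycles through 1, 2, ..., 21 as t increases from 1; the function clock below is a stand-in written for this illustration.

```python
def clock(t, n):
    """Assumed behaviour of %CLOCK(t,n): cycles 1, 2, ..., n, 1, 2, ..."""
    return (t - 1) % n + 1


n_obs, span = 9190, 21
# Entries for which the condition clock(t, 21) == 21 holds.
selected = [t for t in range(1, n_obs + 1) if clock(t, span) == span]
print(selected[:3], selected[-1], len(selected))
```

Under that assumption the condition holds at entries 21, 42, 63 and so on, so each compressed series receives one value per complete block of 21 entries. Since 9190 is not a multiple of 21, the last few entries of the sample do not contribute a value.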

Notes

SAMPLE is not your best choice for drawing random subsamples. The instruction BOOT combined with SET is the correct way to do that. For instance, to draw a random sample of size 20 from a data series (X) with 100 observations:

boot entries 1 20 1 100

set draw 1 20 = x(entries)
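The two-step logic, first drawing random entry numbers and then looking up the series at those entries, can be sketched in Python. This is an illustration rather than RATS code: the data are made up, the seed is arbitrary, and it assumes that the entry numbers are drawn independently, so the same entry can be picked more than once.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up series X with entries 1..100.
x = {t: float(t) ** 0.5 for t in range(1, 101)}

# Step 1: fill entries 1..20 with random entry numbers between 1 and 100.
entries = {i: rng.randint(1, 100) for i in range(1, 21)}

# Step 2: look up X at each drawn entry number.
draw = {i: x[entries[i]] for i in range(1, 21)}
print(len(draw))
```

Keeping the drawn entry numbers in their own series means the same draw can be applied to several series, which keeps observations aligned across them.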