RATS 11

Instructions which operate on series need to know what entry range to use. With time series data, the range is usually an uninterrupted set of dates (possibly with some dropped at the beginning or end due to lags or leads). With cross-section data, it is more common to skip cases based upon some selection criterion. This page discusses the various ways of controlling the entry range.

 

Default Ranges (* and / parameters)

The entry range parameters are a pair: start and end. What the defaults are depends upon the instruction. For LINREG (which we will use in the examples), the default is the maximum range allowed by the series involved (the dependent variable and the regressors, allowing for lags and leads). So

 

linreg rate

# constant ip grm2 grppi{1}

 

which uses neither of the range parameters, will run over that maximum range. We can override this by providing both start and end:

 

linreg rate 1959:4 1992:4

# constant ip grm2 grppi{1}

 

runs the regression over the range from 1959:4 to 1992:4. If you want to control just one of the two, you can use the * character for the other:

 

linreg rate 1959:4 *

# constant ip grm2 grppi{1}

 

runs the regression from 1959:4 to whatever is the default for the end and

 

linreg rate * 1992:4

# constant ip grm2 grppi{1}

 

runs the regression from whatever is the default start (the earliest entry permitted by the data) through 1992:4.

 

If you want to use the default for both range parameters, you can just leave them out if nothing comes after them (as in the first example above). If there are trailing parameters, you have to use either * * or the shorthand /, which covers both. For instance, LINREG has a fourth parameter (for the generated residuals), so if we wanted to estimate this over the full range and save the residuals into the series U, we could use

 

linreg rate / u

# constant ip grm2 grppi{1}

 

The SMPL Instruction

The SMPL instruction lets you control the default range. It is an important instruction in EViews/TSP–like programs, but is less so in RATS because:

You can set explicit entry ranges on individual instructions where necessary.

You can use default ranges on most transformation and regression instructions.

 

SMPL is useful in situations where you want to run a sequence of regressions, forecasts, or other operations over a common fixed interval (other than the default range).

 

For instance, suppose you have data from 1922:1 through 1941:1, but you want to run two regressions over the period 1923:1 to 1935:1.

 

smpl 1923:1 1935:1

linreg foodcons

# constant dispinc trend

linreg foodprod

# constant avgprices

 

Once you set a SMPL, RATS uses the SMPL range as the default. To clear a SMPL, just issue a SMPL instruction with no parameters. We recommend that you do any preliminary transformations before you set a SMPL.
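
As a minimal sketch of the full cycle, reusing the series from the example above, a SMPL instruction with no parameters restores the normal default range once the restricted analysis is done:

smpl 1923:1 1935:1

linreg foodcons

# constant dispinc trend

* Clear the SMPL; later instructions use their normal default range again

smpl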

 

If you need to skip entries in the middle of a data set, use the SMPL option (described below) rather than the SMPL instruction.

 

Using Entry Numbers Instead of Dates

You can use hard-coded entry numbers rather than dates. For example, given a CALENDAR of

 

calendar(m) 1959:1

 

the instruction

 

linreg rate 1 24

# constant ip m1diff ppisum

 

is equivalent to:

 

linreg rate 1959:1 1960:12

# constant ip m1diff ppisum

 

because 1960:12 is the twenty-fourth entry given the CALENDAR setting. With time series data, you will usually use the dates, but using entry numbers is sometimes easier.

 

Dates are actually handled as integer-valued variables, which means that you can combine dates and integer entry numbers in an expression. For example, another way to do the regression above is:

 

compute start = 1959:1

compute end = start+23

linreg rate start end

# constant ip m1diff ppisum

 

This means that you need to be careful to use the proper format for dates. Suppose you accidentally left off the :1 and wrote:

 

linreg rate 1959 1960

# constant ip m1diff ppisum

 

RATS would try to run the regression using entries 1959 and 1960 (that is, entry numbers one thousand nine hundred fifty-nine and one thousand nine hundred sixty). That is clearly not what you intended, so be sure to include the “:period” any time you are referring to a date.

 

Exceeding the Default End Period

The default end period is not a binding constraint—you can define series beyond this default limit as needed by using explicit date/entry ranges.
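
A minimal sketch, assuming annual data with a default range of 1922:1 through 1941:1 (the series name TRENDX and the 1950:1 end date are hypothetical, chosen only for illustration):

* The explicit range creates entries through 1950:1, past the default end

set trendx 1922:1 1950:1 = t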

 

Selecting Subsamples: The SMPL Option

The SMPL option is available on most instructions which operate on data series. It allows you to include and omit selected observations within the start to end range of the instruction. The formal description is:

 

smpl=SMPL series or formula

 

The SMPL series is a series or formula with non-zero (or logical “true”) values at the entries (between start and end) you want to include in the estimation, and zero values (false) at the entries to omit. It can be an existing series, or a formula like that used in a SET instruction. It’s usually a dummy variable series of some form. It may be a dummy in the data set, or one constructed for this particular purpose.

 

You can also set a SMPL series which applies to all instructions that support the SMPL option (this is not done very often). You do this with a version of the SMPL instruction:

 

smpl(series=SMPL series)

 

We prefer to use the SMPL option in most situations. If you use the SMPL instruction, you must remember to reset it when you are done with the analysis which uses the reduced sample.

 

Missing Values

RATS will automatically omit observations which have a missing value in the dependent variable or any of the regressors, so you don’t need a SMPL option for them. If you have a regression with lags, you could end up losing several data points since any observation which needs an unavailable lag of a variable will be dropped.
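
As an illustration, a variant of the earlier regression with four lags of GRPPI (the lag range {1 to 4} is chosen here only as an example) needs no SMPL option or explicit start. Assuming monthly data with every series first available at 1959:1, the first four entries lack one or more lags and are dropped, so estimation would begin at 1959:5:

linreg rate

# constant ip grm2 grppi{1 to 4}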

 

Skipping a Time Period

Suppose you want to leave the years 1942 to 1946 out of a regression over 1919 to 2009. You could construct a SMPL series for this with:

 

* Set all entries to one

set notwar  1919:1  2009:1  = 1.0

* Set entries for war years to zero

set notwar  1942:1  1946:1  = 0.0

linreg(smpl=notwar)  depvar                    

# regressors

 

The second SET only applies to the restricted range from 1942:1 to 1946:1, and so won’t affect the values at other time periods. You could do the same thing with

 

linreg(smpl=t<1942:1.or.t>1946:1) depvar

# regressors

 

but the first way is easier to understand at a glance.

 

Subsample Based Upon Value

To run a regression for just those entries where a series exceeds some value or meets some similar criterion, use relational operators. The following generates dummies for three subsamples based upon the series POP, then runs the regression over each of them.

 

set small  = pop<2000

set medium = pop>=2000.and.pop<=6000

set large  = pop>6000

linreg(smpl=small)  depvar

# regressors

linreg(smpl=medium) depvar

# regressors

linreg(smpl=large) depvar

# regressors

 

Again, we could collapse each of these into a single instruction by putting the formula directly in the SMPL option, like

 

linreg(smpl=pop<2000)  depvar

# regressors

 

though, again, at a loss in readability. A few extra keystrokes are generally worth it if they produce a clearer program.

 


Copyright © 2025 Thomas A. Doan