Data Transformations
SET is the general data transformation instruction. RATS also has several special-purpose instructions for doing transformations, such as SEASONAL for creating seasonal dummies and DIFFERENCE and FILTER for difference and quasi-difference transformations. However, most of the time you will be using SET.
Below, we describe SET in more detail, and present a kit of standard data transformations. We also describe the various Wizards available for doing many of these types of operations.
The Instruction SET
The general form of SET is
set( options ) series start end = function(T)
In the function part of the SET instruction, you can use constants, scalar variables, other series, matrices, calls to built-in or user-defined functions, and any of the arithmetic and logical operators available in RATS. In its most basic form, SET defines one series as a simple transformation of another series. For example:
set loggdp = log(gdp)
sets each entry of the series LOGGDP to the log of the corresponding entry of GDP, using the built-in function LOG() (see "Arithmetic Expressions" for more on functions).
The start and end parameters are optional, and you can usually skip them when you are doing your initial transformations after DATA.
To set a series as a function of other variables, you use standard arithmetic notation:
set totalcon = durables + nondur + services
set scaled = resids/sigmasq
Trend Series
You use the variable T to create trend series and dummy variables based upon “time”. T is equal to the number of the entry being set, where the entry you specified on CALENDAR is given number 1. For instance, the following creates (in order) linear, quadratic and exponential (5% growth) trend series:
set trend = t
set trendsq = t^2
set exptrend = exp(.05*t)
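As a cross-check outside RATS, the following Python sketch reproduces the three trend formulas for the first few entries. It assumes only that T counts entries starting from 1, as described above; the function name trend_series is hypothetical and not part of RATS.

```python
import math

def trend_series(n_entries):
    """Return (linear, quadratic, exponential) trends for entries 1..n_entries."""
    t_values = range(1, n_entries + 1)      # T starts at 1, not 0
    linear = [float(t) for t in t_values]
    quadratic = [float(t) ** 2 for t in t_values]
    exponential = [math.exp(0.05 * t) for t in t_values]
    return linear, quadratic, exponential

trend, trendsq, exptrend = trend_series(4)
```

Because the exponential trend uses continuous compounding, each entry is exp(.05) times the previous one.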
Seasonal Dummies
RATS provides a special instruction SEASONAL for creating seasonal dummies. You use it in one of two forms:
seasonal seasons
seasonal(period=1948:2) february 1948:1 2009:12
The first creates SEASONS as a dummy for the last period of the year (4th quarter or December). By using SEASONS and its leads in a regression, you can get a full set of dummies without having to define a separate one for each period. With monthly data, the “lag field” SEASONS{-10 TO 0} covers dummies for February (lag “–10”, which is a 10-period lead) to December (lag 0).
The second creates FEBRUARY as a February dummy defined from 1948 to 2009. The PERIOD option specifies the first period to receive a value of 1 (February, 1948).
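The way leads of a single December dummy cover the other months can be checked with a small Python sketch. It is an illustration only, assuming monthly data that starts in January; december_dummy and lead are hypothetical helper names, not RATS functions.

```python
def december_dummy(n_entries, start_month=1):
    """1.0 in Decembers, 0.0 elsewhere, for monthly data starting in start_month."""
    return [1.0 if (start_month - 1 + i) % 12 == 11 else 0.0
            for i in range(n_entries)]

def lead(series, k):
    """Value k periods ahead at each entry (a k-period lead); pad the end with None."""
    return series[k:] + [None] * k

seasons = december_dummy(24)            # two years of monthly data, starting in January
feb = lead(seasons, 10)                 # plays the role of SEASONS{-10}
months_flagged = [i % 12 + 1 for i, v in enumerate(feb) if v == 1.0]
```

Here months_flagged contains only the month number 2: a 10-period lead of the December dummy is 1 in February. In general a k-period lead flags month 12-k.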
Other Dummies
Dummy variables are easy to construct using logical and relational operators, since these operators return the values zero or one.
set dummy = t>=1972:3.and.t<=1974:3
set large = pop>5000
set female = .not.male
The first example creates a dummy series called DUMMY using a logical expression. It stores the value 1 in entries where the logical expression is true, and a 0 in the other entries. In this case, the expression is true between 1972:3 and 1974:3 inclusive, so DUMMY is 1 for those entries and 0 elsewhere. The second sets LARGE to 1 when the corresponding entries of POP are greater than 5000, and 0 elsewhere. In the third example, entries of FEMALE are 1 when MALE is 0, and are 0 elsewhere.
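The same 0/1 logic can be mirrored in a short Python sketch. It is not RATS code: entry numbers stand in for dates such as 1972:3, and the data values and the helper name range_dummy are made up for illustration.

```python
def range_dummy(n_entries, first, last):
    """1.0 for entry numbers first..last inclusive (1-based), 0.0 elsewhere."""
    return [float(first <= t <= last) for t in range(1, n_entries + 1)]

pop = [3200, 5000, 7400, 12000]
large = [float(p > 5000) for p in pop]          # strict inequality: 5000 itself maps to 0
male = [1.0, 0.0, 0.0, 1.0]
female = [float(not m) for m in male]           # logical negation of a 0/1 series
dummy = range_dummy(6, 2, 4)                    # 1 for entries 2 through 4 inclusive
```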
Trend/Seasonals/Dummies Wizard
You can create many of the trend, seasonal, and dummy variables described above using Data/Graphics—Trend/Seasonals/Dummies.
Lag and Lead Transformations
Transformations involving lags or leads of a series can be written using the T subscript, lag notation, or a combination of both.
set pavge = ( price + price{1} ) / 2.0
set pavge = ( price(t) + price(t-1) ) / 2.0
set inflation = 400.*log( deflator/deflator{1} )
The first two are identical. They create PAVGE as the average of the current and first lag values of PRICE. The first uses lag notation, the second uses T explicitly. Which style you use is a matter of taste. The third example computes annualized growth rates of DEFLATOR (in percent) using the log difference approximation.
All of these transformations involve a lagged value, so the first entry of the result is set to missing. For example, with annual data, if PRICE is defined over 1922:1 to 1941:1, PAVGE is defined only over 1923:1 to 1941:1.
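The loss of the first entry can be seen in a minimal Python sketch of the averaging transformation. None stands in for a missing value, the prices are invented, and lag and average_with_lag are hypothetical helpers rather than RATS functions.

```python
def lag(series, k=1):
    """Return series lagged k periods; the first k entries have no source value."""
    return [None] * k + series[:len(series) - k]

def average_with_lag(series):
    """Average of current and previous entry; None where the lag is unavailable."""
    lagged = lag(series, 1)
    return [None if prev is None else (cur + prev) / 2.0
            for cur, prev in zip(series, lagged)]

price = [10.0, 12.0, 11.0, 15.0]
pavge = average_with_lag(price)         # first entry is None: no lag exists for it
```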
Differencing
Simple differencing is, of course, easy to handle with SET:
set diffgdp = gdp - gdp{1}
RATS also offers the instruction DIFFERENCE for regular, seasonal, or fractional differencing (see "Entry Ranges" on the use of the /):
difference gdp / diffgdp
difference(sdiffs=1,diffs=1) gdp / ddsgdp
As noted earlier, you can also use Data/Graphics—Transformations or Data/Graphics—Differencing to create differenced series.
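For readers who want to verify the arithmetic by hand, here is a Python sketch of regular and combined seasonal-plus-regular differencing. It assumes quarterly data (a seasonal span of 4) and assumes that one seasonal and one regular difference are applied in sequence; the helper name difference and the data are illustrative only.

```python
def difference(series, span=1):
    """x[t] - x[t-span]; None where either value is unavailable."""
    out = []
    for i, cur in enumerate(series):
        prev = series[i - span] if i >= span else None
        out.append(None if cur is None or prev is None else cur - prev)
    return out

gdp = [100.0, 103.0, 101.0, 106.0, 110.0, 112.0]
diffgdp = difference(gdp)                       # regular first difference
ddsgdp = difference(difference(gdp, span=4))    # seasonal (span 4), then regular difference
```

Each differencing step costs entries at the start of the sample: one for the regular difference, and five in total for the combined version.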
Growth Rates
There are several ways to compute growth rates or approximations to them. For quarterly data, the first two SET instructions below compute annualized growth rates, the third and fourth compute period-over-period rates, and the last computes year-over-year growth. All five are expressed as fractions rather than percentages; multiply by 100 to get percent.
set growx = 4.0*log(x/x{1})
set growx = ((x/x{1}) ^ 4 - 1.0)
set growx = log(x/x{1})
set growx = x/x{1} - 1.0
set growx = x/x{4} - 1.0
Data/Graphics—Transformations can also create simple growth rate series.
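To compare the five measures numerically, the following Python sketch evaluates them on an artificial series that grows by exactly 1% per quarter. The function growth_rates and its dictionary keys are hypothetical names chosen for this illustration.

```python
import math

def growth_rates(x, i, periods_per_year=4):
    """Growth measures for entry i of x (requires i >= periods_per_year)."""
    ratio = x[i] / x[i - 1]
    return {
        "annualized_log": periods_per_year * math.log(ratio),
        "annualized_compound": ratio ** periods_per_year - 1.0,
        "period_log": math.log(ratio),
        "period_simple": ratio - 1.0,
        "year_over_year": x[i] / x[i - periods_per_year] - 1.0,
    }

x = [100.0 * 1.01 ** k for k in range(8)]   # steady 1% growth per quarter
rates = growth_rates(x, 5)
```

With steady growth, the compounded annualized rate and the year-over-year rate coincide (about 0.0406), while the log approximation is slightly smaller (about 0.0398).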
Benchmarking and Normalizing
These generally require one of two SET options, either FIRST or SCRATCH:
set(first=100.) capital 1920:1 1941:1 = .90*capital{1}+invest
computes CAPITAL from an investment series beginning with a value of 100.0 for the first period (1920).
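The recursion behind this instruction can be sketched in Python as follows. This is an illustration of the arithmetic only, assuming the first entry takes the FIRST value and every later entry applies the formula; accumulate_capital and the investment figures are invented for the example.

```python
def accumulate_capital(invest, first=100.0, retention=0.90):
    """capital[0] = first; capital[t] = retention * capital[t-1] + invest[t] afterwards."""
    capital = [first]
    for inv in invest[1:]:                  # invest[0] is unused: entry 0 comes from `first`
        capital.append(retention * capital[-1] + inv)
    return capital

invest = [0.0, 20.0, 20.0, 5.0]
capital = accumulate_capital(invest)        # roughly 100, 110, 119, 112.1
```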
set(scratch) gdp 1950:1 1998:4 = 1000.0*gdp/gdp(1975:3)
renormalizes GDP to take the value 1000 in 1975:3. You need the SCRATCH option because the instruction overwrites values of GDP that the calculation still needs: each entry uses two entries of the series (the current entry and 1975:3). See SET for more on these options.
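The hazard that SCRATCH guards against can be illustrated with a Python sketch. It shows the general problem of overwriting a series while still reading from it and is not a description of RATS internals; both function names and the data are hypothetical.

```python
def renormalize_in_place(series, base, target=1000.0):
    """Naive version: overwrites entries while still reading series[base]."""
    for i in range(len(series)):
        series[i] = target * series[i] / series[base]
    return series

def renormalize_with_scratch(series, base, target=1000.0):
    """Compute every entry from the untouched original, then return the result."""
    return [target * value / series[base] for value in series]

gdp = [50.0, 100.0, 200.0]
wrong = renormalize_in_place(list(gdp), base=1)     # base entry is overwritten midway
right = renormalize_with_scratch(gdp, base=1)       # every entry divided by original base
```

In the naive version, entries after the base period are divided by the already rescaled base value (1000) rather than the original one (100), so the last entry comes out as 200 instead of 2000.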
The %IF Function—Conditional Expressions
The logical function %IF(x,y,z) returns the value y if x is non-zero, and returns z if x is zero. This is a very useful function. For example:
set w = %if(test<0, %na, w)
set testseries = %if(t<=1990:12, seriesone, seriestwo)
The first makes all entries of W for which the series TEST is negative equal to missing (%NA is how you represent the missing value in expressions). The other entries are not affected. The second stores the values of SERIESONE in TESTSERIES for all entries through 1990:12. Entries from 1991:1 on are taken from SERIESTWO.
Note that the %IF function only evaluates a single-valued expression each time it is called. It is the SET instruction itself that “loops” over the entries in the sample range, calling %IF once for each entry in that range.
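The division of labor between a scalar conditional and the loop over entries can be sketched in Python. The names if_value and splice are hypothetical, and entry numbers stand in for dates such as 1990:12.

```python
def if_value(x, y, z):
    """Scalar analogue of %IF: y when x is non-zero, otherwise z."""
    return y if x != 0 else z

def splice(seriesone, seriestwo, last_entry_from_one):
    """Entries 1..last_entry_from_one from seriesone, the rest from seriestwo."""
    return [if_value(t <= last_entry_from_one, a, b)
            for t, (a, b) in enumerate(zip(seriesone, seriestwo), start=1)]

testseries = splice([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 2)
```

The list comprehension plays the part of SET, calling the scalar conditional once per entry.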
Missing Values
SET propagates missing values through the formula. With only two exceptions (the %VALID(x) function and the %IF(x,y,z) function), any operation which involves a missing value returns a missing value.
SET also sets to missing any observation which involves an illegal operation, such as divide by zero and square root of a negative number.
set stockprice = %if(%valid(stockprice),stockprice,-999.0)
This replaces all the missing values of STOCKPRICE with the value –999.00. You might use this in preparation for exporting this series to a file format (such as ASCII) which doesn’t support special codes for missing values.
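The propagation and recoding rules can be imitated in Python, with the caveat that a floating-point NaN is only a stand-in for the RATS missing value and the two are not represented the same way. The names NA, valid and safe_divide are hypothetical analogues, not RATS functions.

```python
import math

NA = math.nan                               # stand-in for %NA

def valid(x):
    """Analogue of %VALID: 1.0 if x is not missing, else 0.0."""
    return 0.0 if math.isnan(x) else 1.0

def safe_divide(a, b):
    """Division that yields NA instead of raising on a zero divisor."""
    return NA if b == 0 else a / b

stockprice = [10.0, NA, 12.5]
doubled = [2.0 * p for p in stockprice]                     # NA propagates through arithmetic
recoded = [p if valid(p) else -999.0 for p in stockprice]   # missing entries become -999.0
```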
Copyright © 2025 Thomas A. Doan