RATS 10.1
RATS 10.1

Internally, RATS represents missing data with a special value. On most machines this is the value “+infinity.” On output, a missing value is denoted as “NA” for Not Available. In a RATS expression, this value is available as %NA. Thus,

 

set fixx = %if(x<0.0,%na,x)

 

will set FIXX to be missing for entries where X<0 and to X otherwise. You can test whether a value is missing using the %VALID function: %VALID(x) is 0.0 if X is missing and 1.0 otherwise. You can also use X==%NA.

 

RATS has special ways of handling missing values for each of the data formats.

 

Spreadsheet Files

The “+infinity” coding is the same as is used by the major spreadsheet programs for N/A’s. Thus, if a cell contains the explicit missing value function (NA() in Excel), RATS will read a missing value at that cell. When you export data to the spreadsheet, RATS passes the code value (as a number, not a function), so it will display as NA or #N/A on the spreadsheet.

 

RATS will also interpret blank cells within the spreadsheet as missing values.

 

Spreadsheet-Style Text (PRN) Files

Missing values are a problem if you have to save the data in a space or tab separated text format (PRN or TSD). You can’t simply leave the cell blank in the text file, as you can with an actual spreadsheet, because a blank cell is indistinguishable from the blanks used to separate data items. Instead, you’ll have to use one of the codings described in the next paragraph.

 

Text Files

RATS will accept the following as codes for missing values in text-format data files:

The characters “NA” or “#N/A” (upper or lower case)

A decimal point, followed by zero, one, or two non-numeric characters (for example, . or .NA)

Note, however, that you cannot use these within a RATS expression: use %NA for that.

 

RATS will interpret any block of non-numeric characters as a missing observation. However, if the characters don’t fit the description above, RATS will issue a warning message that it has encountered invalid input.

 

Numeric Codes

If you have data which use a specific numeric value (–999 or something similar) to indicate a missing value, use the option MISSING=missing value code on DATA. Whenever possible, use integers for missing value codes. The realities of numerical precision in the computing world mean that RATS may not be able to match exactly a decimal-valued code such as –999.99.

 

Suppose your data set uses –9999 for missing values. You could use something like the following:

 

data(format=prn,org=columns,missing=-9999) / sales revenue

 

This would read the series SALES and REVENUE, and convert any values of –9999.0 to missing values.

 

If you have several missing value codes (such as –999 = no response, –998 = invalid response), you have two options:

Edit the file and replace the two codes by a single code.

Read the data and alter it within RATS.

 

This is an example of the latter:

 

data(format=free,org=col) / income charitable mortgage

set income = %if(income==-999.or.income==-998, %na, income)

 

The %IF function tests whether the current entry of INCOME is equal to either of our two missing value numeric codes. If so, the function returns the %NA missing value code, which is then stored in INCOME. Otherwise, the existing value of INCOME is retained.

 

To apply this procedure to multiple series, we can use a DOFOR loop:

 

dofor i = income charitable mortgage

   set i = %if(i{0}==-999.or.i{0}==-998, %na, i{0})

end dofor

 

Note that the index variable I will actually be an integer variable containing the series number associated with the current series. As a result, we need to use the lag notation {0} (specifying the zero lag) on the right hand side of the SET instruction. This tells RATS to treat I as a series, not as an integer number.

 

Skipped Dates

Suppose that you have a (daily) data set in which holidays and other non-trading days are skipped. If you have no date information on the file, there is little you can do: you will have to treat it as an irregular time-series—omit the CALENDAR entirely, or use CAL(IRREGULAR).

 

If you do have dates on the file, DATA (and STORE with CONVERT) will code entries corresponding to the skipped dates as missing values. For example, consider the following portion of an XLS file, which skips the (U.S.) Thanksgiving holiday on November 23, 2000:

 

   DATE       SALES_DATA

          2000:11:20          3590.50

          2000:11:21          4256.05

          2000:11:22          2987.23

          2000:11:24          6799.87           

 

You might read the data with the following instructions:

 

calendar(d) 2000:1:2

open data sales.xls

data(format=xls,org=col) 2000:1:2 2000:12:31 sales_data

 

The data in the SALES_DATA series for the week of Thanksgiving would be:

 

          2000:11:20          3590.50

          2000:11:21          4256.05

          2000:11:22          2987.23

          2000:11:23              NA

          2000:11:24          6799.87

 

If you really want to skip the holidays without having gaps in your data (in the sense that you want to treat the data for November 22 and November 24 as adjoining entries), you cannot treat the data set as “Daily” because RATS insists that daily data be five days a week. If you use CALENDAR(IRREGULAR), RATS will ignore the dates on the data file and read the data from the file into consecutive entries. There is little difference between the two ways of handling the holidays, unless you do some type of time-series analysis that uses lags or leads.

 

If how you handle the skipped dates is important, you have two options:

1.Treat the data as an "irregular" time series where entries on either end of the skip are treated as consecutive.

2.Treat the data as a standard dated time series and do something to patch over the missing values.

In practice, both of these have been used. (Note that the typical 5-day-a-week "daily" series is already treating Friday and the following Monday as consecutive entries).

 

If you choose #1 and you have dates on the file, you can use "mapped dates" which will allow you to use date fields on both input and output. For instance, in the example above, you could use

 

data(format=xls,org=columns,julian=date) / sales_data

cal(julian=date)

which defines an irregular CALENDAR scheme with dates coded into the series DATE.

If you choose #2, the following will patch over a missing value with the previous value (which would make no sense with the sales data, but might if you have, for instance, price data):

 

set patched = lastprice=%if(%valid(price),price,lastprice)

 

This will use the value in the series PRICE if PRICE isn’t missing, and will use the last non-missing value if it is.

 


Copyright © 2025 Thomas A. Doan