DATA Instruction

DATA( options ) start end serieslist

DATA reads data series from an external file into working memory, and is usually preceded by an OPEN DATA command, although this is not required. We have a whole section (Dealing with Data) devoted to the use of the DATA instruction. The corresponding instruction for writing data is COPY.

Wizard

The Data/Graphics>Data Wizards generate the appropriate sequence of CALENDAR, OPEN DATA, and DATA instructions to read data from a file.

Parameters

start, end

Range of entries to read. If you have not set a SMPL, this defaults to the standard workspace. If you have not set that, and the length of the data set is clear from the contents of the data file, DATA uses that and sets it as the workspace length.

serieslist

The list of series names or numbers you want DATA to read from the file. If you omit the list, DATA reads all of the series on the file. This works with any format except FREE, BINARY, or FORTRAN formats, which carry no series labels.

You can use series<<fileseriesname parameter fields to map a series/column name on an original file to a (valid) RATS name. This can be used to handle source file names which are too long, improperly formatted (for instance, having spaces), conflict with RATS reserved names (such as INV or T), or simply not easily understood. If fileseriesname has spaces or punctuation, enclose it in quotes (single or double).

You can use a VECT[SERIES] as part or all of the list. This would typically be used if you're reading from an "unlabeled" file.
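
As a rough sketch of both forms (the file names, column names, and series names here are hypothetical, chosen only for illustration):

```
* Map awkward column names on the file to valid RATS names
open data sales.xlsx
data(format=xlsx,org=col) / retail<<"Retail Sales" inventory<<INV

* Read three unlabeled columns into the elements of a VECT[SERIES]
declare vect[series] y(3)
open data raw.dat
data(format=free,org=col) 1 100 y
```

The first DATA relies on the quoting rule for a file name containing a space and on << to avoid the reserved name INV. The second assumes the file holds exactly three columns of 100 numbers.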

Options

FORMAT=[FREE]/BINARY/CSV/CITIBASE/CRSP/DBF/DIF/DRI/DTA/FAME/FRED/HAVER/MATLAB/ODBC/PORTABLE/PRN/RATS/TDF/WF1/WKS/XLS/XLSX/'(FORTRAN format)'

This describes the format of your data set. RATS will not attempt to determine the format from the file name or extension—you must use the correct FORMAT option.

ORGANIZATION=COLUMNS/[ROWS]/MULTILINE

This tells DATA whether the data series run down the file in columns (one series per column), across it in rows (one series per row), or blocked into multiple lines per series. The ORG option isn’t needed for most formats, since only text files and spreadsheets allow for the different arrangements.

ORG=MULTILINE can be used to read data for a single series (only) which spans more than one row in a spreadsheet (or similar file format). If you have more than one series on a file blocked this way, use separate DATA instructions with the TOP and BOTTOM options to isolate the information for each series.

Note that the older terminology of ORG=VARIABLE/OBSERVATION is still supported. OBSERVATION corresponds to COLUMNS and VARIABLE corresponds to ROWS.

UNIT=[DATA]/INPUT/other unit

RATS I/O unit from which data are read. With the default option, DATA reads the data from the external data file. Use UNIT=INPUT when you want to enter data directly into the input file.
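
A minimal sketch of UNIT=INPUT, using made-up values: the numbers follow the DATA instruction directly in the program rather than coming from an external file.

```
all 6
data(unit=input) / x
1.5 2.0 2.4 3.1 3.0 3.6
```

This is mainly useful for very small data sets, where a separate data file would be more trouble than it is worth.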

SHEET="worksheet name" [unused]

When reading an Excel file with multiple worksheets or a MATLAB file with several data series on a single matrix, you can use SHEET to identify which one you want to read. DATA reads the first by default.

SKIPLINES=number of top lines to skip [0]

TOP=top line to process [1]

BOTTOM=bottom line to process [last]

LEFT=leftmost column to process [1]

RIGHT=rightmost column to process [last]

[LABELS]/NOLABELS

You can use these if you have comments or other non-data information on a text or spreadsheet file. SKIPLINES or TOP can be used to skip over information at the top of the file, and the others can be used to prevent reading marginal comments at other locations. If you have a spreadsheet file with non-standard (or no) labels, you can use NOLABELS to indicate that the series labels are not on the file, but will be provided by the list of series.
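
For instance, assuming a hypothetical comma-delimited file SURVEY.CSV with two comment lines at the top, no row of series names, and marginal notes in the third column, a sketch might look like:

```
* Skip the two comment lines, ignore everything past column 2,
* and take the series names from the list since the file has none
open data survey.csv
data(format=csv,org=col,skiplines=2,right=2,nolabels) 1 100 income spending
```

The file layout and the series names INCOME and SPENDING are assumptions made for the illustration.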

SQL="SQL command to read data" [unused]

QUERY=INPUT/Other unit

When using FORMAT=ODBC, use the SQL option to supply a relatively short (255 characters or fewer) SQL query, either as a literal string, or a variable of type STRING defined earlier.

For a more complex SQL query, use the QUERY option. With QUERY=INPUT, RATS reads the SQL commands from the lines following the DATA instruction in the input window (or input file in batch mode). With QUERY=unit, RATS will read the query from the text file associated with the specified I/O unit (opened previously with an OPEN instruction). In either case, use a ";" symbol (which tells RATS to begin a new instruction) to signal the end of the SQL string. See OPEN and RATS I/O Units for more on I/O units.
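
A sketch of the QUERY=INPUT form is shown here. The table and column names are hypothetical, and the step that connects RATS to the ODBC data source is assumed to have been done already and is not shown.

```
* The SQL text starts on the line after DATA and ends at the ";"
data(format=odbc,query=input) 1 200 price volume
select price, volume
  from trades
  where symbol = 'XYZ'
  order by tradedate;
```

A query this short would also fit within the 255-character limit of the SQL option. QUERY becomes the better choice as the statement grows.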

COMPACT=[AVERAGE]/GEOMETRIC/SUM/MAXIMUM/MINIMUM/FIRST/LAST

SELECT=number [unused]

These options, which work with most dated file formats, set the method DATA will use to convert data from higher frequency to lower, such as monthly to quarterly.

MISSING=missing value code [unused]

Use MISSING if you use a numeric code in your data to represent missing observations. For instance, MISSING=-999 will translate observations with value -999 to the RATS missing value.
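
In a complete instruction, that might look like the following sketch (the file and series names are hypothetical):

```
* Entries coded as -999 on the file become RATS missing values
open data survey.csv
data(format=csv,org=col,missing=-999) 1 100 income
```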

BLANK=[ZERO]/MISSING

This applies only with FORTRAN formats. If an area where DATA expected a number is blank, it treats it either as a zero or as a missing value, depending upon your choice for this option.

DATEFORMAT="date format string"

This can be used if dates on the file are text strings in a form other than year (delimiter) month (delimiter) day or year (delimiter) period. In the date format string, use y for positions with the year, m for positions with the month and d for positions with the day. Include the delimiters (if any) used on the file. Examples are DATEFORMAT="mm/dd/yyyy" and DATEFORMAT="yyyymmdd".
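
As an illustration, assuming a hypothetical daily file PRICES.CSV whose date column holds strings such as 01/02/2015 (month first):

```
cal(d) 2015:1:2
open data prices.csv
data(format=csv,org=col,dateformat="mm/dd/yyyy") 2015:1:2 2015:12:31 close
```

Without the option, a month-first date would not match either of the default year-first layouts.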

JULIAN=SERIES to fill with Julian date information from file [unused]

This creates a series, built from the date information on the file, that maps each entry to the Julian date coding used by RATS. It can be used to set up a mapped irregular date scheme.

CALENDAR=Coded CALENDAR data array [unused]

In general, DATA does a fairly good job of detecting the frequency and dates of the information on a data file which has date information. However, if you have a very short data set, or a data set with large gaps, it might fail to determine the date scheme that you intend. The CALENDAR option allows you to force a particular (encoded) date scheme. The %CALENDAR() function can be used to take the current CALENDAR scheme and save the information into a VECT[INTEGER]. The most typical use would be

DATA(CALENDAR=%CALENDAR(),other options) ...

which forces the data to be interpreted as having the CALENDAR scheme of the current workspace. This option was added with version 9.1.

REVERSE/[NOREVERSE]

If your data are in time-reversed order (starting with the most recent at the top or left), use REVERSE to have them switched to the standard time series order when read in. Note that if the file has usable dates which run in the reversed order, DATA will automatically detect that and correct the ordering, so this is only necessary if you have a data set with reversed sequence and no usable dates.
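
A sketch of the case where REVERSE is actually needed, assuming a hypothetical undated, unlabeled file with the newest observation on the first line:

```
* No dates on the file, so DATA cannot detect the ordering itself
open data newestfirst.csv
data(format=csv,org=col,nolabels,reverse) 1 250 price
```

After the read, entry 1 of PRICE holds the oldest observation (the last line of the file).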

VERBOSE/[NOVERBOSE]

Use VERBOSE if you want DATA to list the name and comments (if any) of each series it reads, along with other information about the file. You can use VERBOSE to verify that DATA is reading date information on your data file properly—this is particularly important when converting from one frequency to another.

Examples

open data states.xls

data(org=col,format=xls) 1 50

reads all series from the Excel file STATES.XLS. The file is organized by columns.

cal(m) 1955:1

open data brdfix.dat

data(org=row) 1955:1 1979:12 ip

reads data for the series IP from the file BRDFIX.DAT. This is a free-format file, organized by row. Note that, unlike with the “labeled” formats such as XLS and RATS, we are free to give the series any name we wish.

cal(a) 1920:1

all 1945:1

open data klein.prn

data(org=col,format=prn) 1920:1 1941:1 consumption $

profit privwage invest klagged production govtwage govtexp taxes

reads data for nine series in labeled ASCII (PRN) format over the range 1920:1 to 1941:1.

open data haversample.rat

calendar(a) 1982

all 2006:1

data(format=rats,compact=max) / djmax<<spdji

data(format=rats,compact=min) / djmin<<spdji

data(format=rats,compact=last) / djlast<<spdji

reads the series SPDJI, which is monthly on the data file, into an annual CALENDAR, taking the maximum in each year (to create DJMAX), the minimum (to create DJMIN) and the final value (to create DJLAST).

cal(m) 1970:1

data(format=fred) / inflflex<<COREFLEXCPIM157SFRBATL

pulls up-to-date data from FRED for inflation from the flexible price CPI. Since the FRED name is too long to be used as a RATS name, the << is used to pull the data into a legal (and more descriptive) RATS name. The working series within RATS will be known as INFLFLEX.

cal(m) 1991:1

open data "https://api.statistiken.bundesbank.de/rest/download/BBDE1/M.DE.Y.AEA1.A2P300000.F.C.I15.L?format=csv&lang=en"

data(format=csv,nolabels,org=col,skiplines=10,right=2) 1991:1 * neword

This reads the new manufacturing order index from the Deutsche Bundesbank, from 1991:1 to the most recent value. This comes through as unlabeled data once the 10 header lines are skipped. It is given the name NEWORD.

cal(d,save=forcedaily) 1990:1:1

cal(m) 1990:1
open data MPshocks.xlsx
data(format=xlsx,org=col,compact=sum,calendar=forcedaily) 1990:1 2019:12 FFRfactor FGfactor LSAPfactor

The data file here is a scattering of monetary policy events labeled with their dates, but the goal is to produce monthly series which hold the sum of those events within each month. The CALENDAR option on DATA forces the instruction to treat the input as daily (five-day-a-week) data, so COMPACT=SUM gives the sum of any non-zero values in the covered month. Note that this will give a missing value for any month which has no events.