Data formats
These are the data and output formats included with RATS. Some are input only, some output only. The first column is the name used in options (for instance, FORMAT=RATS selects the RATS data file format); in general only the first three characters matter, except for XLS (older Excel® format) versus XLSX (newer Excel format). Below the table, the formats are broken down into broad categories, which may help if you have a choice of how to store your data.
Format | Description
BINARY | The internal format a computer uses to represent numbers. It is not portable and should never be used as an output format.
CSV | Comma Separated Values: a text file with fields delimited by commas. Character fields with embedded commas need to be enclosed in quotes (either single or double).
CITIBASE | The (proprietary) time series database format that was formerly known as "Citibase".
CRSP | The proprietary time series database format from the Center for Research in Security Prices.
DBF | The database format used by dBase and related programs.
DIF | "Data Interchange Format", a portable format for the content of spreadsheets.
DTA | The Stata® DTA format.
FAME | The proprietary time series database format for the FAME software provided by SunGard.
"(Fortran format string)" | A text format which allows user-specified formats on input or output. The option is FORMAT="(Fortran format string)".
FRED | The St. Louis Federal Reserve’s online database.
FREE | An unstructured text format.
HAVER | The proprietary time series database used by Haver Analytics.
HTML | Generates a table in HTML format for use on a web site.
MATLAB | The binary data format used by MATLAB.
ODBC | For SQL queries; requires Open Database Connectivity (ODBC) to be set up.
PORTABLE | A portable, text-based time series format.
PRN | A text file with fields delimited by spaces, tabs or commas.
TEX | A table-oriented text format for output (only) which renders the contents as a TeX tabular environment.
RATS | A format specially designed to deal with time series data. It allows random access to a large number of series.
RTF | Rich Text Format, for including tables in Word and related programs.
TSD | A text file with fields delimited by tabs.
WF1 | The EViews(tm) workfile format.
WKS | The native format used by the Lotus 1-2-3 spreadsheet and successor programs.
XLS | The standard format used by Excel through the 2003 version.
XLSX | The standard format for Excel 2007 and later.
Time Series Databases
These are file formats specifically designed to handle time series data. Data are organized by named series, and each series has its own calendar scheme and range of data. These are usually fairly complicated proprietary formats, since they have to be able to encode many types of date schemes and must allow for very large amounts of data.
The most important of these is RATS format (FORMAT=RATS). RATS Portable Format is a “text” version of the RATS format, useful mainly for archiving data in a “human-readable” form. This uses FORMAT=PORTABLE.
FRED® (Federal Reserve Economic Data) is a (free) on-line database provided by the St. Louis Federal Reserve Bank (research.stlouisfed.org/fred2). This requires an active internet connection. You can read series in directly with FORMAT=FRED, but you also have the option of browsing the database interactively—select the Data/Graphics—Data Browsers—FRED(online) operation. This displays a list of the main database categories in a new window. Double-click on a category (or sub-category) to see a Data File Window with the series available in that category.
The other formats listed below are available only in the Professional version of RATS, and some aren’t included on certain platforms. These are all proprietary formats, so our ability to support them depends upon the type of support offered by the owner of the format. You will also need a subscription to a database or service provided by the owner in order to use them. These other formats are:
Citibase/DRI/Global Insight native format. We call this FORMAT=CITIBASE (which is its legacy name). This is available on all platforms.
CRSP® (Center for Research in Security Prices) data, available from the University of Chicago’s Booth School of Business (www.crsp.com). We call this FORMAT=CRSP. It’s available for Windows and some UNIX systems.
Fame® is the native format for the Fame database management program provided by SunGard. We call this FORMAT=FAME. This is available on Windows and certain UNIX systems.
Haver DLX is the native format for Haver Analytics (www.haver.com). We call this FORMAT=HAVER. It’s available on Windows only.
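The idea that each series in a time series database carries its own calendar scheme and range can be illustrated with a minimal Python sketch. This is not how RATS or any of the formats above actually store data; the `Series` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Series:
    """One named series with its own calendar and range (hypothetical layout)."""
    name: str
    periods_per_year: int   # 4 = quarterly, 12 = monthly
    start_year: int
    start_period: int       # 1-based period within the start year
    values: List[float]

    def end(self):
        """Return (year, period) of the last observation; assumes at least one value."""
        offset = (self.start_period - 1) + len(self.values) - 1
        return (self.start_year + offset // self.periods_per_year,
                offset % self.periods_per_year + 1)


# Two series in the same "database" with different frequencies and ranges.
database = {
    "GDP": Series("GDP", 4, 1960, 1, [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]),
    "CPI": Series("CPI", 12, 1959, 11, [29.4, 29.4, 29.3, 29.4]),
}

gdp_end = database["GDP"].end()
cpi_end = database["CPI"].end()
```

Here the quarterly series runs from 1960:1 to 1961:2 and the monthly series from 1959:11 to 1960:2, which is the kind of mismatch a rectangular table cannot represent directly.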
Labeled Tables and Spreadsheets
These form a rectangular table of data with each column representing a series and each row representing an observation. Each column is labeled with the series name. Each series has at least nominally the same range, and same date scheme (if any) though there might be missing values in some of the series.
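As a rough illustration of this layout (not of RATS itself), the following Python sketch reads a small comma-delimited table into one list per labeled column. The date column, the series names, and the convention that an empty field or `NA` marks a missing value are assumptions made for the example; `read_labeled_table` is a hypothetical helper.

```python
import csv
import io

# First row: column labels. First column: a date label (assumed for this example).
table_text = (
    "DATE,GDP,CONS\n"
    "2020:1,100.0,60.5\n"
    "2020:2,,61.0\n"
    "2020:3,102.3,NA\n"
)


def read_labeled_table(text, missing=("", "NA")):
    """Return (dates, series) where series maps each label to a list of values."""
    rows = list(csv.reader(io.StringIO(text)))
    labels, body = rows[0], rows[1:]
    dates = []
    series = {label: [] for label in labels[1:]}
    for row in body:
        dates.append(row[0])
        for label, field in zip(labels[1:], row[1:]):
            # Missing observations are kept as None so every series has the same range.
            value = None if field.strip() in missing else float(field)
            series[label].append(value)
    return dates, series


dates, series = read_labeled_table(table_text)
```

Every series ends up with the same number of entries, with `None` standing in where an observation is missing.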
The most commonly used of these is one of the Excel formats, though Excel files can also fall into the third, less structured category (Unlabeled Text and Spreadsheets). For formats through Excel 2003, use the option FORMAT=XLS. For Excel 2007 and later, use FORMAT=XLSX. You can pull data off more than one worksheet within an Excel workbook (using the SHEET option on DATA), but each DATA instruction can access only one worksheet, so use additional DATA instructions to get data from more than one worksheet.
RATS accepts several “delimited” text formats. FORMAT=PRN will accept data fields which are separated by commas or “white space” (spaces or tabs). For writing data, you can also use FORMAT=CSV (comma separated values) or FORMAT=TSD (tab separated data) to get specific field separators.
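The difference between these three delimited layouts can be sketched in Python. This only illustrates the file layouts, not how RATS reads or writes them; the sample data and the helpers `split_prn_line` and `write_delimited` are hypothetical, and this simple splitter does not handle quoted fields or empty fields between consecutive delimiters.

```python
import csv
import io
import re


def split_prn_line(line):
    """Split one line on commas or runs of white space (spaces or tabs)."""
    return [field for field in re.split(r"[,\s]+", line.strip()) if field]


# Mixed delimiters on input: commas, spaces and tabs are all accepted.
prn_text = "GDP,CONS\tINV\n100.0 60.5,20.1\n101.2\t61.0 20.4\n"
rows = [split_prn_line(line) for line in prn_text.splitlines()]


def write_delimited(rows, delimiter):
    """Write rows with one specific field separator."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


csv_text = write_delimited(rows, ",")   # comma separated values
tsd_text = write_delimited(rows, "\t")  # tab separated data
```

Reading is forgiving about the separator, while writing commits to exactly one.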
EViews® workfiles fall into this category. RATS can process data series (but not other objects). This uses FORMAT=WF1.
DTA is the native format for Stata® data files. Use FORMAT=DTA for that.
If you have the Professional level of RATS, you can also read data using SQL queries on databases which support ODBC (such as Oracle and Access). This is a bit more complicated than other formats since you have to provide the SQL query that sets up the table of interest. This is FORMAT=ODBC.
DIF is a (rather bulky) text format for transmitting the content of a simple spreadsheet in a text, rather than binary form. It’s rarely used now, except as a copy-paste format for data to and from spreadsheets. It uses FORMAT=DIF.
WKS is an older spreadsheet format used by the Lotus spreadsheets. It’s perfectly adequate as a storage format for data, and quite a few legacy data sets are saved in it. As with Excel, it can also be in the third category. This uses FORMAT=WKS.
DBF is the database format used by dBase and compatible programs. It uses FORMAT=DBF.
Unlabeled Text and Spreadsheets
These either don’t have labels for individual series, or have “labels” which have illegal characters for RATS series names (such as spaces or parentheses).
The most important of these is free-format, which is just a text file with numbers, delimited with commas or “white space” (spaces, tabs, line breaks). While we would never recommend saving your own data in this, there are a large number of existing data files done in such an unstructured way. Also, if you ever have to scan data out of a book, you will likely end up with this. This uses FORMAT=FREE.
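A minimal Python sketch of what "free-format" means in practice: the numbers are taken in order, and line breaks carry no more meaning than any other delimiter. This illustrates the layout only and is not RATS code; `read_free_format` and the sample text are hypothetical.

```python
import re


def read_free_format(text):
    """Return every number in the text, in order, ignoring line structure."""
    tokens = re.split(r"[,\s]+", text.strip())
    return [float(token) for token in tokens if token]


# Commas, spaces, tabs and line breaks all act as delimiters.
sample = "1.5, 2.25 3\n4.0\t5e-1,\n6\n"
values = read_free_format(sample)
```

Because nothing in the file says which number belongs to which series or date, that information has to be supplied by whoever reads it.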
Another unlabeled text “format” is the collection of Fortran formats. This is likely only of value if you have a very old text data file which, in order to keep the size down, squeezed data into undelimited fields. (For instance, the first five positions hold the first number, and the next five hold the second number.) The option is FORMAT="(Fortran format string)".
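To show why such files need a format description, here is a small Python sketch that cuts a record into consecutive five-character fields, roughly what a Fortran format along the lines of `(4F5.0)` would describe. It is an illustration of undelimited fields, not of how RATS processes a format string; `read_fixed_width` and the sample record are hypothetical.

```python
def read_fixed_width(line, width=5):
    """Cut a line into consecutive fields of `width` characters and convert each to float."""
    line = line.rstrip("\n")
    fields = [line[i:i + width] for i in range(0, len(line), width)]
    # Skip fields that are entirely blank; float() ignores padding spaces.
    return [float(field) for field in fields if field.strip()]


# Four numbers packed into 20 characters: " 12.5", " 3.75", "10000", "-4.25".
record = " 12.5 3.7510000-4.25"
values = read_fixed_width(record)
```

Splitting the same record on white space would give the wrong answer, since "3.7510000-4.25" contains no delimiter at all.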
MATLAB is the native data format for the MATLAB programming language. This could also fall into the “Time Series Databases” category, if each series is a separate one-column matrix. If, instead, you have a single matrix with multiple columns forming your dataset, it falls into this group. This uses FORMAT=MATLAB.
The least-recommended of all the formats is native binary. Never save data in this format—there’s no identifying information on the file, you can’t read the data without the proper program and it isn’t necessarily portable from one type of computer system to another. It’s FORMAT=BINARY.
Any of the “spreadsheet” formats (XLS, XLSX, WKS, DIF, PRN) can also be treated this way if you have extra comment lines or missing or unusable labels.
Copyright © 2025 Thomas A. Doan