Text Files

We use the term free format to refer to text files that contain only numbers, with no series names, date labels, or other alphanumeric labels. Numbers on the file can be separated by blanks, tabs, commas, or “new lines”.
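The following Python sketch illustrates this definition. It is not RATS code, and `free_format_tokens` is a hypothetical name. The function splits text on blanks, tabs, commas, and new lines. It treats a run of separators as a single break, which is an assumption about edge cases such as two adjacent commas.

```python
import re

def free_format_tokens(text):
    """Split free-format text into number strings.

    Blanks, tabs, commas and new lines are interchangeable separators;
    a run of them is treated as one break (an assumption of this sketch).
    """
    return [tok for tok in re.split(r"[ \t,\r\n]+", text) if tok]

# The same four numbers laid out two different ways give the same stream.
one_line = "1.5, 2.5\t3.5 4.5"
stacked = "1.5\n2.5\n\n3.5,4.5\n"
print(free_format_tokens(one_line) == free_format_tokens(stacked))  # True
```

Nothing in the resulting stream says which series or date a number belongs to. You supply that structure through the ORG option and the list of series names.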

You can use the Data/Graphics>Data (Other Formats) operation to read data from free format files. Just select “Text Files” or “Comma Delimited” from the file-type list before selecting the file you want to read. When you complete the Data Wizard dialog box, RATS will prompt you for a name for each series being read in.

If typing in the DATA instruction directly, use FORMAT=FREE and the appropriate ORG option (ORG=COLS or ORG=ROWS), and include a list of the names you want to assign to the series. See "Reading Free Format Files" below for more.

PRN and Other Delimited Formats

PRN format refers to a text file structured like a spreadsheet. The name dates back to the days of Lotus 1-2-3, where it described a printable (text) version of a spreadsheet. To be read as a Labeled Table, PRN files need to have data structured as shown under "Spreadsheets and Delimited Text Files". Series should be arranged in columns or rows, and the file must contain names for each series. PRN files can also contain dates, in the form of date-format strings such as “yyyy/mm/dd” or “yyyy-qq”. Note that you can’t use any type of numerical coding for the dates. Numbers in the file can be separated by blanks, commas, tabs, and/or carriage returns. The columns don’t need to be nicely aligned, since it’s the delimiters that determine where one column ends and the next starts.

You can read the file using the Data/Graphics>Data (Other Formats) operation, or by typing in a DATA instruction with FORMAT=PRN and the appropriate ORG option.

RATS also offers two related choices: FORMAT=CSV for comma-separated files and FORMAT=TDF for tab-delimited files.

For reading data, CSV, TDF, and PRN are interchangeable. Regardless of which you use, RATS will accept commas, tabs, or spaces as separators. For creating a file with COPY, the option will determine the type of separator used: commas for CSV, tabs for TDF, and spaces for PRN.
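The following Python sketch models the delimiter rules described above. It is only a model: `split_fields`, `read_labeled`, and `write_row` are hypothetical names, and the sketch is not how RATS parses these files. It reads a column-organized labeled table, with a header of names followed by a date string and values on each row. Commas, tabs, and spaces are all accepted as separators, so alignment does not matter. The `fmt` argument selects the separator for writing. Quoted fields and labels containing spaces are deliberately not handled.

```python
import re

# Separator used when writing, keyed by format name, as described above.
WRITE_SEPARATOR = {"CSV": ",", "TDF": "\t", "PRN": " "}

def split_fields(line):
    """Split one line on commas, tabs, or spaces; column alignment is irrelevant."""
    return [f for f in re.split(r"[ \t,]+", line.strip()) if f]

def read_labeled(lines):
    """Parse a header of names, then one date string plus values per row."""
    names = split_fields(lines[0])[1:]  # first header cell labels the date column
    dates, series = [], {name: [] for name in names}
    for line in lines[1:]:
        fields = split_fields(line)
        dates.append(fields[0])
        for name, value in zip(names, fields[1:]):
            series[name].append(float(value))
    return dates, series

def write_row(fields, fmt):
    """Join fields with the separator implied by fmt ('CSV', 'TDF' or 'PRN')."""
    return WRITE_SEPARATOR[fmt].join(str(f) for f in fields)

# Mixed, unaligned separators still parse cleanly (quarterly "yyyy-qq" dates).
messy = ["DATE, GDP\tCPI",
         "2018-01   1.5,  2.0",
         "2018-02\t1.7 2.1"]
dates, series = read_labeled(messy)
print(dates)   # ['2018-01', '2018-02']
print(series)  # {'GDP': [1.5, 1.7], 'CPI': [2.0, 2.1]}
print(write_row(["2018-01", 1.5, 2.0], "CSV"))  # 2018-01,1.5,2.0
```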

Free Versus PRN

Free format files can be very convenient if you want to type in a short data set, because you can create them with any editor that can save files as plain text. They are also the most natural format for data scanned in from printed material.

However, because these files do not contain series labels or date information, they are not good for long-term use, and RATS cannot read them with much flexibility. Usually, you will want to convert free-format files to RATS format files, or to one of the spreadsheet formats.

PRN format is generally more useful and reliable. The presence of series names reduces the chance of mistakes identifying data, and also gives you the option of only reading in selected series. The ability to include date labels is clearly an advantage for dated time series data.

If you have a text file that does have variable labels or other non-numeric characters, you could:

•read the file using FORMAT=PRN. This allows RATS to process the series labels and dates (if any).

•import the data into a spreadsheet and read that file into RATS. Spreadsheet programs have very sophisticated “parsing” dialogs for taking text and dividing it into spreadsheet cells.

If you have an otherwise well-formatted text file which does not include series labels, you could:

•edit out all the non-numeric characters and read the file as FORMAT=FREE.

•edit the file to add series labels and read it as FORMAT=PRN.

Note that you can use the TOP, LEFT, BOTTOM and RIGHT options on DATA to leave out parts of the file that you don't want treated as part of the data set. With text files, TOP is particularly useful, since files often have general descriptions at the top. LEFT and RIGHT tend to be less useful, since they count "columns" based upon the delimited fields in each line. If you have a text file with complicated rows, you may be best off using a spreadsheet's "Text to Columns" operation to parse it and then saving it as a spreadsheet.

Reading Free Format Files

When reading a free format file, RATS reads series line by line, according to the following:

•If your data are organized by row, each variable must begin on a new line. The data for a given series can, however, extend over more than one line—just make sure that the data for the next series begins on a new line.

If DATA does not find enough entries on a line to fill a series, it automatically moves on to the next line and continues reading numbers.

Extra blank lines are fine (for instance, to separate the data for two series). FORMAT=FREE will just keep scanning until it hits the next set of numbers.

•If your data are organized by column, the data for each new observation must begin on a new line. As above, data for a particular observation can extend over multiple lines.

If DATA does not find enough entries on a line to fill all the series for an observation, it automatically moves on to the next line and continues reading numbers. When it has read data for all series for that observation, it drops to the next line to start reading in the next observation.

•RATS interprets as a missing value the characters “NA” or “#N/A” (upper or lower case), or a period followed by zero, one, or two non-numeric characters. If it encounters any other non-numeric characters, RATS will interpret them as a missing observation and display a warning message.
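The reading rules above can be modeled in a few lines of Python. This is a simplified sketch for illustration, not RATS's actual code. `read_free` and `parse_value` are hypothetical names, and `org` takes `"rows"` or `"cols"` in place of the ORG option. The sketch skips any values left over on a line once a series (rows) or an observation (columns) is complete. That behavior is an assumption drawn from the rule that each one begins on a new line.

```python
import re

SEPARATORS = re.compile(r"[ \t,]+")

def parse_value(tok):
    """Return a float, or None for a missing value."""
    if tok.upper() in ("NA", "#N/A") or re.fullmatch(r"\.[^0-9]{0,2}", tok):
        return None  # recognized missing-value codes
    try:
        return float(tok)
    except ValueError:
        print(f"Warning: {tok!r} is not numeric; treated as missing")
        return None

def read_free(lines, nseries, nobs, org):
    """Model of FORMAT=FREE reading with org='rows' or org='cols'."""
    rows = [[t for t in SEPARATORS.split(line.strip()) if t] for line in lines]
    pos = 0  # index of the next line to read

    def take(count):
        # Collect `count` values starting on a fresh line, pulling in more
        # lines as needed (blank lines contribute nothing). Anything left
        # over on the last line used is skipped.
        nonlocal pos
        values = []
        while len(values) < count:
            if pos >= len(rows):
                raise EOFError(f"Unexpected end of file while processing line {pos + 1}")
            need = count - len(values)
            values.extend(parse_value(t) for t in rows[pos][:need])
            pos += 1
        return values

    if org == "rows":
        return [take(nobs) for _ in range(nseries)]  # one series per block
    observations = [take(nseries) for _ in range(nobs)]  # one observation per block
    return [list(col) for col in zip(*observations)]

by_row = ["1 2", "3", "", "10,20,30"]
print(read_free(by_row, nseries=2, nobs=3, org="rows"))  # [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
by_col = ["1 10", "2 NA", "3", "30"]
print(read_free(by_col, nseries=2, nobs=3, org="cols"))  # [[1.0, 2.0, 3.0], [10.0, None, 30.0]]
```

In the row example, the first series runs over two lines and the blank line is skipped. In the column example, the third observation is split across two lines and "NA" becomes a missing value.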

Troubleshooting

Free format allows data for a single observation (or a single variable) to cover several lines. The disadvantage of this is that it becomes more difficult to pinpoint errors. If you leave out one number early in the file, DATA will automatically pull in a value from the next line to fill the gap, which throws off all of the data that follow.

If you get the error “Unexpected end of file...” when reading with FORMAT=FREE, it means that RATS reached the end of the data file before it filled all of the data series. To determine what happened, do the following:

•Check your ALLOCATE setting or the start and end parameters on DATA to make sure they are set correctly. Note that if, for example, you have ALLOCATE 2018:5:14, you can do ?2018:5:14 to see how many data points are implied by the ALLOCATE range.

•Make sure that you have the proper ORGANIZATION option.

•Check that the data file has the number of observations and variables that you expect, at least at first inspection.

•If all seems to be correct and it’s a small enough file, you can do a quick check for typographical errors.

If all this fails to locate the problem, you will have to let DATA help you find the problem. For illustration, suppose you have a file that looks like this:

1,2,3
10.20,30
100,200,300
1000,2000,3000

This is supposed to have 4 observations on each of three series, but the second line has a decimal point between the 10 and the 20, where it should have a comma, so the line only contains two values (10.20 and 30) rather than three. If we read this with

data(org=col) 1 4 first second third

we will get the error message,

Unexpected end of file while processing line 5. (series FIRST entry 4).

We can tell the following from this:

1.DATA thinks it needs a fifth line to read the requested data. Our data set is supposed to have four lines—this tells us that, in fact, the file has as many lines as we think.

2.RATS was trying to read the fourth observation from a fifth line, so for some reason we are precisely one line off at the end of the file.

We can then examine the values of the series with PRINT, looking for values which have ended up in the wrong series, or locating where a series gets off sequence.
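The following Python sketch replays the example to show why the error points at line 5. It is an illustrative model, not RATS itself, and `read_cols_traced` is a hypothetical name. The sketch applies ORG=COLS rules and tags each value with the line it came from, which is the kind of misplacement you would look for with PRINT. It assumes, as in the rules under "Reading Free Format Files", that values left over on a line are skipped once an observation is complete.

```python
import re

FILE_LINES = ["1,2,3", "10.20,30", "100,200,300", "1000,2000,3000"]
NAMES = ["FIRST", "SECOND", "THIRD"]

def read_cols_traced(lines, names, nobs):
    """Model an ORG=COLS free-format read, tagging each value with its source line.

    Returns (data, error); error is None if every entry was filled.
    """
    data = {name: [] for name in names}
    pos = 0  # index of the next line to read
    for entry in range(1, nobs + 1):
        pending = list(names)  # series still needing a value for this entry
        while pending:
            if pos >= len(lines):
                return data, (f"Unexpected end of file while processing line {pos + 1}. "
                              f"(series {pending[0]} entry {entry})")
            tokens = [t for t in re.split(r"[ \t,]+", lines[pos].strip()) if t]
            for tok in tokens[:len(pending)]:
                data[pending.pop(0)].append((float(tok), pos + 1))
            pos += 1  # extra values on this line, if any, are skipped
    return data, None

data, error = read_cols_traced(FILE_LINES, NAMES, 4)
print(error)
for name, values in data.items():
    print(name, values)
```

The trace shows that the second value of THIRD (100) came from line 3 rather than line 2. That identifies line 2 as the short line, and it is why the read runs one line past the end of the file.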