ORDER Instruction

ORDER( options ) series start end list of series (optional)

Orders (sorts) a series or orders a list of series based upon values of one series. Optionally, it can construct a rank ordering of a series. Since it generally rearranges the elements of series, it may be a good idea to apply this to a copy of the data.

Note that the functions %SORT, %SORTC, %SORTCL and %RANKS can also be used to sort or rank data in a vector or matrix.

Parameters

series	series to sort or rank
start, end	range of entries to sort or rank. By default, this is the defined range of series.
list of series	(optional) The listed series are reordered in parallel with series. Observations are kept intact across series and list of series and reordered as a group based upon the values of series. To reorder all current data series, use the option ALL.

Options

DECREASING/[NODECREASING]

If DECREASING, sort in decreasing order (by default, increasing order).

RANK=series for ranks [unused]

Saves the ranks of the elements. If you use this, the series itself isn't changed. The rank values run from 1 to the number of observations. Note that ties are assigned the average rank of the ties, so there is no guarantee that there will be any observation that will be assigned any particular rank value. (For instance, if there are 3 entries which achieve the smallest value, all three will get rank=2).
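
The average-rank rule for ties can be illustrated outside RATS. The following is a minimal Python sketch of that rule, not RATS code and not the actual implementation; the function name average_ranks and the sample data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    # Entry positions arranged so that the values are in increasing order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the run to cover every entry tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) span ranks pos+1..end+1.
        avg = (pos + 1 + end + 1) / 2.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Three entries share the smallest value, so each gets rank (1+2+3)/3 = 2.
x = [5.0, 1.0, 1.0, 7.0, 1.0]
print(average_ranks(x))  # [4.0, 2.0, 2.0, 5.0, 2.0]
```

In this sample no entry receives rank 1 or rank 3, which is the situation the note about ties describes.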

ALL/[NOALL]

Sorts all series currently in memory based on series.

SMPL=standard SMPL option [not used]

Limits the sort to the observations for which the series or formula returns a nonzero or true value.

INDEX=SERIES[INTEGER] for index series [not used]

This saves the entry numbers which would sort the series, that is:

order(index=ix) x
set y = x(ix)

would make Y a sorted copy of X. The original series is left untouched. This is similar to RANK, except that INDEX returns actual entry numbers, rather than returning a ranking from 1 to N.
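
The relationship between an index series and the sorted copy can be sketched in Python. This is an illustration of the idea only, not RATS code; sort_index is a hypothetical helper, and entry numbers are taken to be 1-based as in a RATS series.

```python
def sort_index(values):
    """Return 1-based entry numbers that would put `values` in increasing order."""
    # sorted() is stable, so tied values keep their original entry order.
    return [i + 1 for i in sorted(range(len(values)), key=lambda i: values[i])]

x = [0.4, -1.2, 3.0, 0.1]
ix = sort_index(x)
# Analogue of "set y = x(ix)": pick the entries of x in index order.
y = [x[i - 1] for i in ix]
print(ix)  # [2, 4, 1, 3]
print(y)   # [-1.2, 0.1, 0.4, 3.0]
print(x)   # unchanged: [0.4, -1.2, 3.0, 0.1]
```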

Variables Defined

%NOBS

Number of observations sorted/ranked (INTEGER)

Examples

open data states.wks
data(org=obs,format=wks) 1 50 expend pcaid pop pcinc
order(all) pop

This reads a cross section data set (U.S. state data) and immediately sorts the entire data set by POP.

order(ranks=popr) pop
linreg(smpl=popr<=22) pcexp
# constant pcaid pcinc
compute rss1=%rss,ndf1=%ndf
linreg(smpl=popr>=29) pcexp
# constant pcaid pcinc

This generates the series of ranks from the series POP into POPR. The bottom 22 will be included in the first LINREG, while the highest 22 (out of a data set with 50) will be included in the second one. Note that tied values are assigned the average rank (see the description of the RANK option), so this doesn't guarantee that there will be exactly 22 entries in either subsample.
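
To see how ties can change the size of a rank-based subsample, here is a small Python sketch (an illustration only, not RATS code). The helper average_ranks and the four-entry data set are hypothetical.

```python
from bisect import bisect_left, bisect_right

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    ordered = sorted(values)
    # bisect_left + 1 is the first rank a value occupies, bisect_right the last.
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2.0
            for v in values]

pop = [1.0, 2.0, 2.0, 3.0]   # hypothetical data with one tie
popr = average_ranks(pop)    # [1.0, 2.5, 2.5, 4.0]

# Analogue of smpl=popr<=2: asks for the "bottom two" entries.
bottom_two = [p for p, r in zip(pop, popr) if r <= 2]
print(popr)
print(len(bottom_two))       # 1, not 2, because the tied entries have rank 2.5
```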

set rx = x
set ry = y
* The order instruction below sorts rx and does a parallel sort on ry.
order rx / ry

This makes copies of X and Y into RX and RY and sorts the pair based upon the values of RX.
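
The idea of reordering one series in parallel with another can be sketched in Python as follows. This is only an illustration of keeping observations intact across series, not RATS code; the sample values are hypothetical.

```python
x = [3.0, 1.0, 2.0]
y = [30.0, 10.0, 20.0]

# Work on copies so the originals are left alone.
rx, ry = list(x), list(y)

# Pair up the observations, sort the pairs on the rx value only,
# then split them back into two series.
pairs = sorted(zip(rx, ry), key=lambda pair: pair[0])
rx = [p[0] for p in pairs]
ry = [p[1] for p in pairs]

print(rx)  # [1.0, 2.0, 3.0]
print(ry)  # [10.0, 20.0, 30.0]
```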

order(rank=xrank) xseries
order(rank=yrank) yseries
linreg yrank
# constant xrank

This ranks XSERIES and YSERIES and regresses the Y-ranks on the X-ranks. The coefficient on XRANK and its t-statistic give Spearman’s rank correlation test.
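
As a check on the link between the rank regression and Spearman's coefficient, the following Python sketch (not RATS code) compares the least-squares slope of the Y-ranks on the X-ranks with the classical formula 1 - 6*sum(d^2)/(n*(n^2-1)). It assumes data with no ties, where the two agree exactly; the helpers simple_ranks and ols_slope and the data are hypothetical.

```python
def simple_ranks(values):
    """1-based ranks for data with no ties (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def ols_slope(y, x):
    """Slope from a least-squares regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

xseries = [10.0, 20.0, 30.0, 40.0, 50.0]
yseries = [1.0, 3.0, 2.0, 5.0, 4.0]
xrank = simple_ranks(xseries)
yrank = simple_ranks(yseries)

slope = ols_slope(yrank, xrank)

# Classical tie-free Spearman formula based on rank differences.
n = len(xrank)
d2 = sum((a - b) ** 2 for a, b in zip(xrank, yrank))
rho = 1 - 6 * d2 / (n * (n * n - 1))

print(slope, rho)
```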

Missing Values and SMPL Series

Missing values are always put at the end of the sort, regardless of the direction of comparison.
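
A Python sketch of this placement rule is shown below. It is an illustration only, not RATS code, and it assumes missing values are represented by NaN; order_with_missing is a hypothetical helper.

```python
import math

def order_with_missing(values, decreasing=False):
    """Sort the non-missing values; put missing ones (NaN here) at the end."""
    present = [v for v in values if not math.isnan(v)]
    missing = [v for v in values if math.isnan(v)]
    present.sort(reverse=decreasing)
    # Missing entries go last whichever direction was requested.
    return present + missing

x = [3.0, float("nan"), 1.0, 2.0]
print(order_with_missing(x))                   # [1.0, 2.0, 3.0, nan]
print(order_with_missing(x, decreasing=True))  # [3.0, 2.0, 1.0, nan]
```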

If you use the SMPL option or have set a global SMPL series using the SMPL instruction, ORDER will put all excluded entries at the end of the sort. If you reorder your SMPL series, you need to redo the SMPL instruction.

Technical Information

ORDER uses a “Shell sort.” The sort time varies on the order of \(N^{1.5}\), where N is the number of data points. It should provide acceptable performance with virtually any typical RATS data set.

If you use the RANK option, ORDER assigns the average of the ranks to all data points involved in a tie. In the default increasing sort, the smallest value gets a rank of 1, provided it is not tied with another entry.

On a sort, ORDER breaks ties by keeping data points in their original entry order. Thus, if you do ORDER(ALL) on series A, then ORDER(ALL) on series B, the result will be a data set sorted first on B and then A for tied values of B.
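
The effect of this tie-breaking rule on successive sorts can be sketched in Python, whose built-in sort also keeps tied items in their original order. This is an illustration of the principle, not RATS code; the (A, B) observations are hypothetical.

```python
# Each tuple is one observation: (value of A, value of B).
rows = [(3, 1), (1, 2), (2, 1), (1, 1), (2, 2)]

# First pass: sort the whole data set on A.
by_a = sorted(rows, key=lambda r: r[0])

# Second pass: sort on B. Ties in B keep the order left by the first pass.
by_b_then_a = sorted(by_a, key=lambda r: r[1])

# The result matches a single sort on B with A as the secondary key.
print(by_b_then_a)  # [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2)]
print(by_b_then_a == sorted(rows, key=lambda r: (r[1], r[0])))  # True
```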