ORDER Instruction
ORDER( options ) series start end list of series (optional)
Orders (sorts) a series, or reorders a list of series based upon the values of one series. Optionally, it can construct a rank ordering of a series. Unless you use the RANK or INDEX option, ORDER rearranges the elements of the series in place, so it may be a good idea to apply it to a copy of the data.
Note that the functions %SORT, %SORTC, %SORTCL and %RANKS can also be used to sort or rank data in a vector or matrix.
Parameters
series
Series to sort or rank.
start, end
Range of entries to sort or rank. By default, the defined range of series.
list of series
(Optional) The listed series are reordered in parallel with series. Each observation is kept intact across series and the list of series, and the observations are reordered as a group based upon the values of series. To reorder all current data series, use the ALL option.
Options
DECREASING/[NODECREASING]
If DECREASING, sort in decreasing order (by default, increasing order).
RANK=series for ranks [unused]
Saves the ranks of the elements. If you use this, the series itself isn't changed. The rank values run from 1 to the number of observations. Ties are assigned the average of the ranks they span, so there is no guarantee that any observation will be assigned a particular rank value. For instance, if there are 3 entries which achieve the smallest value, all three get rank 2 (the average of ranks 1, 2 and 3).
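The tie rule can be sketched in Python. This illustrates the documented behavior only; it is not RATS code or its actual implementation. The function name `average_ranks` is hypothetical, and the sketch assumes there are no missing values.

```python
from typing import List

def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of the ranks they span."""
    n = len(values)
    # Entry numbers (0-based) in increasing order of value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Find the last position of the run of values tied with order[start].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end hold ranks start+1..end+1; all get their mean.
        avg = (start + 1 + end + 1) / 2.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks

print(average_ranks([3.0, 1.0, 1.0, 5.0, 1.0]))  # [4.0, 2.0, 2.0, 5.0, 2.0]
```

The three entries equal to 1.0 share ranks 1, 2 and 3, so each gets 2.0, matching the example in the description; no entry receives rank 1 or 3.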
ALL/[NOALL]
Sort all series currently in memory based upon the values of series.
SMPL=standard SMPL option [not used]
Limits the sort to the observations for which the series or formula returns a nonzero or true value.
INDEX=SERIES[INTEGER] for index series [not used]
This saves the entry numbers which would sort the series, that is:
order(index=ix) x
set y = x(ix)
would make Y a sorted copy of X. The original series is left untouched. IX(1) holds the entry number of the smallest value of X, IX(2) that of the next smallest, and so on. This is similar to RANK, except that INDEX returns actual entry numbers, rather than a ranking from 1 to N.
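The relationship between INDEX and RANK can be illustrated with a short Python sketch. The helper names are hypothetical, entry numbers are 1-based to match RATS, and the data are assumed to have no ties (with ties, RANK averages and the two are no longer exact inverses). Reading X through the index reproduces the sorted series, and inverting the index permutation gives the ranks.

```python
from typing import List

def sort_index(x: List[float]) -> List[int]:
    """1-based entry numbers that put x in increasing order (ties keep entry order)."""
    return [i + 1 for i in sorted(range(len(x)), key=lambda i: x[i])]

def ranks_from_index(ix: List[int]) -> List[int]:
    """Invert the index permutation: the rank of an entry is its position in the sort."""
    rank = [0] * len(ix)
    for position, entry in enumerate(ix, start=1):
        rank[entry - 1] = position
    return rank

x = [5.0, 9.0, 2.0, 7.0]
ix = sort_index(x)               # [3, 1, 4, 2]
y = [x[e - 1] for e in ix]       # like SET Y = X(IX): [2.0, 5.0, 7.0, 9.0]
print(ranks_from_index(ix))      # [2, 4, 1, 3]
```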
Variables Defined
%NOBS
Number of observations sorted or ranked (INTEGER)
Examples
open data states.wks
data(org=obs,format=wks) 1 50 expend pcaid pop pcinc
order(all) pop
This reads a cross section data set (U.S. state data) and immediately sorts the entire data set by POP.
order(rank=popr) pop
linreg(smpl=popr<=22) pcexp
# constant pcaid pcinc
compute rss1=%rss,ndf1=%ndf
linreg(smpl=popr>=29) pcexp
# constant pcaid pcinc
This saves the ranks of POP into the series POPR. Entries with ranks 1 to 22 (the 22 smallest values of POP) are used in the first LINREG, while entries with ranks 29 to 50 (the 22 largest, out of a data set with 50) are used in the second one. Be sure to note that tied values are assigned the average of their ranks (see the description of the RANK option), so neither subsample is guaranteed to contain exactly 22 observations.
set rx = x
set ry = y
*
* The order instruction below sorts rx and does a parallel sort on ry.
*
order rx / ry
This copies X and Y into RX and RY, then sorts the pair based upon the values of RX.
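The parallel reordering amounts to computing one permutation from the key series and applying it to every series in the list. The Python sketch below illustrates that idea; the function name `order_parallel` is hypothetical, and unlike ORDER it returns new lists rather than reordering the series in place.

```python
from typing import List, Tuple

def order_parallel(key: List[float], *others: List[float]) -> Tuple[List[float], ...]:
    """Sort `key` increasingly and apply the same permutation to each series
    in `others`, so every observation stays intact across the series."""
    perm = sorted(range(len(key)), key=lambda i: key[i])
    return tuple([s[i] for i in perm] for s in (key,) + others)

rx = [3.0, 1.0, 2.0]
ry = [7.0, 4.0, 9.0]
rx_sorted, ry_sorted = order_parallel(rx, ry)
print(rx_sorted, ry_sorted)  # [1.0, 2.0, 3.0] [4.0, 9.0, 7.0]
```

The pairs (1.0, 4.0), (2.0, 9.0) and (3.0, 7.0) are the same observations as before; only their order changes.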
order(rank=xrank) xseries
order(rank=yrank) yseries
linreg yrank
# constant xrank
This ranks XSERIES and YSERIES and regresses the Y-ranks on the X-ranks. The coefficient on XRANK and its t-statistic give Spearman's rank correlation test. With no ties, the coefficient equals Spearman's rank correlation coefficient.
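Why the regression works can be checked numerically. When there are no ties, both rank series are permutations of 1 to N and so have equal variances, which makes the OLS slope equal to the correlation of the ranks (Spearman's rho), and the slope's t-statistic equal to the usual rho*sqrt((N-2)/(1-rho^2)). The Python sketch below uses made-up data and hypothetical helper names; it is an illustration, not RATS output.

```python
import math
from typing import List, Tuple

def ranks(values: List[float]) -> List[float]:
    """1-based ranks; this sketch assumes the values contain no ties."""
    r = [0.0] * len(values)
    order = sorted(range(len(values)), key=lambda i: values[i])
    for pos, i in enumerate(order, start=1):
        r[i] = float(pos)
    return r

def slope_and_t(x: List[float], y: List[float]) -> Tuple[float, float]:
    """OLS of y on a constant and x; returns the slope and its t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

# Made-up illustrative data with no ties.
xs = [1.2, 3.4, 2.2, 5.0, 4.1, 0.5]
ys = [2.0, 2.9, 3.5, 6.1, 4.0, 1.1]
xrank, yrank = ranks(xs), ranks(ys)
b, t = slope_and_t(xrank, yrank)

# Spearman's rho from the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1)).
n = len(xs)
rho = 1 - 6 * sum((p - q) ** 2 for p, q in zip(xrank, yrank)) / (n * (n * n - 1))
print(b, rho)                                        # both about 0.942857 (33/35)
print(t, rho * math.sqrt((n - 2) / (1 - rho ** 2)))  # the same t-statistic
```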
Missing Values and SMPL Series
Missing values are always put at the end of the sort, regardless of the direction of comparison.
If you use the SMPL option or have set a global SMPL series using the SMPL instruction, ORDER will put all excluded entries at the end of the sort. If the SMPL series itself gets reordered (for instance, by ORDER(ALL)), you need to redo the SMPL instruction.
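A Python sketch of this placement rule follows. NaN stands in for a missing value and a list of booleans stands in for the SMPL series; both are choices made for the illustration, and the function name `sort_order` is hypothetical. The manual does not say how the trailing entries are ordered among themselves, so this sketch assumes they keep their original entry order.

```python
import math
from typing import List, Optional

def sort_order(x: List[float], decreasing: bool = False,
               smpl: Optional[List[bool]] = None) -> List[int]:
    """Return 0-based entry numbers in sorted order.

    Valid, included values are sorted in the requested direction; missing
    values (NaN) and entries excluded by smpl go at the end in entry order
    (an assumption of this sketch).
    """
    n = len(x)
    included = [i for i in range(n)
                if not math.isnan(x[i]) and (smpl is None or smpl[i])]
    included_set = set(included)
    trailing = [i for i in range(n) if i not in included_set]
    # Python's sort is stable even with reverse=True, so ties keep entry order.
    included.sort(key=lambda i: x[i], reverse=decreasing)
    return included + trailing

x = [2.0, math.nan, 5.0, 1.0]
print(sort_order(x))                                   # [3, 0, 2, 1]
print(sort_order(x, decreasing=True))                  # [2, 0, 3, 1]
print(sort_order(x, smpl=[True, True, False, True]))   # [3, 0, 1, 2]
```

In both directions the missing entry (entry 1) ends up last.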
Technical Information
ORDER uses a “Shell sort.” The sort time varies on the order of \(N^{1.5}\), where N is the number of data points. It should provide acceptable performance with virtually any typical RATS data set.
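A minimal Python sketch of Shell sort is shown below for reference. The manual does not say which gap sequence RATS uses; this sketch assumes Knuth's 1, 4, 13, 40, ... sequence, whose worst-case running time is on the order of N^{1.5}, in line with the figure above. Note that Shell sort by itself is not stable, so a tie rule like the one ORDER documents needs something extra, such as comparing entry numbers when values are equal.

```python
from typing import List

def shell_sort(a: List[float]) -> None:
    """Sort a in place (increasing) using Shell sort with Knuth's 3h+1 gaps."""
    n = len(a)
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1          # 1, 4, 13, 40, ...
    while gap >= 1:
        # Gapped insertion sort: afterwards every slice a[i::gap] is sorted.
        for i in range(gap, n):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 3                  # the final pass (gap 1) is plain insertion sort

data = [5.0, 2.0, 9.0, 1.0, 5.0, 6.0, 0.0, 3.0]
shell_sort(data)
print(data)  # [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 6.0, 9.0]
```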
If you use the RANK option, ORDER assigns to all data points involved in a tie the average of the ranks they span. Ranks count up from the smallest value (if you are doing the default sort in increasing order), so a smallest value that is not tied gets a rank value of 1.
On a sort, ORDER breaks ties by keeping data points in their original entry order. Thus, if you do ORDER(ALL) on series A, then ORDER(ALL) on series B, the result will be a data set sorted first on B and then A for tied values of B.
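The two-pass result depends only on ties keeping their current order. The Python sketch below uses Python's stable `sorted` as a stand-in for ORDER(ALL), with hypothetical rows of (label, A, B) data, and shows that sorting on A and then on B matches sorting on B with ties broken by A.

```python
from typing import List, Tuple

Row = Tuple[str, int, int]  # (label, A, B)

def order_all(rows: List[Row], col: int) -> List[Row]:
    """Reorder whole rows by one column. Python's sort is stable,
    so rows with tied values keep their current relative order."""
    return sorted(rows, key=lambda r: r[col])

data = [("p", 2, 1), ("q", 1, 2), ("r", 1, 1), ("s", 2, 2)]
step1 = order_all(data, 1)    # like ORDER(ALL) A
step2 = order_all(step1, 2)   # then ORDER(ALL) B
print([r[0] for r in step2])  # ['r', 'p', 'q', 's']
```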
Copyright © 2025 Thomas A. Doan