Page 1 of 2

Reading Panel Data Set

Posted: Thu Nov 02, 2017 4:54 am
by bok1234
Dear Mr. Doan,

I am trying to estimate panel data. Please see an attached file.
This is a data set of variable1 from year 1990 to 2016 (annual data).
There are many blanks and lots of disconnections in many firms’ time series.

I want to read this sheet wholly without any partition or decomposition.
Is this possible?
I have 10 more datasets (addtional 10 variables) like this type,
which will be used for making index and testing models,
so decomposing those one by one should be disaster to me.
For your information, I am using WINRATS PRO 9.1.

One more question. should I affix alphabet letter, such as ‘c’,
to my firm code to use each of them as individual variable?

Re: Reading Panel Data Set

Posted: Thu Nov 02, 2017 1:55 pm
by TomDoan
That will read that data set as a single panel data series. Do you have the same individuals in each (in the same order)?

OPEN DATA "C:\TEMP\variable.xls"
CALENDAR(PANELOBS=27,A) 1990:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=3) 1//1990:01 27684//2016:01 VARIABLE

Re: Reading Panel Data Set

Posted: Thu Nov 02, 2017 9:49 pm
by bok1234
Thank you, Mr.Doan.
The code read my data successfully.

Unfortunately some of 11 data files (11 variables) do not have exact same orders.
In this case, should I affix some alphabet letter, such as 'c' or 'f', to firm number and then rearrange them accending or descending order? And then what?
I guess that this might be EXCEL work problem, not RATS coding work.

Re: Reading Panel Data Set

Posted: Thu Feb 22, 2018 9:00 am
by Marcus
I'm trying to read in the attached file with data for 182 countries for the period 1970-2014.
Some variables, such as EMP, have missing values for the first cross-section unit.
When I look at it in the Series window it says that it begins in 1991:1 instead and runs to 2035. For the series CTFP, five cross section units seems to have been dropped.
I tried several options for missing values in the data command but nothing worked.
How can I read this file so the whole panel setting with 182 countries and 45 years is retained?

Re: Reading Panel Data Set

Posted: Thu Feb 22, 2018 1:31 pm
by TomDoan
Seems to work fine when I do it.

OPEN DATA "C:\TEMP\pwt90forrats_7014.xlsx"
CALENDAR(PANELOBS=35,A) 1970:1
DATA(FORMAT=XLSX,ORG=COLUMNS,SHEET="Data") 1//1970:01 234//2004:01 c_number year rgdpe rgdpo pop emp avh $
hc ccon cda cgdpe cgdpo ck ctfp cwtfp rgdpna rconna rdana rkna rtfpna rwtfpna labsh delta xr pl_con $
pl_da pl_gdpo csh_c csh_i csh_g csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_k

Re: Reading Panel Data Set

Posted: Thu Feb 22, 2018 2:21 pm
by Marcus
No it's 45 observations per cross section unit.
So the last date should be 2014:1. You get 2004:1 and also 234 cross section units instead of 182

Re: Reading Panel Data Set

Posted: Tue Feb 27, 2018 12:19 pm
by TomDoan
Sorry. But with that corrected, I'm not sure I see what issue you're having.

Re: Reading Panel Data Set and Drawing XY (scatter) Graph

Posted: Wed Aug 15, 2018 11:41 pm
by bok1234
Dear Mr.Doan,

I am trying to draw XY graph, which X is variable_A and Y is variable_B.
Both variable_A and variable_B are panel data set.
Had used the same code as you taught me before, I succeeded to read 2 panel data sets
but fail to draw the graph. I want you to help me to fix the problem.


OPEN DATA "F:\variable_A.xls"
CALENDAR(PANELOBS=27,A) 1991:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=2) $
1//1991:01 28238//2017:01 VARIABLE_A

OPEN DATA "F:\variable_B.xls"
CALENDAR(PANELOBS=27,A) 1991:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=2) $
1//1991:01 28238//2017:01 VARIABLE_B


* print(dates) / variable_A variable_B

scatter 1
# variable_A variable_B

Re: Reading Panel Data Set

Posted: Thu Aug 16, 2018 8:02 am
by TomDoan
I'm not sure what you expect to see out of a scatter plot, but some of your data are on an almost absurd scale relative to most of it:
Series             Obs       Mean        Std Error      Minimum       Maximum
VARIABLE_A        334684       685.812    130009.809   -894921.550  52181700.000
VARIABLE_B        258081       379.537     35315.857      -100.000  10115644.440
Put on to a scale that has to go up to 52,000,000, almost all the data will end up on one dot.

Re: Reading Panel Data Set

Posted: Thu Aug 16, 2018 8:32 pm
by bok1234
OK. I suggest less absurd data set as attached. I succeeded in drawing scatter plot graph with the code as below. Now I want to draw 3 graphs at one page with 3 grouped periods - 1991~1999, 2000~2008, 2009~2017. Please let me know how to draw them.

OPEN DATA "F:\variable_A3.xls"
CALENDAR(PANELOBS=20,A) 1991:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=2) $
1//1991:01 20//2017:01 VARIABLE_A

OPEN DATA "F:\variable_B3.xls"
CALENDAR(PANELOBS=20,A) 1991:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=2) $
1//1991:01 20//2017:01 VARIABLE_B


*** print(dates) / variable_A variable_B


scatter 1
# variable_A variable_B

Re: Reading Panel Data Set

Posted: Thu Aug 16, 2018 8:34 pm
by TomDoan
Use SMPL options with the %PERIOD(T) function. For instance:

scatter(smpl=%period(t)>=1991:1.and.%period(t)<2000:1) 1
# variable_A variable_B

Re: Reading Panel Data Set

Posted: Thu Aug 16, 2018 9:04 pm
by bok1234
Thank you, Mr.Doan, it works.
Now 20 individuals in my panel data sets should be divided 4 groups - 1 to 5, 6 to 10, 11 to 15, 16 to 20.
Does SCATTER have option for this?

Re: Reading Panel Data Set

Posted: Thu Aug 16, 2018 9:38 pm
by TomDoan
This is covered in the manual. https://estima.com/docs/RATS%209%20User ... f#page=431. Use the %INDIV function for individuals.

Re: Reading Panel Data Set

Posted: Thu Aug 16, 2018 11:12 pm
by bok1234
Yes, %INDIV function works perfect. Thank you.

Re: Reading Panel Data Set

Posted: Fri Aug 24, 2018 5:29 pm
by bok1234
This is a question about truncation of idiosyncratic values of data set.
Please see again my former data set 'variable_A.xls', an absurd one.
This data set has too many extreme values to test or graph with it,
so I need to truncate this data set with 'some standards' in order to rebuild it.

As the truncation standards, I think 2 ways as below.
(1) specific values as upper and lower bounds
e.g. truncating data more than 5000000 and less than -5000 among whole data set
(2) specific percentile as upper and lower bounds
e.g. truncating data more than upper 5% and lower 5% of whole data set distribution

Please let me know how to code these two ways.