Panel Data Operation
Posted: Mon Oct 28, 2013 4:39 pm
by yngvi
I have a panel of 11 individuals and 16 time periods, 1997-2012.
Problem 1:
Dividing every individual's series by a common time series variable, e.g. CPI.
A plain SET instruction doesn't seem to work, because the common variable is interpreted as
belonging to individual 1.
I have come up with the following:
Code:
compute nobs=176
declare vector[real] YD_r(nobs)
do i=0,nobs-16,16
   do j=1,16
      compute k=i+j
      compute YD_r(k) = YD(k)/CPI(j)*CPI(16)
   end do j
end do i
* converting vector to panel data
set yd_rt 1//1997:1 11//2012:1 = yd_r(t-1//1997:1+1)
Question: Is there an easier/cleaner way to do this?
Problem 2: What is the most natural way of making up a panel of time differences?
It can be done within a double do loop structure similar to the one above, e.g.
Code:
declare vector[real] dYD_r(nobs-11)
do i=0,nobs-26,15
   do j=1,15
      compute k=i+j
      compute dYD_r(k) = log(YD_r(k+1)/YD_r(k))
   end do j
end do i
However, I have a problem converting the dYD_r vector back to a time series, since
Code:
set dYD_rt 1//1998:1 11//2012:1 = dYD_r(t-1//1998:1+i)
doesn't work. It seems to assume that the time dimension of the differenced panel
begins in 1997. I'm pretty sure that I'm overlooking something really simple.
Thanks, Yngvi
Re: Panel Data Operation
Posted: Mon Oct 28, 2013 6:56 pm
by TomDoan
yngvi wrote:
I have a panel of 11 individuals and 16 time periods, 1997-2012.
Problem 1:
Dividing every individual's series by a common time series variable, e.g. CPI.
A plain SET instruction doesn't seem to work, because the common variable is interpreted as
belonging to individual 1.
I have come up with the following:
Code:
compute nobs=176
declare vector[real] YD_r(nobs)
do i=0,nobs-16,16
   do j=1,16
      compute k=i+j
      compute YD_r(k) = YD(k)/CPI(j)*CPI(16)
   end do j
end do i
* converting vector to panel data
set yd_rt 1//1997:1 11//2012:1 = yd_r(t-1//1997:1+1)
Question: Is there an easier/cleaner way to do this?
Sounds like you want
set(nopanel) yd_r = yd/cpi(%period(t))*cpi(2012:1)
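To see why the %period(t) index lines up with the CPI entries, it can help to tabulate which individual and which period each panel entry maps to. The following is only a minimal sketch, assuming the 11 x 16 annual panel from the original post. The series names INDIV and PERIOD are hypothetical and used only for this check.

```rats
* INDIV and PERIOD are illustrative names, not part of the original program.
* %indiv(t) gives the individual number, %period(t) the period within it.
set indiv  1//1997:1 11//2012:1 = %indiv(t)
set period 1//1997:1 11//2012:1 = %period(t)
* Look across the boundary between individuals 1 and 2
print 1//2011:1 2//1998:1 indiv period
```

With an annual calendar starting in 1997, PERIOD should run from 1 to 16 within each individual, so cpi(%period(t)) picks up the CPI entry for the same year whichever individual entry t belongs to.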
yngvi wrote:
Problem 2: What is the most natural way of making up a panel of time differences?
It can be done within a double do loop structure similar to the one above, e.g.
Code:
declare vector[real] dYD_r(nobs-11)
do i=0,nobs-26,15
   do j=1,15
      compute k=i+j
      compute dYD_r(k) = log(YD_r(k+1)/YD_r(k))
   end do j
end do i
However, I have a problem converting the dYD_r vector back to a time series, since
Code:
set dYD_rt 1//1998:1 11//2012:1 = dYD_r(t-1//1998:1+i)
doesn't work. It seems to assume that the time dimension of the differenced panel
begins in 1997. I'm pretty sure that I'm overlooking something really simple.
If you're trying to do a forward difference,
set dyd_r = log(yd_r{-1}/yd_r)
That will be defined except for the final period for each individual.
Re: Panel Data Operation
Posted: Tue Oct 29, 2013 6:16 am
by yngvi
Thanks.
The solution to problem 1,
Code:
set(nopanel) yd_r = yd/cpi(%period(t))*cpi(2012:1)
needed to be amended to
Code:
set(nopanel) yd_r 1//1997:1 11//2012:1 = yd/cpi(%period(t))*cpi(2012:1)
i.e. I needed to specify the sample range explicitly. The %period(t) function was what I was looking for; I had already tried something similar.
As regards problem 2, I wasn't looking for a forward difference; I just wanted to populate the vector with no empty elements.
As in problem 1, I needed to specify the sample range explicitly, but this time within a do loop. I don't understand why I can't use the same kind of sample reference as I did for problem 1.
In any case, the following works:
Code:
do i=1,11
   set dYD_r i//1998:1 i//2012:1 = log(yd_r/yd_r{1})
end do i
I would like to understand why I need to specify the sample range differently in problem 2 than in problem 1, if you can comment on that.
Re: Panel Data Operation
Posted: Tue Oct 29, 2013 6:24 am
by TomDoan
yngvi wrote:
As regards problem 2, I wasn't looking for a forward difference; I just wanted to populate the vector with no empty elements.
As in problem 1, I needed to specify the sample range explicitly, but this time within a do loop. I don't understand why I can't use the same kind of sample reference as I did for problem 1.
In any case, the following works:
Code:
do i=1,11
   set dYD_r i//1998:1 i//2012:1 = log(yd_r/yd_r{1})
end do i
I would like to understand why I need to specify the sample range differently in problem 2 than in problem 1, if you can comment on that.
How do you difference the data without losing data points?
If you're doing the standard backwards difference, you just need
set dYD_r = log(yd_r/yd_r{1})
The first element of each individual will be NA. With panel data, by default, an expression which spans two individuals (as this would at the first entry in an individual's record) will be NA. The NOPANEL option in the other case disables that, allowing the calculation to use data from the common series (CPI) which is in the first individual's record only.
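One way to confirm the boundary behaviour described above is to print the differenced series around the point where one individual ends and the next begins. This is only a sketch, assuming yd_r already exists over the full panel range used earlier in the thread.

```rats
* Backward log difference over the full panel range. The range is given
* explicitly in case the default range does not cover all 11 individuals.
set dYD_r 1//1997:1 11//2012:1 = log(yd_r/yd_r{1})
* Entries straddling the end of individual 1 and the start of individual 2
print 1//2011:1 2//1999:1 yd_r dYD_r
```

If the panel handling works as described, the 2//1997:1 entry of dYD_r should show as NA rather than as a difference against individual 1's 2012 value.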
Re: Panel Data Operation
Posted: Tue Oct 29, 2013 8:52 am
by yngvi
1) I was not addressing the fact that you lose data points. I was addressing the issue that, in my example, the SET instruction doesn't run correctly unless the sample is set explicitly on the instruction.
2) In comparing problems 1 and 2, I meant that I seem to need a do loop in the second problem, while in the first I can use a single SET instruction for the full sample, including the individual references, i.e. 1//1998:1 11//2012:1 vs. i//1998:1 i//2012:1 within a do loop. I am puzzled by the need to set the sample differently for the two problems. I am not puzzled at all by losing a data point; that goes without saying.

Re: Panel Data Operation
Posted: Tue Oct 29, 2013 9:28 am
by TomDoan
What's your ALLOCATE instruction or first DATA range? That sets the default length on a SET instruction. If you make the ALLOCATE instruction cover the full range of the panel data, then you won't have to do an explicit range later on.
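In outline, that might look something like the following. This is only a sketch, assuming the 11 individuals and 16 annual periods described in the first post, and that CPI and YD are read in by DATA instructions that are not shown here.

```rats
* Panel calendar: 16 annual observations per individual, starting in 1997
cal(a,panelobs=16) 1997:1
* Default range covers all 11 individuals
all 11//2012:1
* (DATA instructions for CPI and the panel series would go here)
* With the default range set, SET should need no explicit range
set(nopanel) yd_r = yd/cpi(%period(t))*cpi(2012:1)
set dYD_r = log(yd_r/yd_r{1})
```

The point of the design is that the range is stated once, on ALLOCATE, rather than being repeated on every SET.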
Re: Panel Data Operation
Posted: Tue Oct 29, 2013 12:20 pm
by yngvi
This is how I set it up. I never did a separate ALLOCATE for the panel.
Code:
cal(a) 1997:1
all 2012:1
*
open data Indices.xlsx
data(org=obs,format=xlsx) 1997:01 2012:01 CPI W
close data
*
cal(a,panelobs=16) 1997:1
open data Panel.rat
data(format=rats) 1//1997:1 11//2012:1 C YD TFER A
close data
Then I set the ranges on the individual SET instructions.
Thanks for the help.
Re: Panel Data Operation
Posted: Tue Oct 29, 2013 12:43 pm
by yngvi
FYI, I then added an ALLOCATE 11/2012:1 instruction.
I still seem to need a do loop around the differencing SET instructions, as per my earlier post.
It's a puzzle, but I've got a working program.
BTW, do you have any documentation on the MOVE instruction?
Many thanks again.
Re: Panel Data Operation
Posted: Tue Oct 29, 2013 1:46 pm
by TomDoan
If you do the ALLOCATE for the panel range, you shouldn't have to loop over the individuals.
The syntax for MOVE is
move series start end newseries newstart
It doesn't look at panel boundaries, so it just shifts data between series and newseries.