J statistics with lots of dummies
Hi all,
I am regressing daily bond yield data on several 0-1 dummy variables using GMM.
I have found that increasing the number of dummy variables lowers
the significance level of the J statistic. Of course, the overidentifying restrictions would be rejected more
easily if these dummies were correlated with the residuals, but I cannot find any reason why simple 0-1 dummies
would be strongly correlated with the residuals. (I treat these dummies as exogenous, so I include them in
the instrument list.)
My question is whether this phenomenon is general and common. I suspect it
is mathematically inevitable, but I could not resolve the question by myself.
I would be very grateful for any advice.
T_FIELD
Re: J statistics with lots of dummies
Are those single time period dummies? If they are, then the orthogonality condition for the dummy becomes eps(t_d)=0 where t_d is the time period for the dummy. In other words, it wants to force the regression towards zeroing out one residual. If you have many of them (particularly if you have some outliers), you won't be able to make that happen easily, hence the high J statistic.
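Written out, the point above can be sketched as follows. This assumes eps(t) denotes the regression residual at time t (written \varepsilon_t(\beta) below), with T the number of observations; both symbols are notation introduced for this sketch only. For a single-period dummy D used as an instrument, the sample moment condition collapses to one residual:

```latex
D_t = \begin{cases} 1, & t = t_d \\ 0, & t \neq t_d \end{cases}
\qquad\Longrightarrow\qquad
\frac{1}{T}\sum_{t=1}^{T} D_t\,\varepsilon_t(\beta)
  = \frac{1}{T}\,\varepsilon_{t_d}(\beta) = 0
\;\Longleftrightarrow\;
\varepsilon_{t_d}(\beta) = 0 .
```

So each single-period dummy in the instrument set contributes one condition of the form "this particular residual is zero".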
Re: J statistics with lots of dummies
Thank you very much for your reply, Mr. Doan.
>Are those single time period dummies?
Yes, each dummy takes the value 1 on a single day and zero on all other days.
>then the orthogonality condition for the dummy becomes eps(t_d)=0 where t_d is the time
period for the dummy. In other words, it wants to force the regression towards zeroing out one residual.
Actually, I could not understand what you meant above, especially "eps(t_d)=0" and "zeroing out one residual."
What does "eps" mean? Does the latter mean "the dummy sets the residual to zero on the day when the
dummy takes 1"?
Given your hints, I am now thinking that the terms related to the dummy in the HAC matrix take very small values,
so the J statistic becomes very large. Is this correct?
Thank you in advance for your trouble.
T_FIELD
Re: J statistics with lots of dummies
After I wrote my reply above, I came across some other related questions.
Could you give me advice on the following questions after running my attached program file?
1: Why do we get different J statistics between the first two outputs?
-As you will see, the only difference between the two regressions is the order of the regressors.
-As REG 2 in my program file shows, this difference can be negligible.
-Which order should I use?
2: Why can't we get normal results from REG 3?
-To me, this list of regressors does not seem so strange.
3: Why do we get different J statistics between the two outputs in REG 4?
-I realize the outputs of these two regressions are terrible, but why do we get a different
result from REG 3?
Thank you in advance for your trouble.
T_FIELD
Attachments
- JSTATS.xls (302 KiB)
- ON_JSTATS.PRG (1.54 KiB)
Re: J statistics with lots of dummies
The weight matrix for GMM is a problem when you have dummies. If D is a time-period dummy, then Du is just a single residual, so DuuD is just a single residual squared. When the dummy is both in the regression equation and in the information set, the first stage 2SLS regression will zero out that residual, so the Z'uu'Z matrix that needs to be inverted to get the moment weights will have some zero diagonal elements. You'll either have to stick with 2SLS, or compute the weight matrix using actual expectations rather than sample averages.
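The effect described above can be checked numerically. The following is a minimal sketch on simulated data, written in Python with NumPy rather than RATS; the data-generating process, the sample size, the dummy date `t_d`, and all variable names are assumptions made up for illustration, and the weight matrix uses the simple no-lag form (sum of Z_t u_t^2 Z_t') rather than a full HAC estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200      # hypothetical sample size
t_d = 50     # hypothetical date of the single-period dummy

# Simulated data (illustrative only): one endogenous regressor x,
# two outside instruments z1 and z2, and a one-day dummy d.
z1 = rng.normal(size=T)
z2 = rng.normal(size=T)
v = rng.normal(size=T)
x = 0.8 * z1 + 0.5 * z2 + v
d = np.zeros(T)
d[t_d] = 1.0
eps = 0.6 * v + rng.normal(size=T)
y = 1.0 + 2.0 * x + 3.0 * d + eps

# The dummy appears both as a regressor (X) and as an instrument (Z).
X = np.column_stack([np.ones(T), x, d])
Z = np.column_stack([np.ones(T), z1, z2, d])

# 2SLS: project X on Z, then regress y on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
u = y - X @ beta                      # 2SLS residuals

# Sample estimate of the moment covariance: (1/T) * sum_t Z_t u_t^2 Z_t'
Zu = Z * u[:, None]
S = Zu.T @ Zu / T

resid_at_dummy = u[t_d]               # residual on the dummy date
dummy_diag = S[3, 3]                  # diagonal entry for the dummy instrument
rank_S = np.linalg.matrix_rank(S)     # numerical rank of the 4x4 matrix

print("residual at dummy date:", resid_at_dummy)
print("diagonal entry for dummy:", dummy_diag)
print("rank of S:", rank_S, "out of", S.shape[0])
```

In this sketch the residual on the dummy date is zero up to rounding error, so the whole row and column of S belonging to the dummy vanish and S cannot be reliably inverted to form the GMM weights.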
Re: J statistics with lots of dummies
Thank you very much for your kind advice.
Thanks to you, I have reached the same conclusion.
I worked through a simple example by hand and found that the Z'uu'Z matrix had a zero diagonal element at (k,k), where k corresponds to the time period of the dummy whose residual is zero.
Since I have no idea how to replace this zero element, I am going to use IV, with a footnote explaining why I had to give up GMM.
Again, thank you very much for your kindness.
T_FIELD