Probit out-of-sample pseudo R2

**Franziska** » Tue Sep 04, 2012 12:18 pm

Hi all,

I've been estimating a probit model to forecast the probability of a recession with financial variables, as in the paper by Estrella and Mishkin (1998) (see attached file). For example, to estimate the 6-month-ahead probability of a recession in a (pseudo) out-of-sample test using the term spread, I have the following code:

@NBERCycles(peaks=peaks,troughs=troughs,up=ups,down=downs)
do endperiod = 1994:12, 2008:6
DDV(DIST=PROBIT, noprint) downs 1960:1 endperiod
# constant spread{6}
prj(dist=probit,cdf=prb)
end do

How can I calculate the out-of-sample pseudo-R2 statistic used in the paper?

Thank you very much in advance,

Franziska

**TomDoan** » Tue Sep 04, 2012 1:04 pm

They really don't do a very good job of defining how that's calculated, particularly for the rolling sample. This is my best guess of what they mean. Note that your PRJ instructions are doing the calculation in-sample, not out-of-sample. You need to override the range on PRJ in order to do the predictions beyond the estimation sample (here NH steps ahead).

@NBERCycles(peaks=peaks,troughs=troughs,up=ups,down=downs)
clear(zeros) logl loglc
compute nh=6
do endperiod = 1994:12, 2008:6
   ddv(dist=probit, noprint) downs 1960:1 endperiod
   # constant spread{6}
   prj(dist=probit,cdf=prb) * endperiod+nh endperiod+nh
   ddv(dist=probit, noprint) downs 1960:1 endperiod
   # constant
   prj(distrib=probit,cdf=prbc) * endperiod+nh endperiod+nh
   compute logl(endperiod)=log(%if(downs(endperiod+nh),prb(endperiod+nh),1-prb(endperiod+nh)))
   compute loglc(endperiod)=log(%if(downs(endperiod+nh),prbc(endperiod+nh),1-prbc(endperiod+nh)))
end do
sstats 1994:12 2008:6 logl>>sumlogl loglc>>sumloglc
disp "Pseudo-R^2" 1-(sumlogl/sumloglc)^((-2.0/%nobs)*sumloglc)
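For anyone who wants to check the arithmetic outside RATS, here is a minimal Python sketch of the same two steps: the per-forecast log likelihood term (the `%if(...)` expression) and the pseudo-R^2 expression in the DISP line. The function names and the outcome/probability numbers are made up for illustration; they are not taken from the paper or from the attached data.

```python
import math

def log_score(outcome, prob):
    """Log likelihood term for one binary outcome, given the forecast P(outcome = 1)."""
    return math.log(prob if outcome else 1.0 - prob)

def pseudo_r2(sum_logl, sum_loglc, nobs):
    """1 - (logL / logLc) ** ((-2 / n) * logLc), mirroring the DISP line above."""
    if sum_loglc >= 0.0:
        raise ValueError("benchmark log likelihood must be negative")
    return 1.0 - (sum_logl / sum_loglc) ** ((-2.0 / nobs) * sum_loglc)

# Hypothetical outcomes and forecast probabilities (illustration only).
outcomes = [0, 0, 1, 1, 0, 0]
prb = [0.1, 0.2, 0.7, 0.8, 0.3, 0.1]   # model with the regressor
prbc = [0.3] * 6                        # constant-only benchmark
bad = [0.6, 0.7, 0.2, 0.1, 0.8, 0.9]    # forecasts worse than the benchmark

sum_logl = sum(log_score(y, p) for y, p in zip(outcomes, prb))
sum_loglc = sum(log_score(y, p) for y, p in zip(outcomes, prbc))
sum_bad = sum(log_score(y, p) for y, p in zip(outcomes, bad))

n = len(outcomes)
r2_good = pseudo_r2(sum_logl, sum_loglc, n)
r2_same = pseudo_r2(sum_loglc, sum_loglc, n)
r2_bad = pseudo_r2(sum_bad, sum_loglc, n)
print(r2_good, r2_same, r2_bad)
```

The statistic is zero when the model does no better than the constant-only benchmark and goes negative when the out-of-sample forecasts are worse than the benchmark, which cannot happen in-sample but can out-of-sample.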

**Franziska** » Wed Sep 05, 2012 12:47 pm

Hi Tom,
Thank you very much for your help. I have replicated the Estrella and Mishkin (1998) tests for the spread, both in-sample and out-of-sample. My in-sample results are pretty close (the small difference may be due to rounding or a different data source), but the out-of-sample pseudo R2s are off. I'm a bit confused about the use of lagged explanatory variables in the out-of-sample test: the paper does not mention any lags for the spread (just spread_t), but then I get negative R2s for all forecast horizons. I attached the data files; my code is:

@NBERCycles(peaks=peaks,troughs=troughs,up=ups,down=downs)
set spread = GS10 - TBill

*IN-SAMPLE**************
dis "In-Sample Estimation"
do i=1,8
DDV(DIST=PROBIT, robusterrors, noprint) downs 1959:1 1995:1
# constant spread{i}
prj(dist=probit,cdf=cdf)
set %s("fitprb"+i) = cdf
dis i "quarters ahead pseudo-R^2:" %Rsquared "t-stat:" %TSTATS(2)
end do

*OUT-OF-SAMPLE**********
clear(zeros) logl loglc
dis "Out-of-sample Estimation"
do i=1,8
do endperiod = 1970:4, 1995:1
DDV(DIST=PROBIT, noprint) downs 1959:1 endperiod
# constant spread{1}
prj(dist=probit,cdf=prb) * endperiod+i endperiod+i
ddv(dist=probit, noprint) downs 1959:1 endperiod
# constant
prj(distrib=probit,cdf=prbc) * endperiod+i endperiod+i
compute logl(endperiod)=log(%if(downs(endperiod+i),prb(endperiod+i),1-prb(endperiod+i)))
compute loglc(endperiod)=log(%if(downs(endperiod+i),prbc(endperiod+i),1-prbc(endperiod+i)))
end do endperiod
sstats 1970:4 1995:1 logl>>sumlogl loglc>>sumloglc
disp i "quarters ahead pseudo-R^2:" 1-(sumlogl/sumloglc)^((-2.0/%nobs)*sumloglc)
end do i

Thanks again,
Franziska

**TomDoan** » Wed Sep 05, 2012 1:00 pm

Could you attach your whole program please? You cut out the data instructions. Apply the Code button to your pasted program - it makes the post easier to read.

That's just a timing convention. They model y*(t+h) given x(t) rather than y*(t) given x(t-h).
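To make the timing point concrete, here is a small Python sketch showing that the two conventions pair up exactly the same observations; only the date attached to each pair differs. The series values and function names are invented for illustration.

```python
def pairs_lead(y, x, h):
    """Pairs (y[t+h], x[t]): the paper's timing, indexed by the date of x."""
    return [(y[t + h], x[t]) for t in range(len(x) - h)]

def pairs_lag(y, x, h):
    """Pairs (y[t], x[t-h]): the spread{h} timing, indexed by the date of y."""
    return [(y[t], x[t - h]) for t in range(h, len(y))]

# Hypothetical recession indicator and spread series of equal length.
y = [0, 0, 1, 1, 0, 0, 0, 1]
x = [1.2, 0.4, -0.3, -0.1, 0.8, 1.5, 0.2, -0.5]
h = 2

lead = pairs_lead(y, x, h)
lag = pairs_lag(y, x, h)
print(lead == lag)
```

So a regression written with spread_t in the paper corresponds to spread{h} when the dependent variable is dated at the forecast target.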

**Franziska** » Wed Sep 05, 2012 4:06 pm

This is the program file

**TomDoan** » Wed Sep 05, 2012 7:40 pm

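You have a {1} rather than {i} in the regressor line of the out-of-sample loop (# constant spread{1}, the line originally marked in red). With that changed to {i}: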

do i=1,8
do endperiod = 1970:4, 1995:1
DDV(DIST=PROBIT, noprint) downs 1959:1 endperiod
# constant spread{i}
prj(dist=probit,cdf=prb) * endperiod+i endperiod+i
ddv(dist=probit, noprint) downs 1959:1 endperiod
# constant
prj(distrib=probit,cdf=prbc) * endperiod+i endperiod+i
compute logl(endperiod)=log(%if(downs(endperiod+i),prb(endperiod+i),1-prb(endperiod+i)))
compute loglc(endperiod)=log(%if(downs(endperiod+i),prbc(endperiod+i),1-prbc(endperiod+i)))
end do endperiod
sstats 1970:4 1995:1 logl>>sumlogl loglc>>sumloglc
disp i "quarters ahead pseudo-R^2:" 1-(sumlogl/sumloglc)^((-2.0/%nobs)*sumloglc)
end do i
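As a rough illustration of why the lag has to match the horizon, the Python sketch below works out which spread observation feeds the prediction for entry endperiod+i. It relies only on the RATS lag convention that spread{k} at entry t means spread(t-k); the integer date index and the function name are hypothetical.

```python
def regressor_date(endperiod, horizon, lag):
    """Entry of spread used when predicting downs(endperiod + horizon) with spread{lag}."""
    return endperiod + horizon - lag

endperiod = 100  # hypothetical integer date index for the end of the estimation sample

matched = []     # lag equal to the horizon: spread{i}
fixed = []       # lag fixed at one: spread{1}
for i in range(1, 9):
    matched.append(regressor_date(endperiod, i, lag=i))
    fixed.append(regressor_date(endperiod, i, lag=1))
    print(i, matched[-1], fixed[-1])
```

With spread{i} the prediction always uses the spread at endperiod, the last date in the estimation sample. With spread{1} it uses the spread at endperiod+i-1, and the estimated equation is the one-step model regardless of i.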

The RATS Software Forum
