Page 1 of 2

bootstrap for cox proportional hazards model

Posted: Sat Mar 18, 2017 12:45 am
by fan
Dear Tom,

I am trying to replicate Micheal W. Klein (1996) paper, Timing is all: Elections and the Duration of United Stated Business Cycles, published in Journal of Money, Credit and Banking, Vol.28, No.1 (Feb., 1996), pp.84-101. In the paper, the author mentioned that
The relatively small number of business cycles, especially in the subsamples,makes inference based upon standard asymptotics suspect. Therefore we use boot-strap techniques. The point estimates presented in Tables 4 through 8 represent the mean value of the respective estimates for five hundred resamples from the originaldata. We resample "clusters" of observations from the original data set choosing theset of observations corresponding to an entire business cycle as one "draw." Each resample therefore has the same number of business cycles as the original sample.
I am wondering how I can do such boot-strap in Rats. Following is my code for the original cox model

Code: Select all

compute nobs=9
dec vect[vect] riskset(nobs)
dec vect hazards(nobs)
do i=1,nobs
   dim riskset(i)(nobs)
   ewise riskset(i)(j)=duration(j)>=duration(i)
end do i
*
* The function HazardCalc gets computed at the start of each function evaluation to
* recalculate the relative hazard rates. These are put into the vector HAZARDS. The
* probability that an individual is the one to exit at her exit time is the ratio of her
* relative hazard to the sum of those hazards across the risk set for her exit time.
*
nonlin b1 
compute b1=0.0
frml hazard = exp(b1*twentyfourmonth)
*
function HazardCalc
do i=1,nobs
   compute hazards(i)=hazard(i)
end do i
end
*
frml logl   = log(hazards(t))-log(%dot(riskset(t),hazards))
maximize(method=bfgs,start=HazardCalc()) logl
The paper and my sample data set are also attached. I am looking forward to hearing from you. Thank you in advance .

Re: bootstrap for cox proportional hazards model

Posted: Sat Mar 18, 2017 8:43 am
by TomDoan
So far as I can tell, despite the odd description (the "cluster" reference makes it sound like a block bootstrap), they're just bootstrapping the observations. That would shuffle the Y's and X's together (drawing with replacement).

Re: bootstrap for cox proportional hazards model

Posted: Sat Mar 18, 2017 1:30 pm
by fan
TomDoan wrote:So far as I can tell, despite the odd description (the "cluster" reference makes it sound like a block bootstrap), they're just bootstrapping the observations. That would shuffle the Y's and X's together (drawing with replacement).

Thank you for the quick reply. Is there any example for block bootstrapping in Rats code? Or could you please kindly help me start writing code for block bootstrapping? Thank you

Re: bootstrap for cox proportional hazards model

Posted: Sat Mar 18, 2017 3:30 pm
by TomDoan
It's not block bootstrapping.It's a 20+ year old paper from a time when bootstrapping wasn't done much and the description isn't very precise.

Most of the (many) examples of bootstrapping do parametric bootstrapping where you shuffle residuals and rebuild the data from that. What you would want to do here is to do the basic independence bootstrap where you just shuffle complete observations. BOOTFGLS.RPF and the Greene example GRNP205.RPF do that.

I'm not really sure how bootstrapping would work with this data set, however. The explanatory variables are all dummies and it's quite likely that random selection of entries with replacement will produce samples where a dummy is all zeros, which means that you couldn't estimate the model.

Re: bootstrap for cox proportional hazards model

Posted: Fri Mar 24, 2017 6:40 pm
by fan
TomDoan wrote:It's not block bootstrapping.It's a 20+ year old paper from a time when bootstrapping wasn't done much and the description isn't very precise.

Most of the (many) examples of bootstrapping do parametric bootstrapping where you shuffle residuals and rebuild the data from that. What you would want to do here is to do the basic independence bootstrap where you just shuffle complete observations. BOOTFGLS.RPF and the Greene example GRNP205.RPF do that.

I'm not really sure how bootstrapping would work with this data set, however. The explanatory variables are all dummies and it's quite likely that random selection of entries with replacement will produce samples where a dummy is all zeros, which means that you couldn't estimate the model.
Dear Tom, thank you so much for the detailed explanation here. I think I understood what I need to do but I am having a problem in running the codes. Here are my codes

Code: Select all

DATA(FORMAT=XLSX,ORG=COLUMNS,SHEET="Sheet2") 1 9 Id durationC durationE $
warC warE nineC nineE twleveC twleveE twentyfourC twentyfourE

compute [vect] bboot=%zeros(1,1)
compute [symm] xxboot=%zeros(1,1)

compute nboot=500
do n=1, nboot
boot entries
set durationcb = durationc(entries(t))
set durationeb = duratione(entries(t))
set warcb = warc(entries(t))
set wareb = ware(entries(t))
set ninecb = ninec(entries(t))
set nineeb = ninee(entries(t))
set twlevecb = twlevec(entries(t))
set twleveeb = twlevee(entries(t))
set twentyfourcb = twentyfourc(entries(t))
set twentyfoureb = twentyfoure(entries(t))

* Proportional hazard
*
* RISKSET is a vector of vectors. Each individual has a vector and each vector has a slot
* for each individual. For individual i, slot j will be 1 if j’s exit time is greater than
* or equal to i’s.
*
compute nobs=9
dec vect[vect] riskset(nobs)
dec vect hazards(nobs)
do i=1,nobs
   dim riskset(i)(nobs)
   ewise riskset(i)(j)=durationeb(j)>=durationeb(i)
end do i
*
* The function HazardCalc gets computed at the start of each function evaluation to
* recalculate the relative hazard rates. These are put into the vector HAZARDS. The
* probability that an individual is the one to exit at her exit time is the ratio of her
* relative hazard to the sum of those hazards across the risk set for her exit time.
*
nonlin b1
compute b1=0.0
frml hazard = exp(b1*nineeb)
*
function HazardCalc
do i=1,nobs
   compute hazards(i)=hazard(i)
end do i
end
*
frml logl   = log(hazards(t))-log(%dot(riskset(t),hazards))
maximize(method=bfgs,start=HazardCalc()) logl

compute bboot=bboot+%beta
compute xxboot=xxboot+%outerxx(%beta)

end do n

compute bboot=bboot/nboot
compute xxboot=xxboot/nboot
compute stderrboot = %sqrt(%xdiag(xxboot))


The system says "## CP17. PROCEDURE/FUNCTION Must be Initial Statement in a Compiled Section
>>>>function <<<<" Could you please kindly help me to fix this error?

In addition, I would like to check with you that bboot and stderrboot will give me the mean value of the estimates for the 500 re-samples and the bootstrapped standard errors, right? Many Thanks

Re: bootstrap for cox proportional hazards model

Posted: Fri Mar 24, 2017 7:07 pm
by TomDoan
You're defining all the calculations, including the function, inside the bootstrapping loop. You need to define those first.

Again, however, if your data is anything like theirs, I think you're wasting your time. I don't know what they did to get any results, but you can't do bootstrapping on the data set/model that they describe.

Re: bootstrap for cox proportional hazards model

Posted: Sat Mar 25, 2017 12:30 am
by fan
TomDoan wrote:You're defining all the calculations, including the function, inside the bootstrapping loop. You need to define those first.

Again, however, if your data is anything like theirs, I think you're wasting your time. I don't know what they did to get any results, but you can't do bootstrapping on the data set/model that they describe.
Thank you for the quick reply. I have emailed the author with my sample data to see whether I have the same data used by him in the paper. Meanwhile, I tried to correct the error by moving the definition of the FUNCTION hazardcalc up to before the the bootstrapping loop

Code: Select all


DATA(FORMAT=XLSX,ORG=COLUMNS,SHEET="Sheet2") 1 9 Id durationC durationE $
warC warE nineC nineE twleveC twleveE twentyfourC twentyfourE

compute [vect] bboot=%zeros(1,1)
compute [symm] xxboot=%zeros(1,1)

compute nboot=500

function HazardCalc

do n=1, nboot
boot entries
set durationcb = durationc(entries(t))
set durationeb = duratione(entries(t))
set warcb = warc(entries(t))
set wareb = ware(entries(t))
set ninecb = ninec(entries(t))
set nineeb = ninee(entries(t))
set twlevecb = twlevec(entries(t))
set twleveeb = twlevee(entries(t))
set twentyfourcb = twentyfourc(entries(t))
set twentyfoureb = twentyfoure(entries(t))

* Proportional hazard
*
* RISKSET is a vector of vectors. Each individual has a vector and each vector has a slot
* for each individual. For individual i, slot j will be 1 if j’s exit time is greater than
* or equal to i’s.
*
compute nobs=9
dec vect[vect] riskset(nobs)
dec vect hazards(nobs)
do i=1,nobs
   dim riskset(i)(nobs)
   ewise riskset(i)(j)=durationeb(j)>=durationeb(i)
end do i
*
* The function HazardCalc gets computed at the start of each function evaluation to
* recalculate the relative hazard rates. These are put into the vector HAZARDS. The
* probability that an individual is the one to exit at her exit time is the ratio of her
* relative hazard to the sum of those hazards across the risk set for her exit time.
*
nonlin b1
compute b1=0.0
frml hazard = exp(b1*nineeb)
*
* function HazardCalc
do i=1,nobs
   compute hazards(i)=hazard(i)
end do i
end
*
frml logl   = log(hazards(t))-log(%dot(riskset(t),hazards))
maximize(method=bfgs,start=HazardCalc()) logl

compute bboot=bboot+%beta
compute xxboot=xxboot+%outerxx(%beta)

end do n

compute bboot=bboot/nboot
compute xxboot=xxboot/nboot
compute stderrboot = %sqrt(%xdiag(xxboot))
The new error message says "## SX17. Missing Operator or Adjacent Operands
>>>>s,start=HazardCalc(<<<<" Could you please kindly help me understand what I did wrong in the codes?

Many Thanks

Re: bootstrap for cox proportional hazards model

Posted: Sat Mar 25, 2017 6:34 am
by TomDoan
You have to move the entire HazardCalc function definition,not just the first line.

Re: bootstrap for cox proportional hazards model

Posted: Sat Mar 25, 2017 11:05 pm
by fan
TomDoan wrote:You have to move the entire HazardCalc function definition,not just the first line.
Hi Tom, thanks for the quick reply. I am not clear how to move the entire HazardCal function definition before the bootstrap looping. Inside the HazardCal function, com hazards(i) = hazard(i). However, frml hazard = epx(b1xNineeb), which already uses the bootstrapped covariate (nineeb) to calculate the hazard. Therefore, I do not know how to move the entire HazardCal function definition before the bootstrap looping when bootstrapped variable is used. Could you please kindly show me the necessary steps to implement bootstrapping here? Regards

Re: bootstrap for cox proportional hazards model

Posted: Sun Mar 26, 2017 8:57 am
by TomDoan
You can first DECLARE a series that's needed within a FUNCTION, but not defined until later since the FUNCTION code doesn't actually use it until the FUNCTION gets invoked.

Re: bootstrap for cox proportional hazards model

Posted: Sun Mar 26, 2017 6:40 pm
by fan
TomDoan wrote:You can first DECLARE a series that's needed within a FUNCTION, but not defined until later since the FUNCTION code doesn't actually use it until the FUNCTION gets invoked.
thank you so much for the quick reply. I tried to follow your advice by declaring hazard which is needed within the Function HazardCalc first. However, the error message says "## SX22. Expected Type FRML[ANY], Got VECTOR[REAL] Instead, >>>>frml hazard <<<< " . I initially thought it is because hazard is defined to be a formula rml hazard = exp(b1*nineeb), thus I declared the hazard to be a vector of formula, dec vect[frml] harazrd. However, I still have the error message. I am not sure what is cause for the error. Could you please kindly show me how to fix problem in my code? Thank you in advance

Code: Select all

compute nobs=9
dec vect[vect] riskset(nobs)
dec vect hazards(nobs)
dec vect hazard(nobs)

function HazardCalc
do i=1,nobs
   compute hazards(i)=hazard(i)
end do i
end

compute [vect] bboot=%zeros(1,1)
compute [symm] xxboot=%zeros(1,1)

compute nboot=500
do n=1, nboot
boot entries
set durationcb = durationc(entries(t))
set durationeb = duratione(entries(t))
set warcb = warc(entries(t))
set wareb = ware(entries(t))
set ninecb = ninec(entries(t))
set nineeb = ninee(entries(t))
set twlevecb = twlevec(entries(t))
set twleveeb = twlevee(entries(t))
set twentyfourcb = twentyfourc(entries(t))
set twentyfoureb = twentyfoure(entries(t))


* Proportional hazard
*
* RISKSET is a vector of vectors. Each individual has a vector and each vector has a slot
* for each individual. For individual i, slot j will be 1 if j’s exit time is greater than
* or equal to i’s.
*
*compute nobs=9
*dec vect[vect] riskset(nobs)
*dec vect hazards(nobs)
do i=1,nobs
   dim riskset(i)(nobs)
   ewise riskset(i)(j)=durationeb(j)>=durationeb(i)
end do i

*
* The function HazardCalc gets computed at the start of each function evaluation to
* recalculate the relative hazard rates. These are put into the vector HAZARDS. The
* probability that an individual is the one to exit at her exit time is the ratio of her
* relative hazard to the sum of those hazards across the risk set for her exit time.
*
nonlin b1
compute b1=0.0
frml hazard = exp(b1*nineeb)
*
/* function HazardCalc
do i=1,nobs
   compute hazards(i)=hazard(i)
end do i
end */
*
frml logl   = log(hazards(t))-log(%dot(riskset(t),hazards))
maximize(method=bfgs,start=HazardCalc()) logl

compute bboot=bboot+%beta
compute xxboot=xxboot+%outerxx(%beta)

end do n

compute bboot=bboot/nboot
compute xxboot=xxboot/nboot-%outerxx(bboot)



Re: bootstrap for cox proportional hazards model

Posted: Sun Mar 26, 2017 9:58 pm
by TomDoan
HAZARD is a FRML, not a VECTOR. (HAZARDS is a VECTOR). Pull this outside the loop:

dec series nineeb
*
nonlin b1
compute b1=0.0
frml hazard = exp(b1*nineeb)
*
function HazardCalc
ewise hazards(i)=hazard(i)
end

It's the first line that I was talking about---nineeb is defined as part of the bootstrap loop. The generation of the bootstrapped data, the redefinition of the riskset, the MAXIMIZE instruction and the bookkeeping are the only things that belong inside the loop.

Re: bootstrap for cox proportional hazards model

Posted: Mon Mar 27, 2017 2:18 am
by fan
TomDoan wrote:HAZARD is a FRML, not a VECTOR. (HAZARDS is a VECTOR). Pull this outside the loop:

dec series nineeb
*
nonlin b1
compute b1=0.0
frml hazard = exp(b1*nineeb)
*
function HazardCalc
ewise hazards(i)=hazard(i)
end

It's the first line that I was talking about---nineeb is defined as part of the bootstrap loop. The generation of the bootstrapped data, the redefinition of the riskset, the MAXIMIZE instruction and the bookkeeping are the only things that belong inside the loop.
Thank you so much for your great patience. I have a question regarding independence bootstrap when dummy variables are included in regression.As you point out that
It's quite likely that random selection of entries with replacement will produce samples where a dummy is all zeros
. Is there any better way we can use to avoid the problem?

Re: bootstrap for cox proportional hazards model

Posted: Mon Mar 27, 2017 9:53 am
by TomDoan
I have no idea. There's nothing special about it being proportional hazards---it's basically any form of model where you're bootstrapping with one of the explanatory variables being a dummy with a relatively small observation count. There might be variations which control subsamples.

Re: bootstrap for cox proportional hazards model

Posted: Tue Mar 28, 2017 11:36 pm
by fan
TomDoan wrote:I have no idea. There's nothing special about it being proportional hazards---it's basically any form of model where you're bootstrapping with one of the explanatory variables being a dummy with a relatively small observation count. There might be variations which control subsamples.
Dear Tom, Thanks for the reply. As you pointed out that shuffle the observations will likely to generate samples with a dummy is all zeros, I am wondering whether I can do parametric bootstrapping by shuffling residuals and rebuild the data from the data from that. After all, I just want to get better statistically inferences on the estimates. Here are the codes I tried for parametric bootstrapping. I am not sure whether it is correct. Could you please take a quick look?

Code: Select all

compute nobs=9
dec vect[vect] riskset(nobs)
dec vect hazards(nobs)
do i=1,nobs
   dim riskset(i)(nobs)
   ewise riskset(i)(j)=duratione(j)>=duratione(i)
end do i
*
* The function HazardCalc gets computed at the start of each function evaluation to
* recalculate the relative hazard rates. These are put into the vector HAZARDS. The
* probability that an individual is the one to exit at her exit time is the ratio of her
* relative hazard to the sum of those hazards across the risk set for her exit time.
*
nonlin b1
compute b1=0.0
frml hazard = exp(b1*ninee)
*
function HazardCalc
do i=1,nobs
   compute hazards(i)=hazard(i)
end do i
end
*
frml logl   = log(hazards(t))-log(%dot(riskset(t),hazards))
maximize(method=bfgs,start=HazardCalc()) logl  / e

compute b1_hat=%beta(1)
set b1_boot 1 500 = 0.0
do n=1, 500
set e_boot = e(fix(%uniform(1,10)))
set hazard_boot = exp(b1_hat*ninee)+e_boot
compute hazards(i)=hazard_boot(i)
frml logl   = log(hazards(t))-log(%dot(riskset(t),hazards))
maximize(method=bfgs,start=HazardCalc()) logl 
compute b1_boot(n)=%beta(1)
end do n
print / b1_boot
stats b1_boot

Thank you in advance