fitting Beta Distribution to data

Use this forum to post questions about syntax problems or general programming issues. Questions on implementing a particular aspect of econometrics should go in "Econometrics Issues" below.
Carlo
Posts: 4
Joined: Wed Dec 14, 2011 10:03 am

fitting Beta Distribution to data

Unread post by Carlo »

Dear all,

I’m trying to fit a unimodal generalized Beta distribution to some data as in:

Engelberg J., Manski C.F. and Williams J. (2009) Comparing the Point Predictions and Subjective Probability Distributions of Professional Forecasters, Journal of Business & Economic Statistics, vol. 27, pages 30-41.

So, suppose you have 10 bins from an individual distribution representing the probability one attach to gdp as follows:

bins: 1 ; 2 ; 3; 4 ; 5 ; 6; 7; 8 ; 9 ; 10
gdp: 6+ ; (5to5.9) ; (4to4.9); (3to3.9) ; (2to2.9) ; (1to1.9) ; (0to0.9) ; (-1to-0.1) ; (-2to-1.1) ; <-2
Prob: 0 ; 0 ; 0 ; 0.02 ; 0.05 ; 0.33 ; 0.5 ; 0.1 ; 0 ; 0

The idea is to fit a beta distribution to the data and then retrieve some quantiles.
I was wondering how can I fix the support of the distribution (‘l’ and ‘r’ in the above paper at pag. 38) and then solve equation (1) and (2) of the paper.
(the function %logbetadensity cannot be used for this)
Thanks a lot,
carlo
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: fitting Beta Distribution to data

Unread post by TomDoan »

The CDF for the beta is %betainc. That needs to be re-mapped into the proper interval. They're estimating a and b using a least squares fit for the empirical CDF vs the CDF of the generalized beta. The following is an example:

Code: Select all

compute nbin=10
dec vect lowbound(nbin) upbound(nbin)
input lowbound
   .  -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0
input upbound
 -2.0 -1.0  0.0 1.0 2.0 3.0 4.0 5.0 6.0  .
dec vect prob(nbin)
input prob
 .00 .01 .10 .50 .33 .05 .01 0 0 0
dec vect cprob(nbin)
compute total=0.0
do i=1,nbin
   compute total=total+prob(i)
   compute cprob(i)=total
end do i
*
sstats(min,smpl=prob(t)>0.0) 1 nbin lowbound(t)>>left
sstats(max,smpl=prob(t)>0.0) 1 nbin upbound(t)>>right
*
nonlin a b a>=1.0 b>=1.0
compute a=b=2.0
*
frml gap = remap=(upbound(t)-left)/(right-left),$
  %if(remap<0.0.or.remap>1.0,%na,cprob(t)-%betainc(remap,a,b))
nlls(frml=gap) *
Carlo
Posts: 4
Joined: Wed Dec 14, 2011 10:03 am

Re: fitting Beta Distribution to data

Unread post by Carlo »

Thanks Tom,
It works!
But then, when I have to recover the median and the interquartile range of the estimated distribution, I have to loop over thousands of values like:

dec vec A1(1000000) B1(1000000) B22(1000000)
do i=1,1000000
com A1(i) = %betainc((i*0.00001-left)/(right-left),%beta(1),%beta(2))
com B1(i) =i*0.00001
if A1(i)>0.500.and.A1(i)<0.501 ; com P_50 = b1(i)
if A1(i)>0.250.and.A1(i)<0.2501 ; com P_25 = b1(i)
if A1(i)>0.750.and.A1(i)<0.7501 ; com P_75 = b1(i)
end do

Or there is a fastest and more efficient way to do it in Rats?
(Of course, this methods is also not working when there is a positive probability attached to negative values of lowbound(nbin) upbound(nbin) in your example.
Like:
input prob
0 ; 20 ; 50 ; 20 ; 10 ; 0 ; 0 ; 0 ; 0 ; 0 )
Thanks a lot!
Carlo
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: fitting Beta Distribution to data

Unread post by TomDoan »

There are two ways to do that in this case. One, which works for general functions, is to use FIND ROOT. If you have the lower and upper bounds and the two shape parameters (you'll have to save the latter out of %beta since those will get overwritten), you can invert the remapped %betainc with:

Code: Select all

nonlin median
compute lower=-6.0,upper=5.0,a=2.0,b=3.0
find root %betainc((median-lower)/(upper-lower),a,b)-.5
end find
For the interquartile range, repeat this with .5 in the FIND replaced with .25 and .75.

In this case, you can also use the fact that the F distribution is itself a remapping of the incomplete beta, so you can use %INVFTEST to do the inversion.

Code: Select all

compute fmedian=%invftest(.5,2.0*b,2.0*a)
compute betamedian=a/(a+b*fmedian) 
disp lower+betamedian*(upper-lower)
Again, replace the .5 in the %INVFTEST with .25 and .75 to get the other two quantiles.
Post Reply