fitting Beta Distribution to data

Carlo · Unread post by **Carlo** » Wed Dec 14, 2011 10:58 am

Dear all,

I’m trying to fit a unimodal generalized Beta distribution to some data as in:

Engelberg J., Manski C.F. and Williams J. (2009) Comparing the Point Predictions and Subjective Probability Distributions of Professional Forecasters, Journal of Business & Economic Statistics, vol. 27, pages 30-41.

So, suppose you have 10 bins from an individual distribution representing the probability one attach to gdp as follows:

bins: 1 ; 2 ; 3; 4 ; 5 ; 6; 7; 8 ; 9 ; 10
gdp: 6+ ; (5to5.9) ; (4to4.9); (3to3.9) ; (2to2.9) ; (1to1.9) ; (0to0.9) ; (-1to-0.1) ; (-2to-1.1) ; <-2
Prob: 0 ; 0 ; 0 ; 0.02 ; 0.05 ; 0.33 ; 0.5 ; 0.1 ; 0 ; 0

The idea is to fit a beta distribution to the data and then retrieve some quantiles.
I was wondering how can I fix the support of the distribution (‘l’ and ‘r’ in the above paper at pag. 38) and then solve equation (1) and (2) of the paper.
(the function %logbetadensity cannot be used for this)
Thanks a lot,
carlo

TomDoan · Unread post by **TomDoan** » Wed Dec 14, 2011 3:14 pm

The CDF for the beta is %betainc. That needs to be re-mapped into the proper interval. They're estimating a and b using a least squares fit for the empirical CDF vs the CDF of the generalized beta. The following is an example:

Code: Select all

compute nbin=10
dec vect lowbound(nbin) upbound(nbin)
input lowbound
   .  -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0
input upbound
 -2.0 -1.0  0.0 1.0 2.0 3.0 4.0 5.0 6.0  .
dec vect prob(nbin)
input prob
 .00 .01 .10 .50 .33 .05 .01 0 0 0
dec vect cprob(nbin)
compute total=0.0
do i=1,nbin
   compute total=total+prob(i)
   compute cprob(i)=total
end do i
*
sstats(min,smpl=prob(t)>0.0) 1 nbin lowbound(t)>>left
sstats(max,smpl=prob(t)>0.0) 1 nbin upbound(t)>>right
*
nonlin a b a>=1.0 b>=1.0
compute a=b=2.0
*
frml gap = remap=(upbound(t)-left)/(right-left),$
  %if(remap<0.0.or.remap>1.0,%na,cprob(t)-%betainc(remap,a,b))
nlls(frml=gap) *

Carlo · Unread post by **Carlo** » Tue Jan 10, 2012 6:16 am

Thanks Tom,
It works!
But then, when I have to recover the median and the interquartile range of the estimated distribution, I have to loop over thousands of values like:

dec vec A1(1000000) B1(1000000) B22(1000000)
do i=1,1000000
com A1(i) = %betainc((i*0.00001-left)/(right-left),%beta(1),%beta(2))
com B1(i) =i*0.00001
if A1(i)>0.500.and.A1(i)<0.501 ; com P_50 = b1(i)
if A1(i)>0.250.and.A1(i)<0.2501 ; com P_25 = b1(i)
if A1(i)>0.750.and.A1(i)<0.7501 ; com P_75 = b1(i)
end do

Or there is a fastest and more efficient way to do it in Rats?
(Of course, this methods is also not working when there is a positive probability attached to negative values of lowbound(nbin) upbound(nbin) in your example.
Like:
input prob
0 ; 20 ; 50 ; 20 ; 10 ; 0 ; 0 ; 0 ; 0 ; 0 )
Thanks a lot!
Carlo

TomDoan · Unread post by **TomDoan** » Tue Jan 10, 2012 11:55 am

There are two ways to do that in this case. One, which works for general functions, is to use FIND ROOT. If you have the lower and upper bounds and the two shape parameters (you'll have to save the latter out of %beta since those will get overwritten), you can invert the remapped %betainc with:

Code: Select all

nonlin median
compute lower=-6.0,upper=5.0,a=2.0,b=3.0
find root %betainc((median-lower)/(upper-lower),a,b)-.5
end find

For the interquartile range, repeat this with .5 in the FIND replaced with .25 and .75.

In this case, you can also use the fact that the F distribution is itself a remapping of the incomplete beta, so you can use %INVFTEST to do the inversion.

Code: Select all

compute fmedian=%invftest(.5,2.0*b,2.0*a)
compute betamedian=a/(a+b*fmedian) 
disp lower+betamedian*(upper-lower)

Again, replace the .5 in the %INVFTEST with .25 and .75 to get the other two quantiles.

The RATS Software Forum

fitting Beta Distribution to data

fitting Beta Distribution to data

Re: fitting Beta Distribution to data

Re: fitting Beta Distribution to data

Re: fitting Beta Distribution to data