fitting Beta Distribution to data
fitting Beta Distribution to data
Dear all,
I’m trying to fit a unimodal generalized Beta distribution to some data as in:
Engelberg J., Manski C.F. and Williams J. (2009) Comparing the Point Predictions and Subjective Probability Distributions of Professional Forecasters, Journal of Business & Economic Statistics, vol. 27, pages 30-41.
So, suppose you have 10 bins from an individual distribution representing the probability one attach to gdp as follows:
bins: 1 ; 2 ; 3; 4 ; 5 ; 6; 7; 8 ; 9 ; 10
gdp: 6+ ; (5to5.9) ; (4to4.9); (3to3.9) ; (2to2.9) ; (1to1.9) ; (0to0.9) ; (-1to-0.1) ; (-2to-1.1) ; <-2
Prob: 0 ; 0 ; 0 ; 0.02 ; 0.05 ; 0.33 ; 0.5 ; 0.1 ; 0 ; 0
The idea is to fit a beta distribution to the data and then retrieve some quantiles.
I was wondering how can I fix the support of the distribution (‘l’ and ‘r’ in the above paper at pag. 38) and then solve equation (1) and (2) of the paper.
(the function %logbetadensity cannot be used for this)
Thanks a lot,
carlo
I’m trying to fit a unimodal generalized Beta distribution to some data as in:
Engelberg J., Manski C.F. and Williams J. (2009) Comparing the Point Predictions and Subjective Probability Distributions of Professional Forecasters, Journal of Business & Economic Statistics, vol. 27, pages 30-41.
So, suppose you have 10 bins from an individual distribution representing the probability one attach to gdp as follows:
bins: 1 ; 2 ; 3; 4 ; 5 ; 6; 7; 8 ; 9 ; 10
gdp: 6+ ; (5to5.9) ; (4to4.9); (3to3.9) ; (2to2.9) ; (1to1.9) ; (0to0.9) ; (-1to-0.1) ; (-2to-1.1) ; <-2
Prob: 0 ; 0 ; 0 ; 0.02 ; 0.05 ; 0.33 ; 0.5 ; 0.1 ; 0 ; 0
The idea is to fit a beta distribution to the data and then retrieve some quantiles.
I was wondering how can I fix the support of the distribution (‘l’ and ‘r’ in the above paper at pag. 38) and then solve equation (1) and (2) of the paper.
(the function %logbetadensity cannot be used for this)
Thanks a lot,
carlo
Re: fitting Beta Distribution to data
The CDF for the beta is %betainc. That needs to be re-mapped into the proper interval. They're estimating a and b using a least squares fit for the empirical CDF vs the CDF of the generalized beta. The following is an example:
Code: Select all
compute nbin=10
dec vect lowbound(nbin) upbound(nbin)
input lowbound
. -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0
input upbound
-2.0 -1.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0 .
dec vect prob(nbin)
input prob
.00 .01 .10 .50 .33 .05 .01 0 0 0
dec vect cprob(nbin)
compute total=0.0
do i=1,nbin
compute total=total+prob(i)
compute cprob(i)=total
end do i
*
sstats(min,smpl=prob(t)>0.0) 1 nbin lowbound(t)>>left
sstats(max,smpl=prob(t)>0.0) 1 nbin upbound(t)>>right
*
nonlin a b a>=1.0 b>=1.0
compute a=b=2.0
*
frml gap = remap=(upbound(t)-left)/(right-left),$
%if(remap<0.0.or.remap>1.0,%na,cprob(t)-%betainc(remap,a,b))
nlls(frml=gap) *Re: fitting Beta Distribution to data
Thanks Tom,
It works!
But then, when I have to recover the median and the interquartile range of the estimated distribution, I have to loop over thousands of values like:
dec vec A1(1000000) B1(1000000) B22(1000000)
do i=1,1000000
com A1(i) = %betainc((i*0.00001-left)/(right-left),%beta(1),%beta(2))
com B1(i) =i*0.00001
if A1(i)>0.500.and.A1(i)<0.501 ; com P_50 = b1(i)
if A1(i)>0.250.and.A1(i)<0.2501 ; com P_25 = b1(i)
if A1(i)>0.750.and.A1(i)<0.7501 ; com P_75 = b1(i)
end do
Or there is a fastest and more efficient way to do it in Rats?
(Of course, this methods is also not working when there is a positive probability attached to negative values of lowbound(nbin) upbound(nbin) in your example.
Like:
input prob
0 ; 20 ; 50 ; 20 ; 10 ; 0 ; 0 ; 0 ; 0 ; 0 )
Thanks a lot!
Carlo
It works!
But then, when I have to recover the median and the interquartile range of the estimated distribution, I have to loop over thousands of values like:
dec vec A1(1000000) B1(1000000) B22(1000000)
do i=1,1000000
com A1(i) = %betainc((i*0.00001-left)/(right-left),%beta(1),%beta(2))
com B1(i) =i*0.00001
if A1(i)>0.500.and.A1(i)<0.501 ; com P_50 = b1(i)
if A1(i)>0.250.and.A1(i)<0.2501 ; com P_25 = b1(i)
if A1(i)>0.750.and.A1(i)<0.7501 ; com P_75 = b1(i)
end do
Or there is a fastest and more efficient way to do it in Rats?
(Of course, this methods is also not working when there is a positive probability attached to negative values of lowbound(nbin) upbound(nbin) in your example.
Like:
input prob
0 ; 20 ; 50 ; 20 ; 10 ; 0 ; 0 ; 0 ; 0 ; 0 )
Thanks a lot!
Carlo
Re: fitting Beta Distribution to data
There are two ways to do that in this case. One, which works for general functions, is to use FIND ROOT. If you have the lower and upper bounds and the two shape parameters (you'll have to save the latter out of %beta since those will get overwritten), you can invert the remapped %betainc with:
For the interquartile range, repeat this with .5 in the FIND replaced with .25 and .75.
In this case, you can also use the fact that the F distribution is itself a remapping of the incomplete beta, so you can use %INVFTEST to do the inversion.
Again, replace the .5 in the %INVFTEST with .25 and .75 to get the other two quantiles.
Code: Select all
nonlin median
compute lower=-6.0,upper=5.0,a=2.0,b=3.0
find root %betainc((median-lower)/(upper-lower),a,b)-.5
end findIn this case, you can also use the fact that the F distribution is itself a remapping of the incomplete beta, so you can use %INVFTEST to do the inversion.
Code: Select all
compute fmedian=%invftest(.5,2.0*b,2.0*a)
compute betamedian=a/(a+b*fmedian)
disp lower+betamedian*(upper-lower)