REPROBIT.RPF

REPROBIT.RPF is an example of estimation of a Random Effects probit model, that is, a probit model in a panel data set assuming random individual effects. If we use an "index" model for the decision for individual $i$ at time period $t$:

(1) ${V_{it}} = {X_{it}}\beta + \mu_i + {u_{it}}$

we observe $y_{it}=1$ if and only if $V_{it}>0$. As with a standard probit model, we assume the ${u_{it}}$ are i.i.d. $N(0,1)$ where fixing the variance at 1 fixes the scale of (1).

The individual effects $\mu _i$ are assumed to be independent of each other and all the $u$ (and $X$) and are $N(0,{\sigma ^2})$, where this variance is unknown. Combining these assumptions means

(2) $P({y_{it}} = 1) = \Phi \left( {{X_{it}}\beta + {\mu _i}} \right)$

However, the ${\mu _i}$ are unknown, unobservable, and common to all $it$ observations, so we can't use this in this form. Instead, we will have a single probability calculation for all the data for individual $i$. Conditional on ${\mu _i}$ and $\beta$, the likelihood for individual $i$ is

(3) $f({\mu _i},\beta ) = \prod\limits_t {\Phi \left( {{{\tilde y}_{it}}\left( {{X_{it}}\beta + {\mu _i}} \right)} \right)}$

where, for notational convenience, we use the remapped ${{{\tilde y}_{it}}}$ which is 1 and -1 rather than 1 and 0. Note that this needs to be kept in the product form (not sum of logs). The marginal likelihood of $\beta$ requires integrating (3) over ${\mu _i}$. Unfortunately, there is no closed form expression for this integral. However, there is a well-known procedure for approximating numerical integrals when the integral can be written

(4) $\int\limits_{ - \infty }^\infty {f(x)\exp ( - {x^2})dx}$

This is Gauss-Hermite integration. To evaluate the log likelihood for random effects probit, we need to use Gauss-Hermite integration to compute an approximation to the integral of (3) over $\mu_i$. This will give the joint probability of the observed data for individual $i$. The sum of the logs of this over individuals gives the full sample likelihood (note again that the integral is over the probabilities not log probabilities). The full sample likelihood requires summing this over individuals ($i$, not over single data points $it$.

The data set in this example is unbalanced, with IDCODE indicating the individual and YEAR, the specific entry:

open data union.dat
data(format=prn,org=columns) 1 26200 idcode year age grade not_smsa $
south union t0 southxt black

This uses the grouping code (IDCODE) to break the entries into the list of individuals (into VID) and, for each of those, the list of entries for that individual (into IDENTRIES).

panel(group=idcode,id=vid,identries=identries)

This estimates a standard probit model (output).

ddv union

# age grade not_smsa south southxt constant

We need to use general non-linear estimation for the RE probit model. This sets up the formula based upon the probit model (using LASTREG option). The coefficients (put in the VECTOR B) are initialized to the standard probit results. We also need to estimate the variance of the random effect, so we create a separate PARMSET for that. SIGMA is initialized to 1. (The standard probit is equivalent to RE probit with sigma=0, but that's a boundary value, so we pick a guess value that is non-zero).

frml(lastreg,parmset=probitparms,vec=b) zfrml

nonlin(parmset=sigmaparm) sigma

compute sigma=1.0

With this setup, ZFRML(IT) for a given set of coefficient will evaluate the base index ${X_{it}}\beta$.

This gets the coefficients for a 5 term Gauss-Hermite integral. More terms makes this more accurate, but considerably slows the estimation. Because we're integrating over a Normal density, this uses the STANDARDIZE option.

compute ngh=5

@GaussHermite(n=ngh,w=w,x=x,standard)

The working part of the estimation is this FUNCTION, which approximates the integral (3) for a given individual using Gauss-Hermite. This is the weighted sum across the Gauss-Hermite terms of the likelihood of the model (not log likelihood, so it's a product) , evaluated across the entries in the individual's IDENTRIES using each term's value for the random effect $mu$.

function REIntegral i

type real REIntegral

type integer i

local integer h j it

local real prod z

compute REIntegral=0.0

do h=1,ngh

compute prod=1.0

do j=1,%size(identries(i))

compute it=identries(i)(j)

compute z=zfrml(it)+sigma*x(h)

compute prod=prod*%if(union(it)==1,%cdf(z),%cdf(-z))

end do j

compute REIntegral=REIntegral+w(h)*prod

end do h

return

end REIntegral

Using that, this maximizes the log likelihood across individuals. Note that this is not directly across all the data points, but over the individuals—the reference to individual data points is inside the REIntegral function. Note that now we convert to logs. The output shows that the individual effects make a very substantial difference in the fit.

frml reprobit = log(REIntegral(t))

maximize(method=bfgs,parmset=probitparms+sigmaparm,$

title="Random Effects Probit") reprobit 1 %size(vid)

Full Program

open data union.dat

data(format=prn,org=columns) 1 26200 idcode year age grade not_smsa $

south union t0 southxt black

* Data set is unbalanced. This uses the grouping code (IDCODE) to break the

* entries into the list of individuals (into VID) and, for each of those, the list

* of entries for that individual (into IDENTRIES).

panel(group=idcode,id=vid,identries=identries)

* This does the standard probit

ddv union

# age grade not_smsa south southxt constant

* Set up the formula. The standard error of the RE component also needs

* to be estimated.

frml(lastreg,parmset=probitparms,vec=b) zfrml

nonlin(parmset=sigmaparm) sigma

compute sigma=1.0

* Get coefficients for a 5 term Gauss-Hermite integral. More terms makes

* this more accurate, but considerably slows the estimation.

compute ngh=5

@GaussHermite(n=ngh,w=w,x=x,standard)

***************************************************************

* This does the integral over the data for individual "i"

function REIntegral i

type real REIntegral

type integer i

local integer h j it

local real prod z

compute REIntegral=0.0

do h=1,ngh

compute prod=1.0

do j=1,%size(identries(i))

compute it=identries(i)(j)

compute z=zfrml(it)+sigma*x(h)

compute prod=prod*%if(union(it)==1,%cdf(z),%cdf(-z))

end do j

compute REIntegral=REIntegral+w(h)*prod

end do h

return

end REIntegral

***************************************************************

* The maximize runs over count of individuals, not the actual entries.

frml reprobit = log(REIntegral(t))

maximize(method=bfgs,parmset=probitparms+sigmaparm,$

title="Random Effects Probit") reprobit 1 %size(vid)

Output

Binary Probit - Estimation by Newton-Raphson

Convergence in 5 Iterations. Final criterion was 0.0000000 <= 0.0000100

Dependent Variable UNION

Usable Observations 26200

Degrees of Freedom 26194

Log Likelihood -13547.3083

Average Likelihood 0.5962634

Pseudo-R^2 0.0241762

Log Likelihood(Base) -13864.2297

LR Test of Coefficients(5) 633.8428

Significance Level of LR 0.0000000

Variable Coeff Std Error T-Stat Signif

************************************************************************************

1. AGE 0.005946062 0.001579846 3.76370 0.00016742

2. GRADE 0.026390021 0.003665091 7.20037 0.00000000

3. NOT_SMSA -0.130391145 0.020252257 -6.43835 0.00000000

4. SOUTH -0.402725382 0.033989025 -11.84869 0.00000000

5. SOUTHXT 0.003308847 0.002925309 1.13111 0.25800871

6. Constant -1.113090735 0.065780767 -16.92122 0.00000000

Random Effects Probit - Estimation by BFGS

Convergence in 25 Iterations. Final criterion was 0.0000091 <= 0.0000100

Usable Observations 4434

Function Value -10654.0640

Variable Coeff Std Error T-Stat Signif

************************************************************************************

1. B(1)=AGE 0.003657343 0.002218373 1.64866 0.09921737

2. B(2)=GRADE 0.029124182 0.007985220 3.64726 0.00026505

3. B(3)=NOT_SMSA -0.159069750 0.038292115 -4.15411 0.00003266

4. B(4)=SOUTH -0.606317176 0.052772794 -11.48920 0.00000000

5. B(5)=SOUTHXT 0.014289581 0.004053430 3.52531 0.00042299

6. B(6)=Constant -1.487588237 0.125819494 -11.82319 0.00000000

7. SIGMA 1.150518586 0.017167532 67.01712 0.00000000