|
Examples / REPROBIT.RPF |
REPROBIT.RPF is an example of estimation of a Random Effects probit model, that is, a probit model in a panel data set assuming random individual effects. If we use an "index" model for the decision for individual \(i\) at time period \(t\):
(1) \({V_{it}} = {X_{it}}\beta + \mu_i + {u_{it}}\)
we observe \(y_{it}=1\) if and only if \(V_{it}>0\). As with a standard probit model, we assume the \({u_{it}}\) are i.i.d. \(N(0,1)\) where fixing the variance at 1 fixes the scale of (1).
The individual effects \(\mu _i\) are assumed to be independent of each other and all the \(u\) (and \(X\)) and are \(N(0,{\sigma ^2})\), where this variance is unknown. Combining these assumptions means
(2) \(P({y_{it}} = 1) = \Phi \left( {{X_{it}}\beta + {\mu _i}} \right)\)
However, the \({\mu _i}\) are unknown, unobservable, and common to all \(it\) observations, so we can't use this in this form. Instead, we will have a single probability calculation for all the data for individual \(i\). Conditional on \({\mu _i}\) and \(\beta\), the likelihood for individual \(i\) is
(3) \(f({\mu _i},\beta ) = \prod\limits_t {\Phi \left( {{{\tilde y}_{it}}\left( {{X_{it}}\beta + {\mu _i}} \right)} \right)}\)
where, for notational convenience, we use the remapped \({{{\tilde y}_{it}}}\) which is 1 and -1 rather than 1 and 0. Note that this needs to be kept in the product form (not sum of logs). The marginal likelihood of \(\beta\) requires integrating (3) over \({\mu _i}\). Unfortunately, there is no closed form expression for this integral. However, there is a well-known procedure for approximating numerical integrals when the integral can be written
(4) \(\int\limits_{ - \infty }^\infty {f(x)\exp ( - {x^2})dx}\)
This is Gauss-Hermite integration. To evaluate the log likelihood for random effects probit, we need to use Gauss-Hermite integration to compute an approximation to the integral of (3) over \(\mu_i\). This will give the joint probability of the observed data for individual \(i\). The sum of the logs of this over individuals gives the full sample likelihood (note again that the integral is over the probabilities not log probabilities). The full sample likelihood requires summing this over individuals (\(i\), not over single data points \(it\).
The data set in this example is unbalanced, with IDCODE indicating the individual and YEAR, the specific entry:
open data union.dat
data(format=prn,org=columns) 1 26200 idcode year age grade not_smsa $
south union t0 southxt black
This uses the grouping code (IDCODE) to break the entries into the list of individuals (into VID) and, for each of those, the list of entries for that individual (into IDENTRIES).
panel(group=idcode,id=vid,identries=identries)
This estimates a standard probit model (output).
ddv union
# age grade not_smsa south southxt constant
We need to use general non-linear estimation for the RE probit model. This sets up the formula based upon the probit model (using LASTREG option). The coefficients (put in the VECTOR B) are initialized to the standard probit results. We also need to estimate the variance of the random effect, so we create a separate PARMSET for that. SIGMA is initialized to 1. (The standard probit is equivalent to RE probit with sigma=0, but that's a boundary value, so we pick a guess value that is non-zero).
frml(lastreg,parmset=probitparms,vec=b) zfrml
nonlin(parmset=sigmaparm) sigma
compute sigma=1.0
With this setup, ZFRML(IT) for a given set of coefficient will evaluate the base index \({X_{it}}\beta\).
This gets the coefficients for a 5 term Gauss-Hermite integral. More terms makes this more accurate, but considerably slows the estimation. Because we're integrating over a Normal density, this uses the STANDARDIZE option.
compute ngh=5
@GaussHermite(n=ngh,w=w,x=x,standard)
The working part of the estimation is this FUNCTION, which approximates the integral (3) for a given individual using Gauss-Hermite. This is the weighted sum across the Gauss-Hermite terms of the likelihood of the model (not log likelihood, so it's a product) , evaluated across the entries in the individual's IDENTRIES using each term's value for the random effect \(mu\).
function REIntegral i
type real REIntegral
type integer i
*
local integer h j it
local real prod z
*
compute REIntegral=0.0
do h=1,ngh
compute prod=1.0
do j=1,%size(identries(i))
compute it=identries(i)(j)
compute z=zfrml(it)+sigma*x(h)
compute prod=prod*%if(union(it)==1,%cdf(z),%cdf(-z))
end do j
compute REIntegral=REIntegral+w(h)*prod
end do h
return
end REIntegral
Using that, this maximizes the log likelihood across individuals. Note that this is not directly across all the data points, but over the individuals—the reference to individual data points is inside the REIntegral function. Note that now we convert to logs. The output shows that the individual effects make a very substantial difference in the fit.
frml reprobit = log(REIntegral(t))
maximize(method=bfgs,parmset=probitparms+sigmaparm,$
title="Random Effects Probit") reprobit 1 %size(vid)
Full Program
open data union.dat
data(format=prn,org=columns) 1 26200 idcode year age grade not_smsa $
south union t0 southxt black
*
* Data set is unbalanced. This uses the grouping code (IDCODE) to break the
* entries into the list of individuals (into VID) and, for each of those, the list
* of entries for that individual (into IDENTRIES).
*
panel(group=idcode,id=vid,identries=identries)
*
* This does the standard probit
*
ddv union
# age grade not_smsa south southxt constant
*
* Set up the formula. The standard error of the RE component also needs
* to be estimated.
*
frml(lastreg,parmset=probitparms,vec=b) zfrml
nonlin(parmset=sigmaparm) sigma
compute sigma=1.0
*
* Get coefficients for a 5 term Gauss-Hermite integral. More terms makes
* this more accurate, but considerably slows the estimation.
*
compute ngh=5
@GaussHermite(n=ngh,w=w,x=x,standard)
***************************************************************
*
* This does the integral over the data for individual "i"
*
function REIntegral i
type real REIntegral
type integer i
*
local integer h j it
local real prod z
*
compute REIntegral=0.0
do h=1,ngh
compute prod=1.0
do j=1,%size(identries(i))
compute it=identries(i)(j)
compute z=zfrml(it)+sigma*x(h)
compute prod=prod*%if(union(it)==1,%cdf(z),%cdf(-z))
end do j
compute REIntegral=REIntegral+w(h)*prod
end do h
return
end REIntegral
***************************************************************
*
* The maximize runs over count of individuals, not the actual entries.
*
frml reprobit = log(REIntegral(t))
maximize(method=bfgs,parmset=probitparms+sigmaparm,$
title="Random Effects Probit") reprobit 1 %size(vid)
Output
Binary Probit - Estimation by Newton-Raphson
Convergence in 5 Iterations. Final criterion was 0.0000000 <= 0.0000100
Dependent Variable UNION
Usable Observations 26200
Degrees of Freedom 26194
Log Likelihood -13547.3083
Average Likelihood 0.5962634
Pseudo-R^2 0.0241762
Log Likelihood(Base) -13864.2297
LR Test of Coefficients(5) 633.8428
Significance Level of LR 0.0000000
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. AGE 0.005946062 0.001579846 3.76370 0.00016742
2. GRADE 0.026390021 0.003665091 7.20037 0.00000000
3. NOT_SMSA -0.130391145 0.020252257 -6.43835 0.00000000
4. SOUTH -0.402725382 0.033989025 -11.84869 0.00000000
5. SOUTHXT 0.003308847 0.002925309 1.13111 0.25800871
6. Constant -1.113090735 0.065780767 -16.92122 0.00000000
Random Effects Probit - Estimation by BFGS
Convergence in 25 Iterations. Final criterion was 0.0000091 <= 0.0000100
Usable Observations 4434
Function Value -10654.0640
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. B(1)=AGE 0.003657343 0.002218373 1.64866 0.09921737
2. B(2)=GRADE 0.029124182 0.007985220 3.64726 0.00026505
3. B(3)=NOT_SMSA -0.159069750 0.038292115 -4.15411 0.00003266
4. B(4)=SOUTH -0.606317176 0.052772794 -11.48920 0.00000000
5. B(5)=SOUTHXT 0.014289581 0.004053430 3.52531 0.00042299
6. B(6)=Constant -1.487588237 0.125819494 -11.82319 0.00000000
7. SIGMA 1.150518586 0.017167532 67.01712 0.00000000
Copyright © 2025 Thomas A. Doan