Probit and Logit Models
You can use the instruction DDV (discrete dependent variables) to estimate probit and logit models. (The instructions PRB and LGT from earlier versions of RATS are still available). If there are only two choices, the models take the form:
\begin{equation} P\left( Y_i = 1 \mid X_i \right) = F\left( X_i \beta \right) \end{equation}
where the dependent variable \(Y\) takes either the value 0 or 1, \(X\) is the set of explanatory variables and
•\(F\left( z \right) = \Phi \left( z \right)\) (the standard normal cdf) for the probit
•\(F\left( z \right) = \exp (z)/(1 + \exp (z))\) for the logit.
See, for example, Wooldridge (2010), section 15.3. Please note that RATS requires a numeric coding for \(Y\). RATS treats any non-zero value as equivalent to “1”.
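As a quick numerical illustration of the two link functions (this is Python, not RATS), here is a minimal sketch of \(F\) for the probit and logit, using the standard identity \(\Phi(z) = (1 + \operatorname{erf}(z/\sqrt 2))/2\):

```python
# Sketch (not RATS): the two link functions from the text, evaluated at a few points.
import math

def probit_cdf(z):
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic cdf, exp(z)/(1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

# Both links give probability 0.5 at z = 0; they differ mainly in the tails.
print(probit_cdf(0.0), logit_cdf(0.0))   # both 0.5
print(probit_cdf(1.0), logit_cdf(1.0))   # ~0.841 vs ~0.731
```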
It’s very useful to look at a framework from which logit and probit models can be derived. Suppose that there is an unobservable variable \(y_i^*\) which measures the “utility” of choosing 1 for an individual. The individual is assumed to choose 1 if this utility is high enough. Thus,
\begin{equation} y_i^* = X_i \beta + u_i \end{equation}
with
\begin{equation} Y_i = \left\{ {\begin{array}{*{20}c} 1 \hfill & {{\rm{if}}\,y_i^* \ge 0} \hfill \\ 0 \hfill & {{\rm{if}}\,y_i^* < 0} \hfill \\ \end{array}} \right. \end{equation}
If the \(u_i\) are assumed to be independent standard normal, the result is a probit model; if they’re independent logistic, we get a logit. (RATS also allows use of the asymmetric extreme value distribution). This type of underlying model is used to derive many of the extensions of the simple binary choice probit and logit.
Estimation
Probit and logit models require fitting by non-linear maximum likelihood methods. However, their likelihood functions are usually so well-behaved that you can simply let the program take off from the standard initial estimates (all zeros). You may, in bigger models, need to boost the ITERS option. But usually all you need is
ddv(dist=probit) depvar (use DIST=LOGIT for logit)
# list of regressors
DDV includes the standard options for computing robust covariance matrices, but keep in mind that the estimates themselves may not be consistent if the assumption about the distribution isn’t correct.
Using PRJ
The instruction PRJ has many options designed to help with diagnostics and predictions for logit and probit models. In its most basic use, after estimating a model by DDV, you can compute the series of the “index” by just
prj index
With the option CDF, you can generate the series of predicted probabilities of the “1” choice. Use the option DIST=LOGIT if you want these calculated for the logit, or DIST=EXTREME for the extreme value; the default is to compute them for the normal, regardless of the DIST choice on the DDV instruction. Some additional statistics can be obtained using the options DENSITY, MILLS and DMILLS.
PRJ will also let you compute the index, density and predicted probability for a single input set of X’s. The values are returned as the variables %PRJFIT, %PRJDENSITY and %PRJCDF. The two options which allow this are XVECTOR and ATMEAN. The ATMEAN option requests that the calculation be done at the mean of the regressors over the estimation range, while with XVECTOR you provide a vector at which you want the values calculated. This example computes the “slope coefficients” for a probit, giving the derivatives of the probability with respect to the explanatory variables evaluated at the mean.
ddv(dist=probit) grade
# constant gpa tuce psi
prj(atmean,dist=probit)
disp "Slope Coefficients for probit"
disp %prjdensity*%beta
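The arithmetic behind that display can be sketched in Python (not RATS). The coefficient and mean values below are purely illustrative, not the actual estimates from the example; the point is the calculation: the derivative of \(\Phi(x\beta)\) with respect to \(x\), evaluated at the mean, is \(\phi(\bar x\beta)\beta\).

```python
# Sketch (not RATS) of the "slope coefficient" calculation in the PRJ example.
# beta and xbar are made-up values standing in for %BETA and the regressor means.
import math

beta = [-7.45, 1.63, 0.052, 1.43]   # illustrative probit coefficients
xbar = [1.0, 3.12, 21.94, 0.44]     # illustrative regressor means (incl. CONSTANT)

index = sum(b * x for b, x in zip(beta, xbar))                 # analogue of %PRJFIT
density = math.exp(-0.5 * index**2) / math.sqrt(2 * math.pi)   # analogue of %PRJDENSITY
slopes = [density * b for b in beta]                           # density times each coefficient
print(slopes)
```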
Generalized Residuals
With DDV, you can compute the generalized residuals by including the GRESIDS option. If the log likelihood element for an observation can be written \(g(X_i \beta )\), the generalized residuals are the series of derivatives \(g'(X_i \beta )\). This has the property that
\begin{equation} \frac{\partial }{{\partial \beta }}\sum {g(X_i \beta )} = \sum {g'(X_i \beta )\,X_i } = 0 \end{equation}
that is, the generalized residuals are orthogonal to the explanatory variables, the way the regular residuals are for a least squares regression. They crop up in many diagnostic tests on logit and probit models. For instance, the following (from Greene (2012), example 17.10) computes an LM test for heteroscedasticity related to the series KIDS and FAMINC. The test statistic determines whether two constructed variables are also orthogonal to the generalized residuals.
ddv(gresids=gr) lfp
# constant wa agesq faminc we kids
prj fit
set z1fit = -fit*kids
set z2fit = -fit*faminc
mcov(opgstat=lm) / gr
# %reglist() z1fit z2fit
cdf(title="LM Test of Heteroscedasticity") chisqr lm 2
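For the probit, the generalized residual works out to \(\phi(z)(y - \Phi(z))/\left(\Phi(z)(1 - \Phi(z))\right)\) where \(z = X_i\beta\). The Python sketch below (not RATS) checks that closed form against a numerical derivative of the log likelihood element \(g(z) = y\log\Phi(z) + (1-y)\log(1-\Phi(z))\):

```python
# Sketch (not RATS): the probit generalized residual g'(z), checked against
# a central-difference numerical derivative of the likelihood element g(z).
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def g(z, y):
    """Log likelihood element for a probit observation."""
    return y * math.log(Phi(z)) + (1 - y) * math.log(1.0 - Phi(z))

def gresid(z, y):
    """Closed-form generalized residual g'(z) for the probit."""
    return phi(z) * (y - Phi(z)) / (Phi(z) * (1.0 - Phi(z)))

h = 1e-6
for z in (-1.2, 0.3, 0.9):
    for y in (0, 1):
        numeric = (g(z + h, y) - g(z - h, y)) / (2 * h)
        print(z, y, gresid(z, y), numeric)   # the two columns agree
```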
Examples
PROBIT.RPF is an example of probit and logit models, including tests of restrictions.
UNION.RPF estimates a probit model and does experiments with predicted probabilities under alternative values for the explanatory variables.
Most of the general econometrics textbooks have chapters on probit and logit models. Wooldridge (2010) and Greene (2012) have more extensive treatments.
In an ordered probit model, there are three or more choices, with the possible choices having a natural ordering. A single index function combined with a partitioning of the real line is used to model the choice process. If we have \(m\) choices, let \(a_1 , \ldots ,a_{m-1}\) be the upper bounds on the intervals (the top choice is unbounded above). If an individual’s index is less than \(a_1\), she chooses 1; if between \(a_1\) and \(a_2\), she chooses 2, etc. An ordered probit occurs if the index function \(I\) takes the form
\begin{equation} I_i = X_i \beta + u_i \end{equation}
where \(u_i \) is a standard normal, and an ordered logit if \(u\) is a logistic. The probability that an individual chooses \(j\) is
\begin{equation} P({\text{choose}}\,j|X_i ) = \left\{ {\begin{array}{*{20}c} {\Phi \left( {a_1 - X_i \beta } \right)} \hfill & {{\rm{if}}\,j = 1} \hfill \\ {1 - \Phi \left( {a_{m - 1} - X_i \beta } \right)} \hfill & {{\rm{if}}\,j = m} \hfill \\ {\Phi \left( {a_j - X_i \beta } \right) - \Phi \left( {a_{j - 1} - X_i \beta } \right)} \hfill & {{\text{otherwise}}} \hfill \\ \end{array}} \right. \end{equation}
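These three cases can be sketched directly in Python (not RATS). The cut points and index value below are arbitrary; the check is that the probabilities cover the real line, so they sum to one.

```python
# Sketch (not RATS): ordered-probit choice probabilities for m = 3 choices,
# given hypothetical cut points (a1, a2) and an index value xb.
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, cuts):
    """P(choose j | x) for j = 1,...,len(cuts)+1."""
    m = len(cuts) + 1
    probs = [Phi(cuts[0] - xb)]                        # j = 1
    for j in range(1, m - 1):                          # interior choices
        probs.append(Phi(cuts[j] - xb) - Phi(cuts[j - 1] - xb))
    probs.append(1.0 - Phi(cuts[-1] - xb))             # j = m, unbounded above
    return probs

p = ordered_probit_probs(0.4, [-0.5, 1.0])
print(p, sum(p))   # the probabilities sum to 1
```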
To estimate an ordered probit using DDV, add the option TYPE=ORDERED. Do not include CONSTANT in the regressors—a non-zero intercept would simply shift the cut-points, leaving an unidentified model.
This is a part of example 16-2 from Wooldridge (2010):
ddv(type=ordered,dist=probit,cuts=cuts) pctstck
# choice age educ female black married $
finc25 finc35 finc50 finc75 finc100 finc101 wealth89 prftshr
The CUTS option returns the vector of cut points. Only the standard regression coefficients are included in the %BETA vector. Note that RATS does not require that the dependent variable take a specific set of values for an ordered probit; it just needs a numeric coding in which each choice has a unique value, with the values increasing in the natural order of the choices. For instance, you could use 1,2,...,m, or 0,1,...,m–1. However, the labeling on the output for the cut points will be CUT(1), CUT(2),... regardless of the coding that you use.
There are two main formulations of the logit model for multiple (unordered) choices: the multinomial logit and the conditional logit, though the dividing line between the two is sometimes a bit unclear. The common structure of these is that for each individual there are (linear) functions \(f_1 ,...,f_m \) of the explanatory variables and it’s assumed that
\begin{equation} P({\rm{choose}}\,j) = \exp (f_j )/\sum\limits_{i = 1}^m {\exp (f_i )} \end{equation}
In the classic multinomial logit, the \(f\) functions for individual \(i\) take the form
\begin{equation} f_j = X_i \beta _j \end{equation}
that is, each choice uses the same explanatory variables, but with a different set of coefficients. With this structure, the coefficients on one of the choices can be normalized to zero, as subtracting (say) \(X_i \beta _1 \) from all the \(f\) values will leave the probabilities unchanged. Thus, the model needs to estimate \(m-1\) sets of coefficients. To estimate a multinomial logit model, add the option TYPE=MULTINOMIAL. (RATS doesn’t do multinomial probit, so you don’t need the DIST option for logit). For instance, the following is a part of an example from Wooldridge (2010), example 15.4:
ddv(type=multinomial,smpl=smpl) status
# educ exper expersq black constant
The choices are (numerically) coded into the dependent variable. You can choose whatever scheme you want for this, as long as each choice uses a single number distinct from all other choices. For instance, you can use 0, 1, 2,... or 1, 2, 3, ... However you code it, the coefficients are normalized to be zero for the lowest numbered choice.
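The normalization argument above can be verified numerically (this is Python, not RATS): subtracting \(f_1\) from every \(f_j\) leaves the multinomial logit probabilities unchanged, which is why one set of coefficients can be fixed at zero. The \(f\) values below are arbitrary.

```python
# Sketch (not RATS): invariance of multinomial-logit probabilities to a
# common shift of the f functions, which justifies the zero normalization.
import math

def mnl_probs(f):
    """exp(f_j) / sum_i exp(f_i) for each choice j."""
    e = [math.exp(v) for v in f]
    s = sum(e)
    return [v / s for v in e]

f = [0.7, -0.2, 1.1]                 # made-up f_j = x'beta_j for three choices
shifted = [v - f[0] for v in f]      # normalize choice 1's f to zero
print(mnl_probs(f))
print(mnl_probs(shifted))            # identical probabilities
```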
The alternative to the multinomial logit is the conditional logit model of McFadden (1974). Here the \(f\)’s take the form
\begin{equation} f_j = X_{ij} \beta \end{equation}
where \(X_{ij} \) is the set of attributes of choice \(j\) to individual \(i\). Note that the coefficient vector is the same. The idea behind the model is that there is a common “utility function” over the attributes which is used to generate the probabilities. Theoretically, this model could be used to predict the probability of an alternative which hasn’t previously been available (a proposed transit system, for instance), as the probabilities depend only upon the attributes of the choice. If you want to do this, however, you have to be careful about the choice of variables—any choice-specific dummies would render the calculations involving new choices meaningless.
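The contrast with the multinomial logit shows up clearly in a sketch (Python, not RATS): each choice carries its own attribute vector \(X_{ij}\), but one shared coefficient vector \(\beta\) prices the attributes. The attribute values and weights below are made up.

```python
# Sketch (not RATS): conditional-logit probabilities for one individual facing
# three choices, each with its own attributes but a common coefficient vector.
import math

beta = [-0.05, 0.8]                               # shared "utility" weights (made up)
attrs = [[30.0, 1.0], [20.0, 0.0], [25.0, 0.5]]   # x_ij for choices j = 1..3 (made up)

f = [sum(b * x for b, x in zip(beta, xj)) for xj in attrs]   # f_j = x_ij' beta
e = [math.exp(v) for v in f]
probs = [v / sum(e) for v in e]
print(probs, sum(probs))
```

Because the probabilities depend only on the attributes, the same `beta` could score a fourth, hypothetical alternative by appending its attribute vector, which is the "new transit system" idea mentioned above.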
The “X” variables for the conditional logit take quite a different form from those for the multinomial logit and the other models considered earlier. To keep the overall form of the instruction similar to the rest of DDV, the data are input differently: there is one observation in the data set for each combination of an individual and choice. If, for instance, there are four possible choices and 500 individuals, the data set should have 2000 observations. Internally, DDV will compute likelihoods on an individual by individual basis, and will report the number of individuals as the number of usable observations. The dependent variable will be a zero-nonzero coded variable which is non-zero if the individual made the choice indicated by that entry. In order for DDV to calculate the likelihood, you need a separate series which distinguishes the data for the individuals. Use the INDIVS option to show what series that is. The MODES option isn’t required to estimate the model, but allows additional diagnostics; it provides the name of the series which shows which choice an entry represents.
The following is a part of an example from Greene (2012), section 18.2.9. The data set is 210 individuals, 4 choices (air, train, bus, car) for each. The data set itself doesn’t include the individual and mode identifiers, so they are constructed as a function of the entry number. This uses choice-specific dummies, which are also generated. Note that you don’t include CONSTANT, as it would wash out of the probabilities. This also estimates a separate model for the individuals who didn’t choose “air” as part of a Hausman test. In order to do this, you just need to knock out all the “air” entries using a SMPL option. If DDV finds an individual who chose none of the options in the data set being used in the estimation, that observation will be dropped from the calculations, which is what we want.
set indiv = %block(t,4)
set which = %clock(t,4)
set chosen = mode
set dair = which==1
set dtrain = which==2
set dbus = which==3
set airhinc = dair*hinc
*
* Conditional logit model
*
ddv(type=conditional,indiv=indiv,modes=which) chosen
# dair dtrain dbus gc ttme airhinc
ddv(type=conditional,indiv=indiv,modes=which,smpl=which<>1) chosen
# dair dtrain dbus gc ttme airhinc
Using MAXIMIZE
You can also use the MAXIMIZE instruction to estimate probit and logit models that don’t fit into one of the forms that can be handled by DDV. The key for doing this is the function %IF:
\(\text{\%IF}\left( {x,y,z} \right) = \left\{ {\begin{array}{*{20}c} y & {{\text{if }}x \ne 0} \\ z & {{\text{if }}x = 0} \\ \end{array}} \right.\)
If you create a FRML called ZFRML which computes the equivalent of \(X_i \beta \), then the log likelihood FRMLs for probit and logit are
frml probit = (z=zfrml(t)) , %if(y,log(%cdf(z)),log(%cdf(-z)))
frml logit = (z=zfrml(t)) , %if(y,z,0.0)-log(1+exp(z))
The easiest way to create ZFRML is by doing a FRML instruction with either the LASTREG option (after doing a regression) or with the REGRESSORS option.
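Those two FRMLs translate line for line into Python (a sketch, not RATS), with an ordinary conditional standing in for %IF and `z` standing for the value of ZFRML:

```python
# Sketch (not RATS): Python analogues of the PROBIT and LOGIT formulas above.
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(y, z):
    """%if(y, log(%cdf(z)), log(%cdf(-z)))"""
    return math.log(Phi(z)) if y else math.log(Phi(-z))

def logit_loglik(y, z):
    """%if(y, z, 0.0) - log(1 + exp(z))"""
    return (z if y else 0.0) - math.log(1.0 + math.exp(z))

# The logit formula matches log(p) and log(1-p) directly:
p = math.exp(0.6) / (1.0 + math.exp(0.6))
print(logit_loglik(1, 0.6), math.log(p))       # equal
print(logit_loglik(0, 0.6), math.log(1 - p))   # equal
```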
Samples with Repetitions
In econometric work, we occasionally have a data set where there are repetitions on each set of values for the explanatory variables. Ideally, you will have one series with the number of repetitions on the X values, and another series indicating either the total number or the fraction of “successes.” You can estimate a probit or logit for such a data set by using MAXIMIZE. If series REPS and SUCCESS are the number of repetitions and the number of successes, the log likelihood for probit is
frml probit = (z=zfrml(t)) , $
success*log(%cdf(z))+(reps-success)*log(1-%cdf(z))
If the data are limited to the fraction of successes without total counts, use
frml probit = (z=zfrml(t)) , $
fract_success*log(%cdf(z))+(1-fract_success)*log(1-%cdf(z))
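Arithmetically, the fraction-of-successes version is just the count version scaled by 1/REPS, as this Python sketch (not RATS) confirms with made-up values:

```python
# Sketch (not RATS): the grouped-data probit likelihood element, and its
# relation to the fraction-of-successes form (same expression scaled by 1/reps).
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grouped_loglik(z, reps, success):
    """Element for REPS trials with SUCCESS successes at index value z."""
    return success * math.log(Phi(z)) + (reps - success) * math.log(1.0 - Phi(z))

z, reps, success = 0.25, 10, 7    # made-up values
frac_ll = (success / reps) * math.log(Phi(z)) + \
          (1 - success / reps) * math.log(1.0 - Phi(z))
print(grouped_loglik(z, reps, success), reps * frac_ll)   # equal
```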
Heteroscedastic Probit
This estimates a probit model allowing for possible heteroscedasticity in the residual of the “index” process. The scedastic model takes the form
\begin{equation} \exp (b_0 \times {\rm{KIDS}} + b_1 \times {\rm{FAMINC}}) \end{equation}
frml(regress,vector=bp) zfrml
# constant wa agesq faminc we kids
frml(regress,vector=bh) hfrml
# kids faminc
frml hprobit = (z=zfrml/sqrt(exp(hfrml))),$
%if(lfp,log(%cdf(z)),log(%cdf(-z)))
nonlin bp bh
maximize(pmethod=simplex,piters=10,method=bhhh) hprobit
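The likelihood element that HPROBIT evaluates can be sketched in Python (not RATS): the index is divided by the conditional standard deviation \(\sqrt{\exp(w\gamma)}\), where \(w\gamma\) is the scedastic function (KIDS and FAMINC in the example). The parameter values below are made up.

```python
# Sketch (not RATS): the heteroscedastic-probit likelihood element. With the
# scedastic function equal to zero, the variance is 1 and it reduces to an
# ordinary probit element.
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def het_probit_loglik(y, xb, wg):
    """xb = index x'beta; wg = scedastic function w'gamma (both made up here)."""
    z = xb / math.sqrt(math.exp(wg))
    return math.log(Phi(z)) if y else math.log(Phi(-z))

print(het_probit_loglik(1, 0.8, 0.0), math.log(Phi(0.8)))   # equal
print(het_probit_loglik(1, 0.8, 0.5))   # larger variance pulls z toward zero
```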
Copyright © 2025 Thomas A. Doan