DDV Instruction

DDV( options ) depvar start end

# list of explanatory variables (in Regression Format)

The DDV instruction performs several forms of discrete dependent variable estimation, including binary choice probit and logit, ordered choice models, and multinomial and conditional logit.

Wizard

The Statistics>Limited/Discrete Dependent Variable Wizard provides dialog-driven access to most of the features of the DDV instruction.

Parameters

depvar	Dependent variable. DDV requires numeric coding for this. The coding required will depend upon the model type. See the description under the TYPE option.
start, end	Estimation range. If you have not set a SMPL, this defaults to the maximum common range of all the variables involved.

Options

Standard Regression Options

Robust Error Options

TYPE=[BINARY]/ORDERED/MULTINOMIAL/CONDITIONAL/COUNT

Model type. BINARY is a binary choice model. ORDERED is an ordered choice model, MULTINOMIAL and CONDITIONAL are for multiple choice logits, and COUNT is for a Poisson count model. The main difference between MULTINOMIAL and CONDITIONAL is that the former is designed for fixed individual characteristics, with different coefficients for the choices, and the latter is for analyzing differing attributes across choices, with a fixed set of coefficients. See "Probit and Logit Models" for more information.

For BINARY, one choice is represented (in depvar) by zero, and the other by any non-zero value or values. For ORDERED, each choice must have a distinct value, ordered systematically, but they don’t have to have any specific set of values. For MULTINOMIAL each choice again must have a distinct value. COUNT data are just non-negative integer values.

CONDITIONAL requires a data set with a completely different design. Although there are multiple choices, the dependent variable is treated as a binary choice (zero vs non-zero), as each available option for each individual has a separate observation in the data set. The dependent variable is used to indicate whether the particular choice represented by an observation was the one chosen.

DISTRIBUTION=[PROBIT]/LOGIT/EXTREMEVALUE

The distribution of the underlying “index” model. PROBIT is the standard normal. This matters only for BINARY and ORDERED; MULTINOMIAL and CONDITIONAL are always done as logit, and COUNT uses a Poisson model.

INDIV=SERIES identifying individuals (for CONDITIONAL only)

MODES=SERIES identifying modes (choices) (for CONDITIONAL only)

For TYPE=CONDITIONAL, the INDIV option is required to indicate which observations are for each individual. The data set is structured differently, as each individual and each choice available for the individual needs a separate observation in the data set. The MODES series is optional; it provides an identifying series for the choices. This might, for instance, have the coding 1=Car, 2=Train, 3=Air. This isn’t necessary to estimate the model, but does allow for more diagnostics.

CUTPOINTS=(output) VECTOR of estimated cut points (output from ORDERED only)

If you have m choices, this is a vector with m–1 elements: the first is the cut point between the choices with the smallest and second-smallest values, the second separates the second and third choices, and so on.

COEFFS=RECTANGULAR matrix of coefficients (TYPE=MULTINOMIAL only)

For TYPE=MULTINOMIAL, for m choices and K regressors, this saves the coefficients as a K by (m-1) matrix. The first choice is normalized to the zero vector and left out.

GRESIDS=SERIES of generalized residuals

If the log likelihood element for an observation can be written \(g({X_t}\beta )\), the generalized residuals are the series of derivatives \(g'({X_t}\beta )\). These are useful for diagnostic tests.

ITERATIONS=iteration limit [100]

SUBITERATIONS=subiteration limit [30]

CVCRIT=convergence limit [.00001]

TRACE/[NOTRACE]

INITIAL=VECTOR of initial guesses [vector of zeros]

DDV estimates using non-linear methods. Because the models have well-behaved log likelihood functions, you don’t have (or need) as many choices to control this as with, for instance, MAXIMIZE.

All models except the conditional logit are estimated using Newton-Raphson, which uses analytical second derivatives. Conditional logit uses BHHH. See Hill Climbing Algorithms for details. ITERATIONS sets the maximum number of iterations, SUBITERATIONS the maximum number of subiterations, and CVCRIT the convergence criterion. TRACE prints the intermediate results. INITIAL supplies initial estimates for the coefficients. The default values are usually sufficient.

EQUATION=Equation to estimate [unused]

WEIGHT=Standard WEIGHT option [unused]

Use this option if you want to provide different weights for each observation.

Description

The three distributions which you can select for the binary and ordered models are the standard normal (probit), logistic (logit) and extreme value. The distribution function for the logistic is

(1) \(F(z) = \exp(z)/\left( 1 + \exp(z) \right)\)

and for the extreme value it is

(2) \(F(z) = 1 - \exp ( - \exp (z))\)

The extreme value distribution isn’t symmetric like the other two: in this form, it is skewed towards negative values.
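The asymmetry is easy to see numerically. The following is a minimal Python sketch (not RATS code) of the three distribution functions; the function names are made up for this illustration, and the standard normal is written using the error function.

```python
import math

def probit_cdf(z):
    """Standard normal distribution function (PROBIT)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic distribution function, equation (1)."""
    return math.exp(z) / (1.0 + math.exp(z))

def extreme_value_cdf(z):
    """Extreme value distribution function, equation (2)."""
    return 1.0 - math.exp(-math.exp(z))

# Tabulate the three functions on a small grid of index values.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, probit_cdf(z), logit_cdf(z), extreme_value_cdf(z))

# For a symmetric distribution F(z) + F(-z) = 1; the extreme value fails this.
print(logit_cdf(1.0) + logit_cdf(-1.0))
print(extreme_value_cdf(1.0) + extreme_value_cdf(-1.0))
```

The probit and logistic functions both give 0.5 at zero, while the extreme value form gives 1 - exp(-1), so a zero index does not correspond to an even chance under that distribution.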

For TYPE=BINARY, DDV estimates the coefficients by maximizing over \(\beta\) the log likelihood function

(3) \(L = \sum\limits_{Y_i \ne 0} \log F\left( X_i\beta \right) + \sum\limits_{Y_i = 0} \log \left( 1 - F\left( X_i\beta \right) \right)\)

As mentioned above, this is done using the Newton-Raphson algorithm, as the first and second derivatives are easy to compute.

The estimated covariance matrix of coefficients is

(4) \({\left( { - \frac{{{\partial ^2}L}}{{\partial {\beta ^2}}}} \right)^{ - 1}}\) evaluated at \(\hat \beta \).
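To make (3) and (4) concrete, here is a minimal Python sketch (not RATS code) of Newton-Raphson for the simplest possible case: a logit with CONSTANT as the only regressor. The data, the function names, and the stopping rule (absolute size of the step) are assumptions made for this illustration and are not claimed to match DDV's internal convergence test.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_constant_only_logit(y, max_iter=100, cvcrit=1e-5):
    """Newton-Raphson for a binary logit whose only regressor is a constant."""
    n = len(y)
    n1 = sum(1 for v in y if v != 0)    # any non-zero value is the "other" choice
    b = 0.0                              # start from zero
    for _ in range(max_iter):
        f = logit_cdf(b)
        gradient = n1 - n * f            # first derivative of (3) for the logit
        hessian = -n * f * (1.0 - f)     # second derivative of (3)
        step = -gradient / hessian
        b += step
        if abs(step) < cvcrit:           # hypothetical stopping rule
            break
    f = logit_cdf(b)
    loglik = n1 * math.log(f) + (n - n1) * math.log(1.0 - f)
    variance = -1.0 / (-n * f * (1.0 - f))   # equation (4) in the scalar case
    return b, loglik, variance

y = [0, 0, 1, 1, 1, 0, 1, 1, 5, 0]       # made-up data; 5 counts as non-zero
b, loglik, variance = fit_constant_only_logit(y)
print(b, loglik, variance)
```

With six non-zero values out of ten, the estimate should converge to log(6/4), the value at which F(b) equals the sample proportion, and the variance from (4) is 1/(10 × 0.6 × 0.4).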

TYPE=ORDERED estimates m–1 cut points in addition to the regressor coefficients. The dependent variable can be coded as any increasing set of numbers; within DDV these get mapped to 1,...,m. Don’t include CONSTANT among your regressors; it would merely shift the cut points. The likelihood elements are generated from:

(5) \(P(\mathrm{choose}\ j \mid \mathbf{X}_i) = \begin{cases} F\left( a_1 - \mathbf{X}'_i \beta \right) & \text{if } j = 1 \\ 1 - F\left( a_{m-1} - \mathbf{X}'_i \beta \right) & \text{if } j = m \\ F\left( a_j - \mathbf{X}'_i \beta \right) - F\left( a_{j-1} - \mathbf{X}'_i \beta \right) & \text{otherwise} \end{cases}\)

Again, the parameters are estimated by Newton-Raphson.
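The probabilities in (5) can be sketched in a few lines of Python (not RATS code). The cut points and index value below are made-up numbers, the function names are hypothetical, and the logistic distribution is used only as an example of F.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_probabilities(cutpoints, index, cdf=logit_cdf):
    """Probabilities of choices 1..m from equation (5).

    cutpoints: increasing list a_1 .. a_(m-1)
    index:     the value of X_i' beta (no constant included)
    """
    # Cumulative probabilities F(a_j - index), with 1.0 closing the last choice.
    cumulative = [cdf(a - index) for a in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)   # difference of adjacent cumulative values
        previous = c
    return probs

probs = ordered_probabilities([-1.0, 0.5, 2.0], index=0.3)
print(probs, sum(probs))
```

Three cut points produce four probabilities that sum to one. Adding a constant to the index and to every cut point leaves each probability unchanged, which is why CONSTANT should be left out of the regressor list.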

TYPE=MULTINOMIAL will always use the logit model. This estimates a separate set of coefficients for the m–1 choices with the highest values; the model is normalized on the choice with the lowest numerical value.
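As an illustration of the normalization, the following Python sketch (not RATS code) computes multinomial logit probabilities from a K by (m-1) coefficient matrix laid out as described for the COEFFS option. It uses the standard multinomial logit form with the base choice's index fixed at zero; the coefficient values and the function name are made up for this example.

```python
import math

def multinomial_logit_probabilities(coeffs, x):
    """Choice probabilities for one individual.

    coeffs: K rows by (m-1) columns, one column per non-base choice
    x:      the K regressor values for the individual
    Returns m probabilities, with the base (lowest-valued) choice first.
    """
    k = len(x)
    n_free = len(coeffs[0])
    indexes = [0.0]                      # base choice: coefficients are all zero
    for j in range(n_free):
        indexes.append(sum(coeffs[r][j] * x[r] for r in range(k)))
    top = max(indexes)                   # subtract the max to avoid overflow
    weights = [math.exp(v - top) for v in indexes]
    total = sum(weights)
    return [w / total for w in weights]

coeffs = [[0.5, -1.0],                   # row 1: coefficients on the first regressor
          [0.2,  0.3]]                   # row 2: coefficients on the second regressor
x = [1.0, 2.0]
probs = multinomial_logit_probabilities(coeffs, x)
print(probs)
```

Each column of coefficients measures the effect on the log odds of that choice relative to the base choice, so the ratio of the second probability to the first here is exp(0.5×1 + 0.2×2).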

TYPE=CONDITIONAL will also use the logit model. As indicated above, you have to structure your data set with one observation for each combination of individual and choice. That is, if you have 400 individuals and 4 choices for each, your data set will have 1600 observations. A series identifying the individuals is required, and one identifying the choices is recommended: use the INDIV and MODES options to tell DDV which series they are. The depvar series shows whether an observation represents the choice made by the individual. Note that if a choice is unavailable to an individual, you don’t need to include an observation for it in the data set.

DDV calculates the likelihood elements individual by individual. If none of the observations are flagged (in depvar) as chosen, the individual will be skipped. You can use this to estimate a model over a restricted set of choices: if you use a SMPL which knocks out any observation for a particular mode, it will remove from consideration any individual who chose that mode.
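The data layout and the skipping rule can be sketched as follows in Python (not RATS code). The series names mirror the INDIV option and depvar, but the data values, the single "cost" attribute, and the function name are made up, and the sketch assumes at most one observation per individual is flagged as chosen.

```python
import math
from collections import defaultdict

def conditional_logit_loglik(indiv, depvar, attributes, beta):
    """Log likelihood accumulated individual by individual.

    Each observation is one (individual, choice) pair. Individuals with no
    observation flagged in depvar are skipped.
    """
    groups = defaultdict(list)
    for who, chosen, x in zip(indiv, depvar, attributes):
        index = sum(b * v for b, v in zip(beta, x))
        groups[who].append((chosen != 0, index))
    loglik = 0.0
    used = 0
    for rows in groups.values():
        chosen_indexes = [idx for flag, idx in rows if flag]
        if not chosen_indexes:
            continue                     # nothing flagged: skip this individual
        top = max(idx for _, idx in rows)
        log_denominator = top + math.log(sum(math.exp(idx - top) for _, idx in rows))
        loglik += chosen_indexes[0] - log_denominator
        used += 1
    return loglik, used

# Individual 1 has three available choices, individual 2 has two, and
# individual 3's chosen observation has been dropped from the sample.
indiv      = [1, 1, 1, 2, 2, 3, 3]
depvar     = [0, 1, 0, 1, 0, 0, 0]
attributes = [[10.0], [25.0], [40.0], [12.0], [30.0], [15.0], [28.0]]  # hypothetical cost
loglik, used = conditional_logit_loglik(indiv, depvar, attributes, beta=[0.0])
print(loglik, used)
```

With a zero coefficient every available choice is equally likely, so the two individuals who are used contribute -log(3) and -log(2), and the third is dropped.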

TYPE=COUNT is the one model within this instruction which doesn’t model “choice.” Instead, it’s used for models of “count” data, where the dependent variable is required to be a non-negative integer, usually a small one. DDV estimates a Poisson regression model (see, for instance, Wooldridge (2010), section 18.2), estimating the log of the mean of the Poisson distribution as a linear function of the regressors. The likelihood for an individual is given by

(6) \(\exp \left( { - \exp ({X_i}\beta )} \right)\exp ({y_i} \times {X_i}\beta )\)

Expression (6) omits the factorial term \(1/y_i!\), which doesn’t interact with the coefficients. TYPE=COUNT is the only model type here where the estimates have some robustness to deviations from the distribution chosen, so using ROBUSTERRORS to correct the covariance matrix is reasonable.
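A short Python sketch (not RATS code) shows why the factorial term can be dropped: it shifts the log likelihood by the same amount for every coefficient value, so it cannot change the maximizer. The data and function names are made up, and the model has only a constant, for which the maximum is at the log of the sample mean.

```python
import math

def poisson_loglik_kernel(y, indexes):
    """Sum over observations of the log of expression (6)."""
    return sum(-math.exp(xb) + yi * xb for yi, xb in zip(y, indexes))

def poisson_loglik_full(y, indexes):
    """Same as the kernel, but including the -log(y_i!) term."""
    return poisson_loglik_kernel(y, indexes) - sum(math.lgamma(yi + 1) for yi in y)

scores = [0, 1, 1, 2, 3, 0, 2, 1, 4, 1]      # made-up count data
n = len(scores)
b_hat = math.log(sum(scores) / n)            # constant-only maximizer

for b in (b_hat - 0.1, b_hat, b_hat + 0.1):
    indexes = [b] * n
    print(b, poisson_loglik_kernel(scores, indexes), poisson_loglik_full(scores, indexes))
```

The two columns differ by a constant, and both peak at the middle row.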

Variables Defined

Nonlinear Estimation Variables

%BETA	coefficient vector (VECTOR)
%BETASYS	coefficient vector, includes all estimated parameters, not just right-side terms (VECTOR)
%CVCRIT	final convergence criterion (REAL). This will be equal to zero if the sub-iteration limit was reached on the last iteration.
%LOGL	log likelihood value (REAL)
%NFREE	number of free parameters (INTEGER)
%NOBS	number of observations (INTEGER)
%NREG	number of regressors (INTEGER)
%RSQUARED	pseudo R-squared measure described under Sample Output (REAL)
%SEESQ	maximum likelihood estimate of the regression variance for TYPE=COUNT (REAL)
%STDERRS	vector of coefficient standard errors (VECTOR)
%TSTATS	vector of t-statistics of the coefficients (VECTOR)
%XX	inverse(X´X) matrix (SYMMETRIC)
%XXSYS	System covariance matrix, includes all estimated terms (SYMMETRIC)

Examples

This estimates a probit model, and does a Wald test for excluding some of the explanatory variables. (DIST=PROBIT is the default, so the option on the DDV isn't really necessary.)

ddv(dist=probit) yesvm
# constant public1_2 public3_4 public5 private years teacher loginc logproptax
* Test whether "Children in School" dummies are significant.
* Use "Wald" test first.
exclude(title="Wald Test of Children in School Dummies")
# public1_2 public3_4 public5 private

This estimates a Poisson count model.

* A "static" Poisson model can be estimated by using DDV (Discrete
* Dependent Variables) with TYPE=COUNT. With a fixed probability, the
* only explanatory variable is the constant. Note that this shows the
* overdispersion by the standardized variance, which should be 1 if a
* Poisson is actually correct.
ddv(type=count) scores
# constant

This does a "Chow" likelihood ratio test for a sample split in a probit model (between the subsamples with KIDS and without). It then does an LM test for heteroscedasticity using the generalized residuals.

ddv lfp
# constant wa agesq faminc we
compute logltot=%logl
ddv(smpl=(kids==0)) lfp
# constant wa agesq faminc we
compute logl0=%logl
ddv(smpl=(kids>0)) lfp
# constant wa agesq faminc we
compute logl1=%logl
cdf(title="Test of Sample Split Based upon Kids") chisqr 2*(logl0+logl1-logltot) 5

* LM test for heteroscedasticity
ddv(gresids=gr) lfp
# constant wa agesq faminc we kids
prj fit
set z1fit = -fit*kids
set z2fit = -fit*faminc
mcov(opgstat=lm) / gr
# %reglist() z1fit z2fit
cdf(title="LM Test of Heteroscedasticity") chisqr lm 2

Sample Output

TYPE=COUNT is similar to a linear regression, and has similar output. The other model types all model probabilities. The following statistics are included in their output (\(N\) is the number of individuals, \(y_i\) is the observed value):

Log Likelihood	\(\log L \equiv \sum\limits_{i = 1}^N \log P\left( Y_i = y_i \mid X_i \right)\)
Average Likelihood	\(\exp \left( \frac{1}{N}\sum\limits_{i = 1}^N \log P\left( Y_i = y_i \mid X_i \right) \right)\)
Base Likelihood	\(\log L_c \equiv \sum\limits_{j = 1}^m N \hat p_j \log ( \hat p_j )\)
Pseudo-\(R^2\)	\(1 - \left( \log L/\log L_c \right)^{ - (2/N)\log L_c}\)

The average likelihood is the geometric mean of the likelihood elements. The base likelihood is the log likelihood of a model which predicts the observed probabilities of each choice; in most cases, this is the maximum of the likelihood without any slope coefficients. The Pseudo-\(R^2\) is the measure of fit from Estrella (1998).

This is the output from a probit.

Binary Probit - Estimation by Newton-Raphson
Convergence in 4 Iterations. Final criterion was 0.0000000 <= 0.0000100
Dependent Variable LFP
Usable Observations                 753
Degrees of Freedom                  748
Log Likelihood                -496.8663
Average Likelihood            0.5169294
Pseudo-R^2                    0.0475174
Log Likelihood(Base)          -514.8732
LR Test of Coefficients(4)      36.0138
Significance Level of LR      0.0000003

    Variable          Coeff         Std Error       T-Stat      Signif
************************************************************************************
1.  Constant      -3.686054626     1.387264410     -2.65707   0.00788238
2.  WA             0.133734096     0.063852182      2.09443   0.03622145
3.  AGESQ         -0.001662405     0.000736445     -2.25734   0.02398707
4.  FAMINC         0.035746294     0.041578006      0.85974   0.38993213
5.  WE             0.098433285     0.022801032      4.31705   0.00001581
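The summary statistics in this probit output can be reproduced from the two log likelihood values and the number of observations. The following Python sketch (not RATS code) applies the formulas given earlier in this section; the variable names are made up, and the LR statistic is assumed to be twice the difference between the log likelihood and the base log likelihood, which matches the reported value. The significance level uses the closed form for the upper tail of a chi-squared with 4 degrees of freedom.

```python
import math

# Values copied from the probit output.
n_obs = 753
log_l = -496.8663
log_l_base = -514.8732

# Average likelihood: geometric mean of the likelihood elements.
average_likelihood = math.exp(log_l / n_obs)

# Pseudo-R^2 as defined in this section.
pseudo_r2 = 1.0 - (log_l / log_l_base) ** (-(2.0 / n_obs) * log_l_base)

# Assumed form of the LR test of the 4 slope coefficients.
lr_stat = 2.0 * (log_l - log_l_base)

# Upper tail of a chi-squared(4): exp(-x/2) * (1 + x/2).
lr_signif = math.exp(-lr_stat / 2.0) * (1.0 + lr_stat / 2.0)

print(f"Average Likelihood  {average_likelihood:.7f}")
print(f"Pseudo-R^2          {pseudo_r2:.7f}")
print(f"LR Test             {lr_stat:.4f}")
print(f"Significance Level  {lr_signif:.7f}")
```

Because the inputs are rounded to four decimal places, the results should agree with the reported values up to rounding in the last digit or two.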