MAXIMIZE.RPF

MAXIMIZE.RPF is an example of the use of MAXIMIZE to estimate a stochastic frontier model. It is adapted from Greene, Econometric Analysis, 6th Edition, example 16.9. This is basically a log-linear production model, except that the residuals have two components, one of which is not permitted to be positive: you can fall short of, but can’t exceed, the (unobservable) production function. The model is written:
\begin{equation} y_t = X_t \beta + v_t - u_t ,\quad u_t \ge 0,\quad v_t \sim N\left( {0,\sigma _v^2 } \right) \end{equation}
Without the $u_t$, this is a conventional model which can be estimated by least squares. Under the assumption that $u_t$ is “half-Normal”, that is, $u_t \sim N\left( {0,\sigma _u^2 } \right) \times I_{\left[ {0,\infty } \right)} $, the log likelihood for entry $t$ can be written most conveniently as

\begin{equation} \log f_N \left( {\varepsilon _t |\sigma ^2 } \right) + \log \left[ 2F_N \left( { - \lambda \varepsilon _t |\sigma ^2 } \right) \right] \end{equation}
where

$\varepsilon _t = y_t - X_t \beta $, $\sigma ^2 = \sigma _v^2 + \sigma _u^2 $ and $\lambda = \frac{{\sigma _u }}{{\sigma _v }}$

and $f_N$ and $F_N$ are the density and distribution functions for the Normal with the given variance. The 2 multiplier attached to the $F_N$ adjusts for the $u_t$ being supported on just half the real line, thus requiring doubling of the integrating constant. If $\sigma _u^2 = 0$, then $\sigma ^2 = \sigma _v^2 $ and $\lambda = 0$, so this collapses (as it should) to the least squares log likelihood.
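
The collapse at $\lambda = 0$ can be checked in one line. This is only a worked step using the symbols already defined above; it relies on nothing beyond the symmetry of the Normal, which gives $F_N \left( {0|\sigma ^2 } \right) = 1/2$ for any variance:

\begin{equation} \log f_N \left( {\varepsilon _t |\sigma _v^2 } \right) + \log \left[ 2F_N \left( {0|\sigma _v^2 } \right) \right] = \log f_N \left( {\varepsilon _t |\sigma _v^2 } \right) + \log 1 = \log f_N \left( {\varepsilon _t |\sigma _v^2 } \right) \end{equation}

The second term vanishes, leaving the log density of a Normal regression error with variance $\sigma _v^2$.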

The standard Normal density and distribution functions are provided as %DENSITY and %CDF, which is why the argument to %CDF in the program below is divided by SIGMA. The more convenient function for the density in likelihood calculations is %LOGDENSITY, which takes the form %LOGDENSITY(VARIANCE,X). Note that all density functions in RATS include their integrating constants.
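
As a small illustration of how these functions relate, the following sketch evaluates the same log density two ways. It is not part of MAXIMIZE.RPF; X and V are hypothetical names and values chosen only for the check, and it assumes %DENSITY is the standard Normal density, so that the scaling by the standard deviation has to be done by hand.

```rats
* Hypothetical values: X is the argument, V is the variance
compute x=0.5,v=4.0
* Log density of N(0,V) at X, computed directly
display %logdensity(v,x)
* Same quantity built from the standard Normal density
display log(%density(x/sqrt(v))/sqrt(v))
```

The two DISPLAY lines should show the same number. In a likelihood formula the first form is preferable, since it avoids taking the log of a density that may underflow.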

We need to estimate the parameters in the regression function, plus $\lambda$ and $\sigma$. If we want, we can always solve out for the component variances from those. An obvious source for guess values is the linear regression. However, $\lambda=0$ is on the boundary, so we need to start that away from 0 (in the example, we start it at .1).
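
Solving out for the component standard deviations could look like the sketch below, run after the half-Normal MAXIMIZE so that SIGMA and LAMBDA hold their estimates. It is not part of MAXIMIZE.RPF. It assumes Greene's parameterization, $\lambda = \sigma _u /\sigma _v$ with $\sigma ^2 = \sigma _v^2 + \sigma _u^2$, which gives $\sigma _v = \sigma /\sqrt {1 + \lambda ^2 }$ and $\sigma _u = \lambda \sigma _v$. SDV and SDU are names introduced here for illustration.

```rats
* Recover the component standard deviations from SIGMA and LAMBDA.
* Assumes lambda = sigma_u/sigma_v and sigma^2 = sigma_v^2 + sigma_u^2.
compute sdv=sigma/sqrt(1+lambda^2)
compute sdu=lambda*sdv
display "sigma_v" sdv "sigma_u" sdu
```

With the estimates shown in the Output section (SIGMA about 0.2824, LAMBDA about 1.2645), this works out to roughly 0.175 for $\sigma _v$ and 0.222 for $\sigma _u$.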

The most important part of the program is:

linreg logq
# constant logk logl
nonlin b0 bk bl sigma lambda
compute b0=%beta(1),bk=%beta(2),bl=%beta(3)
compute sigma=sqrt(%seesq),lambda=.1
frml resid = logq-b0-bk*logk-bl*logl
frml frontier = eps=resid,$
   %logdensity(sigma^2,eps)+log(2*%cdf(-eps*lambda/sigma))
maximize(method=bfgs,title=$
   "Frontier Model with 1/2 Normal Errors") frontier

This estimates the linear regression, sets up the parameter set using NONLIN, and uses COMPUTE to set the guess values based upon the results of the linear regression. It then defines the formula RESID for the residual $\varepsilon _t$, and uses that to define the log likelihood formula FRONTIER as the implementation of (2). MAXIMIZE is used to maximize the log likelihood of the model, as it takes as its optimand the sum of FRONTIER over the sample. Note the use of the TITLE option on MAXIMIZE to describe the model in the output.
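
To see concretely that the reported function value is the sum of FRONTIER over the sample, a check along the following lines could be run immediately after the MAXIMIZE instruction. This is a sketch rather than part of MAXIMIZE.RPF: LOGLSUM is a name introduced here, the range 1 to 25 matches the data set used in this example, and it assumes %FUNCVAL holds the final function value from the most recent MAXIMIZE.

```rats
* Sum the log likelihood formula over the 25 entries,
* evaluated at the parameter values MAXIMIZE left in place
sstats 1 25 frontier(t)>>loglsum
display "Sum of FRONTIER" loglsum "Function value" %funcval
```

The two displayed values should agree, since the parameters in the NONLIN set are left at their estimated values when MAXIMIZE finishes.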

The example also includes estimation of an alternative model which assumes $u_t$ is exponential rather than half-Normal. The exponential distribution is naturally defined as only non-negative, and has fatter tails than a half-Normal. Just for illustration, the MAXIMIZE on the second form uses different options:

maximize(pmethod=simplex,piters=5,method=bhhh,$
   title="Frontier Model with Exponential Errors") frontier

This uses BHHH as the main estimation algorithm, rather than the BFGS used for the half-Normal version, and runs five preliminary simplex iterations (PMETHOD=SIMPLEX, PITERS=5) first. BHHH is a valid technique for this, since the optimand is a log likelihood (it relies on the outer product of the per-entry gradients), but BFGS should generally be preferred as it has better numerical properties.
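
For reference, the FRONTIER formula used for the exponential model in the full program corresponds to the per-entry log likelihood below. This is a reading of the program code, not a formula stated in the original text. It assumes $u_t$ has density $\theta e^{ - \theta u}$ for $u \ge 0$, and $\Phi$ denotes the standard Normal distribution function (%CDF); $\theta$ and $\sigma _v$ correspond to THETA and SIGMAV.

\begin{equation} \log \theta + \frac{1}{2}\theta ^2 \sigma _v^2 + \theta \varepsilon _t + \log \Phi \left( { - \frac{{\varepsilon _t }}{{\sigma _v }} - \theta \sigma _v } \right) \end{equation}

Here the parameters are $\theta$ and $\sigma _v$ rather than $\sigma$ and $\lambda$, which is why the program issues a new NONLIN before the second MAXIMIZE.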

Full Program

open data tablef14-1.txt
data(format=prn,org=columns) 1 25 valueadd capital labor nfirm
*
set logq = log(valueadd)
set logk = log(capital)
set logl = log(labor)
*
* Estimate linear regression to provide guess values.
*
linreg logq
# constant logk logl
*
* Half-normal model
*
nonlin b0 bk bl sigma lambda
compute b0=%beta(1),bk=%beta(2),bl=%beta(3)
compute sigma=sqrt(%seesq),lambda=.1
frml resid = logq-b0-bk*logk-bl*logl
frml frontier = eps=resid,$
   %logdensity(sigma^2,eps)+log(2*%cdf(-eps*lambda/sigma))
maximize(method=bfgs,title=$
"Frontier Model with 1/2 Normal Errors") frontier
*
* Exponential model
*
nonlin b0 bk bl theta sigmav
compute sigmav=sigma,theta=7
frml frontier = eps=resid,log(theta)+.5*theta^2*sigmav^2+theta*eps+$
   log(%cdf(-eps/sigmav-theta*sigmav))
maximize(pmethod=simplex,piters=5,method=bhhh,$
   title="Frontier Model with Exponential Errors") frontier

Output

Linear Regression - Estimation by Least Squares
Dependent Variable LOGQ
Usable Observations                        25
Degrees of Freedom                         22
Centered R^2                        0.9730750
R-Bar^2                             0.9706273
Uncentered R^2                      0.9986265
Mean of Dependent Variable       5.8120922044
Std Error of Dependent Variable  1.3753035142
Standard Error of Estimate       0.2357058986
Sum of Squared Residuals         1.2222599535
Regression F(2,22)                   397.5427
Significance Level of F             0.0000000
Log Likelihood                         2.2537
Durbin-Watson Statistic                1.9575

    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                     1.8444157136 0.2335928490      7.89586  0.00000007
2.  LOGK                         0.2454280713 0.1068574320      2.29678  0.03152246
3.  LOGL                         0.8051829551 0.1263336077      6.37347  0.00000206

Frontier Model with 1/2 Normal Errors - Estimation by BFGS
Convergence in 38 Iterations. Final criterion was 0.0000009 <= 0.0000100
Usable Observations                        25
Function Value                         2.4695

    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  B0                           2.0811754193 0.2673172557      7.78541  0.00000000
2.  BK                           0.2585688024 0.0964775594      2.68009  0.00736018
3.  BL                           0.7802185941 0.1164543555      6.69978  0.00000000
4.  SIGMA                        0.2823970009 0.0713040926      3.96046  0.00007481
5.  LAMBDA                       1.2645107510 0.9796469396      1.29078  0.19677924

Frontier Model with Exponential Errors - Estimation by BHHH
Convergence in 20 Iterations. Final criterion was 0.0000069 <= 0.0000100
Usable Observations                        25
Function Value                         2.8605

    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  B0                           2.0692489998 0.2900073623      7.13516  0.00000000
2.  BK                           0.2624999575 0.1201913475      2.18402  0.02896099
3.  BL                           0.7703660377 0.1380181186      5.58163  0.00000002
4.  THETA                        7.3982369676 3.9305963957      1.88222  0.05980650
5.  SIGMAV                       0.1713872403 0.0540563964      3.17053  0.00152163