MAXIMIZE.RPF
MAXIMIZE.RPF is an example of the use of MAXIMIZE to estimate a stochastic frontier model. It is discussed in the User's Guide, Section 4.11, and is adapted from Greene, Econometric Analysis, 6th Edition, Example 16.9. This is basically a log-linear production model, except that the residual has two components, one of which is not permitted to be positive: you can fall short of, but can’t exceed, the (unobservable) production function. The model is written:
(1) \(y_t = X_t \beta + v_t - u_t ,u_t \ge 0,v_t \sim N\left( {0,\sigma _v^2 } \right)\)
Without the \(u_t\), this is a conventional model which can be estimated by least squares. Under the assumption that \(u_t\) is “half-Normal”, that is, \(u_t \sim N\left( {0,\sigma _u^2 } \right) \times I_{\left[ {0,\infty } \right)} \), the log likelihood for entry \(t\) can be written most conveniently as
(2) \(\log f_N \left( {\varepsilon |\sigma ^2 } \right) + \log 2F_N \left( { - \lambda \varepsilon |\sigma ^2 } \right)\)
where
\(\varepsilon _t = y_t - X_t \beta \), \(\sigma ^2 = \sigma _v^2 + \sigma _u^2 \), and \(\lambda = \sigma _u /\sigma _v \)
and \(f_N\) and \(F_N\) are the density and distribution functions for the Normal with the given variance. The factor of 2 multiplying \(F_N\) adjusts for \(u_t\) being supported on just half of the real line, which doubles the integrating constant. If \(\sigma _u^2 = 0\), then \(\sigma ^2 = \sigma _v^2 \) and \(\lambda = 0\); since \(2F_N \left( {0|\sigma ^2 } \right) = 1\), the second term vanishes and (2) collapses (as it should) to the least squares log likelihood.
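The standard Normal density and distribution functions are provided as %DENSITY and %CDF, so \(F_N \left( { - \lambda \varepsilon |\sigma ^2 } \right)\) is computed as %CDF(-eps*lambda/sigma). For the density itself, the more convenient function in likelihood calculations is %LOGDENSITY, which takes the form %LOGDENSITY(VARIANCE,X) and returns the log of the Normal density with the given variance. Note that all density functions in RATS include their integrating constants.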
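As a cross-check outside RATS, here is a minimal Python sketch (standard library only) of the pieces of (2). The names log_density and cdf are hypothetical stand-ins for %LOGDENSITY and %CDF, not RATS functions, and the sketch assumes \(\lambda = \sigma_u/\sigma_v\), the parameterization implied by the %CDF(-eps*lambda/sigma) term in the program. It also checks the doubling argument numerically: with the factor of 2 the density integrates to one, and without it to one half.

```python
import math

def log_density(variance, x):
    """Log of the N(0, variance) density at x, integrating constant included
    (a stand-in for %LOGDENSITY(VARIANCE,X))."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + x * x / variance)

def cdf(x):
    """Standard Normal distribution function (a stand-in for %CDF(X)).
    erfc keeps far-tail values from rounding to exactly zero."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def half_normal_logl(eps, sigma, lam):
    """Per-entry log likelihood (2), assuming lam = sigma_u / sigma_v:
    log f_N(eps | sigma^2) + log 2 F_N(-lam * eps | sigma^2),
    where F_N(z | sigma^2) equals the standard Normal CDF at z / sigma."""
    return log_density(sigma ** 2, eps) + math.log(2.0 * cdf(-lam * eps / sigma))

def integrate(f, lo, hi, n=20000):
    """Trapezoidal rule; accurate enough for a sanity check."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

if __name__ == "__main__":
    sigma, lam = 0.3, 1.25  # illustrative values only
    # With the factor of 2 the density integrates to (about) one ...
    print(integrate(lambda e: math.exp(half_normal_logl(e, sigma, lam)),
                    -10 * sigma, 10 * sigma))
    # ... and without it, to (about) one half.
    print(integrate(lambda e: math.exp(log_density(sigma ** 2, e)) * cdf(-lam * e / sigma),
                    -10 * sigma, 10 * sigma))
```

Setting lam to 0 makes the second term of half_normal_logl exactly zero, which is the collapse to the least squares log likelihood described above.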
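We need to estimate the parameters in the regression function, plus \(\lambda\) and \(\sigma\). If we want, we can always solve for the component variances from those. The obvious source for guess values is the linear regression. However, \(\lambda=0\) is on the boundary, and it is a poor starting point: at \(\lambda=0\) with the least squares coefficients, the derivative of the log likelihood with respect to \(\lambda\) is proportional to the sum of the least squares residuals, which is zero because the regression includes a constant. So we need to start \(\lambda\) away from 0 (in the example, we start it at .1).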
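Both remarks can be checked with a small Python sketch. It assumes \(\lambda = \sigma_u/\sigma_v\) and \(\sigma^2 = \sigma_v^2 + \sigma_u^2\), so that \(\sigma_v = \sigma/\sqrt{1+\lambda^2}\) and \(\sigma_u = \lambda\sigma_v\). The third function gives the derivative with respect to \(\lambda\) at \(\lambda = 0\), \(-\frac{2}{\sigma\sqrt{2\pi}}\sum_t \varepsilon_t\). The function names are hypothetical, the residual list is made up for illustration, and the SIGMA and LAMBDA values are the half-Normal estimates reported in the output.

```python
import math

def components(sigma, lam):
    """Recover (sigma_v, sigma_u) from sigma and lam, assuming
    sigma^2 = sigma_v^2 + sigma_u^2 and lam = sigma_u / sigma_v."""
    sigma_v = sigma / math.sqrt(1.0 + lam ** 2)
    return sigma_v, lam * sigma_v

def log2cdf_sum(lam, resids, sigma):
    """Sum over entries of log 2F_N(-lam*eps | sigma^2), the only part of
    (2) that involves lam; 2*Phi(-x) is written as erfc(x / sqrt(2))."""
    return sum(math.log(math.erfc(lam * e / (sigma * math.sqrt(2.0))))
               for e in resids)

def lambda_score_at_zero(resids, sigma):
    """Analytic derivative of the log likelihood with respect to lam at
    lam = 0: -(2 * phi(0) / sigma) * sum(eps)."""
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)
    return -2.0 * phi0 / sigma * sum(resids)

if __name__ == "__main__":
    # Half-Normal estimates of SIGMA and LAMBDA from the output section
    print(components(0.2823970009, 1.2645107510))
    # Hypothetical residuals that sum to zero, as least squares residuals
    # from a regression with a constant do
    resids = [0.3, -0.1, -0.25, 0.05]
    print(lambda_score_at_zero(resids, 0.24))
```

A central difference of log2cdf_sum around lam = 0 reproduces lambda_score_at_zero, so a gradient-based method started at \(\lambda = 0\) with the least squares coefficients sees no reason to move \(\lambda\).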
The most important part of the program is:
linreg logq
# constant logk logl
nonlin b0 bk bl sigma lambda
compute b0=%beta(1),bk=%beta(2),bl=%beta(3)
compute sigma=sqrt(%seesq),lambda=.1
frml resid = logq-b0-bk*logk-bl*logl
frml frontier = eps=resid,$
%logdensity(sigma^2,eps)+log(2*%cdf(-eps*lambda/sigma))
maximize(method=bfgs,title=$
"Frontier Model with 1/2 Normal Errors") frontier
This estimates the linear regression, declares the parameter set with NONLIN, and uses COMPUTE to set the guess values from the results of the linear regression. It then defines the formula RESID for the residual \(\varepsilon_t\) and uses it to define FRONTIER, the log likelihood formula implementing (2). FRONTIER first evaluates RESID and saves the value in EPS, which it then uses in both terms. MAXIMIZE maximizes the sum of FRONTIER over the sample, which is the log likelihood of the model. Note the use of the TITLE option on MAXIMIZE to describe the model in the output.
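To make that structure concrete outside RATS, a minimal Python sketch mirrors RESID, FRONTIER and the sum that MAXIMIZE works with. The function names, argument order and the three data points are hypothetical and made up for illustration (they are not the Greene data); only the standard library is used, with \(2F_N(-\lambda\varepsilon|\sigma^2)\) written as erfc\((\lambda\varepsilon/(\sigma\sqrt{2}))\).

```python
import math

def resid(logq, logk, logl, b0, bk, bl):
    # Mirrors FRML RESID = LOGQ-B0-BK*LOGK-BL*LOGL for a single entry
    return logq - b0 - bk * logk - bl * logl

def frontier(logq, logk, logl, b0, bk, bl, sigma, lam):
    # Mirrors FRML FRONTIER: evaluate the residual once (EPS), then use it
    # in both terms of (2)
    eps = resid(logq, logk, logl, b0, bk, bl)
    log_dens = -0.5 * (math.log(2.0 * math.pi * sigma ** 2) + eps ** 2 / sigma ** 2)
    # log 2*Phi(-eps*lam/sigma), written via erfc for accuracy in the tails
    log_2cdf = math.log(math.erfc(eps * lam / (sigma * math.sqrt(2.0))))
    return log_dens + log_2cdf

def optimand(data, params):
    # The quantity MAXIMIZE works with: the sum of FRONTIER over the sample.
    # data holds (logq, logk, logl) triples; params = (b0, bk, bl, sigma, lam).
    return sum(frontier(q, k, l, *params) for q, k, l in data)

if __name__ == "__main__":
    # Hypothetical data points, for illustration only (not the Greene data)
    data = [(5.0, 4.0, 3.0), (6.2, 5.1, 4.0), (4.1, 3.0, 2.5)]
    print(optimand(data, (1.8, 0.25, 0.8, 0.24, 0.1)))
```

Handing the negative of optimand to a general-purpose minimizer would play the role of MAXIMIZE; the sketch deliberately stops short of that.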
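The example also estimates an alternative model which assumes \(u_t\) is exponential rather than half-Normal. The exponential distribution is naturally supported on the non-negative half-line only, and has a thicker tail than the half-Normal. In that model, \(\sigma\) and \(\lambda\) are replaced by THETA, the parameter of the exponential density \(\theta e^{-\theta u}\), and SIGMAV, the standard deviation \(\sigma_v\) of \(v_t\). Just for illustration, the MAXIMIZE for the second form uses different options: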
maximize(pmethod=simplex,piters=5,method=bhhh,$
title="Frontier Model with Exponential Errors") frontier
This uses BHHH as the main estimation algorithm, rather than the BFGS used for the half-Normal version, and precedes it with five preliminary simplex iterations (PMETHOD=SIMPLEX with PITERS=5). BHHH is a valid technique here because the function being maximized is a log likelihood, which its Hessian approximation requires, but BFGS should generally be preferred as it has better numerical properties.
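BHHH replaces the Hessian with the sum of outer products of the per-entry gradients \(g_t\) of the log likelihood; by the information matrix equality, \(\sum_t g_t g_t'\) estimates minus the expected Hessian only when the function being maximized is a log likelihood. Below is a minimal Python sketch of a single BHHH search direction, solving \(\left(\sum_t g_t g_t'\right)d = \sum_t g_t\). The helper names and gradient values are hypothetical, and this is not RATS's implementation: step-size selection and convergence testing are omitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("outer-product matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def bhhh_direction(grads):
    """One BHHH search direction from per-entry gradients g_t (lists of floats):
    solves (sum_t g_t g_t') d = sum_t g_t."""
    k = len(grads[0])
    opg = [[sum(g[i] * g[j] for g in grads) for j in range(k)] for i in range(k)]
    score = [sum(g[i] for g in grads) for i in range(k)]
    return solve(opg, score)

if __name__ == "__main__":
    # Hypothetical per-entry gradients for a two-parameter problem
    grads = [[1.0, 0.5], [-0.2, 1.0], [0.4, -0.3]]
    print(bhhh_direction(grads))
```

Because \(\sum_t g_t g_t'\) is positive semi-definite, a nonsingular outer-product matrix always gives a direction with \(d'\sum_t g_t > 0\), so the step points uphill; the price is that the approximation is only justified for a log likelihood.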
Full Program
open data tablef14-1.txt
data(format=prn,org=columns) 1 25 valueadd capital labor nfirm
*
set logq = log(valueadd)
set logk = log(capital)
set logl = log(labor)
*
* Estimate linear regression to provide guess values.
*
linreg logq
# constant logk logl
*
* Half-normal model
*
nonlin b0 bk bl sigma lambda
compute b0=%beta(1),bk=%beta(2),bl=%beta(3)
compute sigma=sqrt(%seesq),lambda=.1
frml resid = logq-b0-bk*logk-bl*logl
frml frontier = eps=resid,$
%logdensity(sigma^2,eps)+log(2*%cdf(-eps*lambda/sigma))
maximize(method=bfgs,title=$
"Frontier Model with 1/2 Normal Errors") frontier
*
* Exponential model
*
nonlin b0 bk bl theta sigmav
compute sigmav=sigma,theta=7
frml frontier = eps=resid,log(theta)+.5*theta^2*sigmav^2+theta*eps+$
log(%cdf(-eps/sigmav-theta*sigmav))
maximize(pmethod=simplex,piters=5,method=bhhh,$
title="Frontier Model with Exponential Errors") frontier
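The exponential-model FRONTIER formula in the program evaluates \(\log\theta + \tfrac12\theta^2\sigma_v^2 + \theta\varepsilon + \log F(-\varepsilon/\sigma_v - \theta\sigma_v)\), with \(F\) the standard Normal distribution function (%CDF); this is the log density of \(\varepsilon = v - u\) when \(u\) has density \(\theta e^{-\theta u}\). A minimal Python sketch (standard library only; exponential_logl and integrate are hypothetical helper names) mirrors that formula and checks numerically that the implied density integrates to one and has mean \(-1/\theta\). The values \(\theta = 7.4\) and \(\sigma_v = 0.17\) are illustrative only.

```python
import math

def exponential_logl(eps, theta, sigma_v):
    """Per-entry log likelihood of eps = v - u, v ~ N(0, sigma_v^2), u with
    density theta*exp(-theta*u); same expression as the exponential FRONTIER."""
    z = -eps / sigma_v - theta * sigma_v
    log_cdf = math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))  # log %CDF(z)
    return math.log(theta) + 0.5 * theta ** 2 * sigma_v ** 2 + theta * eps + log_cdf

def integrate(f, lo, hi, n=20000):
    """Trapezoidal rule; accurate enough for a sanity check."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

if __name__ == "__main__":
    theta, sigma_v = 7.4, 0.17  # illustrative values only
    dens = lambda e: math.exp(exponential_logl(e, theta, sigma_v))
    print(integrate(dens, -6.0, 2.0))                   # total probability
    print(integrate(lambda e: e * dens(e), -6.0, 2.0))  # mean, vs -1/theta
```

The mean check follows from \(E[v_t - u_t] = 0 - 1/\theta\), which is a quick way to confirm the sign conventions of the formula.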
Output
Linear Regression - Estimation by Least Squares
Dependent Variable LOGQ
Usable Observations 25
Degrees of Freedom 22
Centered R^2 0.9730750
R-Bar^2 0.9706273
Uncentered R^2 0.9986265
Mean of Dependent Variable 5.8120922044
Std Error of Dependent Variable 1.3753035142
Standard Error of Estimate 0.2357058986
Sum of Squared Residuals 1.2222599535
Regression F(2,22) 397.5427
Significance Level of F 0.0000000
Log Likelihood 2.2537
Durbin-Watson Statistic 1.9575
Variable                     Coeff      Std Error     T-Stat      Signif
************************************************************************************
1.  Constant          1.8444157136   0.2335928490    7.89586  0.00000007
2.  LOGK              0.2454280713   0.1068574320    2.29678  0.03152246
3.  LOGL              0.8051829551   0.1263336077    6.37347  0.00000206
Frontier Model with 1/2 Normal Errors - Estimation by BFGS
Convergence in 38 Iterations. Final criterion was 0.0000009 <= 0.0000100
Usable Observations 25
Function Value 2.4695
Variable                     Coeff      Std Error     T-Stat      Signif
************************************************************************************
1.  B0                2.0811754193   0.2673172557    7.78541  0.00000000
2.  BK                0.2585688024   0.0964775594    2.68009  0.00736018
3.  BL                0.7802185941   0.1164543555    6.69978  0.00000000
4.  SIGMA             0.2823970009   0.0713040926    3.96046  0.00007481
5.  LAMBDA            1.2645107510   0.9796469396    1.29078  0.19677924
Frontier Model with Exponential Errors - Estimation by BHHH
Convergence in 20 Iterations. Final criterion was 0.0000069 <= 0.0000100
Usable Observations 25
Function Value 2.8605
Variable                     Coeff      Std Error     T-Stat      Signif
************************************************************************************
1.  B0                2.0692489998   0.2900073623    7.13516  0.00000000
2.  BK                0.2624999575   0.1201913475    2.18402  0.02896099
3.  BL                0.7703660377   0.1380181186    5.58163  0.00000002
4.  THETA             7.3982369676   3.9305963957    1.88222  0.05980650
5.  SIGMAV            0.1713872403   0.0540563964    3.17053  0.00152163
Copyright © 2025 Thomas A. Doan