RATS 10.1

BOXCOX.RPF is an example of estimation of a model with a Box-Cox transformation on the dependent variable.

 

Box-Cox models (Box and Cox (1964)) can take several forms. The only one that requires MAXIMIZE rather than NLLS is the one in which the dependent variable is subject to a Box-Cox transformation with an unknown parameter, since the Jacobian of the transformation must be taken into account. A common model has the relationship between y and x being linear when both variables are subject to the same Box-Cox transformation; this includes, as special cases, a linear relationship and a log-linear relationship:
 

(1) \(y_t^{\left( \lambda  \right)}  = b_0  + b_1 x_t^{\left( \lambda  \right)}  + u_t \)

 

In RATS, you can compute the Box-Cox class of transformations (typically denoted \(y^{\left( \lambda  \right)}\)) using the function %BOXCOX(x,lambda).
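For readers more familiar with NumPy, the transformation itself is easy to sketch outside RATS. This is an illustration only (not RATS code); it assumes, as is standard for the Box-Cox family, that the natural log is the limiting case as \(\lambda \rightarrow 0\):

```python
import numpy as np

def boxcox(v, lam, tol=1e-8):
    """Box-Cox transform (v**lam - 1)/lam, with log(v) as the limit lam -> 0."""
    v = np.asarray(v, dtype=float)
    if abs(lam) < tol:
        return np.log(v)
    return (v**lam - 1.0) / lam
```

For example, boxcox(y, 1.0) is just y - 1 (a shifted linear relationship) and boxcox(y, 0.0) is log(y), which is why the linear and log-linear models are special cases of (1).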

 

The example generates data from a square-root-on-square-root relationship:

 

all 200

set x = %rangamma(5.0)

set sqry = .5 + 3*sqrt(x) + %ran(4.0)

set y = sqry^2
 

(%RANGAMMA is used in generating x because we need x to have positive values).

 

With

 

(2) \(y^{\left( \lambda  \right)}  = \frac{{(y^\lambda   - 1)}}{\lambda }\)
 

we have

 

(3) \(\frac{{dy^{\left( \lambda  \right)} }}{{dy}} = y^{\left( {\lambda  - 1} \right)} \)


so we need to adjust the log likelihood for (1) by adding (for each entry)

 

\(\log y^{\left( {\lambda  - 1} \right)}  = \left( {\lambda  - 1} \right)\log y\)

 

The four free parameters are \(\lambda\), the two linear parameters in (1), and the variance of \(u\), assuming the \(u\) are Normally distributed (we will actually estimate the standard deviation of \(u\)). Using a separate FRML for the right side of (1), the log likelihood can be computed using the BOXCOX FRML defined by:

 

nonlin b0 b1 lambda sigma

frml rhsfrml  = b0+b1*%boxcox(x,lambda)

frml boxcox   = (lambda-1)*log(y) + %logdensity(sigma^2,%boxcox(y,lambda)-rhsfrml(t))

 

For guess values, this uses the linear regression of y on x. The guess value for the intercept has to be adjusted, since the Box-Cox transformation has the -1 in each term, which doesn't affect the slope but does affect the intercept.


linreg y

# constant x

compute sigma=sqrt(%seesq),b0=%beta(1)+%beta(2)-1

compute b1=%beta(2),lambda=1.0
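To see where the intercept adjustment comes from: at \(\lambda = 1\) the transform is \(y^{(1)} = y - 1\), so model (1) reads \(y_t  - 1 = b_0  + b_1 \left( {x_t  - 1} \right) + u_t \), that is, \(y_t  = \left( {b_0  - b_1  + 1} \right) + b_1 x_t  + u_t \). Matching this to the fitted regression \(y_t  = a + b\,x_t  + u_t \) gives \(b_1  = b\) and \(b_0  = a + b - 1\), which is the %BETA(1)+%BETA(2)-1 used above.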

 

The Box-Cox model is estimated using MAXIMIZE:

 

maximize(method=bfgs) boxcox

 

Because the example uses randomly generated data, the results will differ from run to run, but this is an example of the output:

 

MAXIMIZE - Estimation by BFGS

Convergence in    23 Iterations. Final criterion was  0.0000020 <=  0.0000100
 

Usable Observations                       200

Function Value                     -1005.9653
 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  B0                           3.9672328268 0.9399214875      4.22081  0.00002434

2.  B1                           2.1293789081 0.4359881354      4.88403  0.00000104

3.  LAMBDA                       0.3806290766 0.0461064950      8.25543  0.00000000

4.  SIGMA                        4.3753157531 0.7346527557      5.95562  0.00000000
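The same likelihood can be sketched outside RATS. The following Python code is an illustration only (it uses NumPy and SciPy, draws its own random data, and uses a derivative-free optimizer rather than BFGS, so the numbers will not match the output above): it builds the identical per-observation log likelihood, Jacobian term included, and maximizes it.

```python
import numpy as np
from scipy.optimize import minimize

def boxcox(v, lam):
    """Box-Cox transform; log(v) in the limit lam -> 0."""
    return np.log(v) if abs(lam) < 1e-8 else (v**lam - 1.0) / lam

# Simulate data the same way as the example (different random draws)
rng = np.random.default_rng(42)
x = rng.gamma(5.0, size=200)
y = (0.5 + 3.0*np.sqrt(x) + rng.normal(0.0, 4.0, size=200))**2

def negloglik(theta):
    b0, b1, lam, sigma = theta
    u = boxcox(y, lam) - (b0 + b1*boxcox(x, lam))
    # Normal log density plus the Jacobian term (lam-1)*log(y), per observation
    ll = ((lam - 1.0)*np.log(y)
          - 0.5*np.log(2.0*np.pi*sigma**2)
          - 0.5*(u/sigma)**2)
    return -np.sum(ll)

# Guess values analogous to the LINREG-based ones in the RATS code
res = minimize(negloglik, x0=[3.0, 2.0, 1.0, 4.0], method="Nelder-Mead")
b0, b1, lamhat, sigma = res.x
```

With the square-root DGP, the estimated lambda should come out somewhere in the general vicinity of 0.5, just as the RATS run above puts it near 0.38.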


 

An alternative approach in RATS is to use NLLS, but with the JACOBIAN option to provide the adjustment for the transformation. Note that JACOBIAN takes the Jacobian itself (3), not its log. (The variance gets concentrated out in doing this.) The FRML input to NLLS (here called BOXCOXX) is a definition of u from (1):


 

nonlin b0 b1 lambda

linreg y

# constant x

compute b0=%beta(1)+%beta(2)-1,b1=%beta(2),lambda=1.0

frml boxcoxx  = %boxcox(y,lambda)-b0-b1*%boxcox(x,lambda)

nlls(frml=boxcoxx,jacobian=y^(lambda-1))


 

Nonlinear Least Squares - Estimation by Gauss-Newton

Convergence in    28 Iterations. Final criterion was  0.0000034 <=  0.0000100

 

Usable Observations                       200

Degrees of Freedom                        197

Standard Error of Estimate       4.4086467885

Sum of Squared Residuals         3828.9248017

Log Likelihood                     -1005.9653

Durbin-Watson Statistic                1.8744

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  B0                           3.9673127680 0.1342121976     29.56000  0.00000000

2.  B1                           2.1294229014 0.0637617701     33.39655  0.00000000

3.  LAMBDA                       0.3806381995 0.0100254241     37.96729  0.00000000
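The Jacobian-weighted least squares problem being solved here is, up to constants, the same as Box and Cox's original device of scaling each residual by the geometric mean of \(y\) raised to \(\lambda - 1\): taking logs shows that minimizing the scaled sum of squares reproduces the concentrated log likelihood with its Jacobian term. A Python sketch (an illustration with its own simulated data, so the numbers differ from the output above):

```python
import numpy as np
from scipy.optimize import minimize

def boxcox(v, lam):
    return np.log(v) if abs(lam) < 1e-8 else (v**lam - 1.0) / lam

rng = np.random.default_rng(42)
x = rng.gamma(5.0, size=200)
y = (0.5 + 3.0*np.sqrt(x) + rng.normal(0.0, 4.0, size=200))**2

gmean = np.exp(np.mean(np.log(y)))   # geometric mean of y

def scaled_ssr(theta):
    b0, b1, lam = theta
    u = boxcox(y, lam) - (b0 + b1*boxcox(x, lam))
    # Dividing each residual by gmean**(lam-1) folds the Jacobian into
    # the sum of squares, so minimizing this gives the MLE of lambda
    return np.sum((u / gmean**(lam - 1.0))**2)

res = minimize(scaled_ssr, x0=[3.0, 2.0, 1.0], method="Nelder-Mead")
b0, b1, lamhat = res.x
```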


 

Finally, it's possible to estimate this type of model using the fact that, given \(\lambda\), the model is a linear regression, so it can be estimated with a one-dimensional search over \(\lambda\). There is probably no real advantage to this over direct log likelihood maximization given the speed of modern computers. Note that the log likelihood that comes out of the LINREG on transformed data needs to be adjusted for the Jacobian term.
 

sstats / log(y)>>sumlogy

nonlin lambda

compute lambda=1.0

declare real loglikely

find(print) maximum loglikely

   set xbc = %boxcox(x,lambda)

   set ybc = %boxcox(y,lambda)

   linreg(noprint) ybc

   # constant xbc

   compute sigmasq=%rss/%nobs

   compute loglikely=(lambda-1)*sumlogy+%logconcdensity(sigmasq,%nobs)

end find
 

The output from this is:
 

FIND Optimization - Estimation by Simplex

Convergence in    20 Iterations. Final criterion was  0.0000061 <=  0.0000100

 

Function Value                     -1005.9653

 

    Variable                        Coeff

**********************************************

1.  LAMBDA                       0.3806396484


 

You would have to re-do the LINREG at the final \(\lambda\) to recover the other coefficients.
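The profile-likelihood idea behind the FIND loop can be sketched in plain NumPy (an illustration only, with its own simulated data and a simple grid search in place of the simplex): for each trial \(\lambda\), run OLS on the transformed data and evaluate the concentrated log likelihood plus the Jacobian term.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.gamma(5.0, size=200)
y = (0.5 + 3.0*np.sqrt(x) + rng.normal(0.0, 4.0, size=200))**2
T = len(y)
sumlogy = np.log(y).sum()            # the SSTATS accumulation

def boxcox(v, lam):
    return np.log(v) if abs(lam) < 1e-8 else (v**lam - 1.0) / lam

def profile_loglik(lam):
    # Given lambda, the model is linear: OLS on the transformed data
    X = np.column_stack([np.ones(T), boxcox(x, lam)])
    yl = boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X, yl, rcond=None)
    sig2 = np.sum((yl - X @ beta)**2) / T
    # Concentrated Normal log likelihood plus the Jacobian term
    return (lam - 1.0)*sumlogy - 0.5*T*(np.log(2.0*np.pi*sig2) + 1.0)

grid = np.linspace(0.05, 1.5, 146)   # step 0.01
lamhat = grid[np.argmax([profile_loglik(l) for l in grid])]
```

As in the RATS version, a final OLS at lamhat would recover the remaining coefficients.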

 

Full Program

 

all 200
*
*   Create data with a square root transformation.
*
set x = %rangamma(5.0)
set sqry = .5 + 3*sqrt(x) + %ran(4.0)
set y = sqry^2
*
* Using MAXIMIZE.
* The free coefficients b0, b1 and sigma will be transforms of the .5, 3 and 4
* used in generating the data since sqrt(y) and sqrt(x) are translated, scaled
* versions of the Box-Cox transformed data.
*
nonlin b0 b1 lambda sigma
frml rhsfrml  = b0+b1*%boxcox(x,lambda)
frml boxcox   = (lambda-1)*log(y) + %logdensity(sigma^2,%boxcox(y,lambda)-rhsfrml(t))
*
* This estimates a simple linear regression (special case when l=1), and sets the
* initial conditions based upon it.
*
linreg y
# constant x
compute sigma=sqrt(%seesq),b0=%beta(1)+%beta(2)-1
compute b1=%beta(2),lambda=1.0
maximize(method=bfgs) boxcox
*
* Using NLLS with JACOBIAN option
*
nonlin b0 b1 lambda
linreg y
# constant x
compute b0=%beta(1)+%beta(2)-1,b1=%beta(2),lambda=1.0
frml boxcoxx  = %boxcox(y,lambda)-b0-b1*%boxcox(x,lambda)
nlls(frml=boxcoxx,jacobian=y^(lambda-1))
*
* Using FIND
*
sstats / log(y)>>sumlogy
nonlin lambda
compute lambda=1.0
declare real loglikely
find(print) maximum loglikely
   set xbc = %boxcox(x,lambda)
   set ybc = %boxcox(y,lambda)
   linreg(noprint) ybc
   # constant xbc
   compute sigmasq=%rss/%nobs
   compute loglikely=(lambda-1)*sumlogy+%logconcdensity(sigmasq,%nobs)
end find
 


Copyright © 2024 Thomas A. Doan