Mixed/Ridge Estimation
Ridge regression (Hoerl and Kennard, 1970) and mixed estimation (Theil, 1971, pp. 347-352) are related procedures which are fairly easy to implement within RATS. These are Bayesian-like estimation techniques, in the sense that there are priors which will generate them as their posterior modes. A more modern approach (which requires more computing power than was available in the early 1970s) is Gibbs sampling.
A ridge estimator of \(\beta\) in \(y = X\beta + u\) is defined as
\begin{equation} \hat \beta = \left( {{\bf{X'}}{\kern 1pt} {\bf{X}} + k{\bf{I}}} \right)^{ - 1} {\bf{X'y}} \end{equation}
for some positive \(k\). Least squares is the special case of \(k=0\). Setting \(k>0\) has the effect of “shrinking” the estimate of the coefficients toward the origin, although individual coefficients may move away from zero.
In mixed estimation, non-sample information about \(\beta\) is put in the form
\begin{equation} {\bf{r}} = {\bf{R}}\beta + {\bf{v}}{\rm{ , Cov}}\left( {\bf{v}} \right) = {\bf{V}} \end{equation}
The mixed estimator (assuming \({\rm{Cov}}\left( {\bf{u}} \right) = \sigma ^2 {\bf{I}}\)) is
\begin{equation} \hat \beta = \left( {{\bf{X'}}{\kern 1pt} {\bf{X}} + \sigma ^2 {\bf{R'V}}^{ - {\bf{1}}} {\bf{R}}} \right)^{ - 1} \left( {{\bf{X'y}} + \sigma ^2 {\bf{R'V}}^{ - {\bf{1}}} {\bf{r}}} \right) \end{equation}
The ridge estimator is a special case of this in which \(\bf{R}=\bf{I}\), \(\bf{r}=0\), and \(\bf{V}\) is a scalar matrix (a multiple of the identity).
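To see the connection, substitute \({\bf{R}} = {\bf{I}}\), \({\bf{r}} = 0\) and \({\bf{V}} = v{\bf{I}}\) (for a scalar \(v>0\)) into the mixed estimator, which gives
\begin{equation} \hat \beta = \left( {{\bf{X'}}{\kern 1pt} {\bf{X}} + \frac{{\sigma ^2 }}{v}{\bf{I}}} \right)^{ - 1} {\bf{X'y}} \end{equation}
that is, the ridge estimator with \(k = \sigma ^2 /v\).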
Computation Tools
Note that both techniques estimate \(\beta\) as
\begin{equation} \left( {{\bf{X'}}{\kern 1pt} {\bf{X}} + {\bf{Q}}} \right)^{ - 1} \left( {{\bf{X'y}} + {\bf{q}}} \right) \end{equation}
The tools for implementing such estimators are the CMOMENT and LINREG(CREATE) instructions:
• CMOMENT computes the matrices \({\bf{X'X}}\) and \({\bf{X'y}}\) from the actual data. In practice, these will be segments of the same matrix.
• LINREG(CREATE) takes as input an already computed coefficient vector and covariance matrix and produces the rest of the regression output from them.
To simplify this process, when you use CMOMENT, list the regressors first, followed by the dependent variable. That way, the matrix %CMOM computed by CMOMENT will have the form
\begin{equation} \left[ {\begin{array}{*{20}c} {{\bf{X'}}{\kern 1pt} {\bf{X}}} & {{\bf{X'y}}} \\ {{\bf{y'}}{\kern 1pt} {\bf{X}}} & {{\bf{y'y}}} \\ \end{array}} \right] \end{equation}
(actually just the lower triangle of this, since the matrix is symmetric), so \({\bf{X'X}}\) is the top left block, and \({\bf{X'y}}\) is the last column apart from its final element. You can use %XSUBMAT to pull these out.
Ridge Regression
Ridge regression is especially easy because all we need to do is add a constant to the diagonal elements of \({\bf{X'X}}\).
cmoment
# constant x1 x2 x3 x4 y
*
* This does the least squares regression
*
linreg(cmoment) y
# constant x1 x2 x3 x4
*
* Add K (the chosen ridge constant) to the diagonal of X'X, and pull X'y out of %CMOM
*
compute xx=%xsubmat(%cmom,1,%nreg,1,%nreg)+K*%identity(%nreg)
compute xy=%xsubmat(%cmom,1,%nreg,%nreg+1,%nreg+1)
*
* Produce the regression output for the ridge estimator
*
linreg(create,lastreg,covmat=inv(xx),coeffs=inv(xx)*xy)
Choosing an appropriate value for \(K\) is the real problem with ridge regression. See, for example, the discussion in Judge, et al. (1991).
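One simple way to get a feel for the sensitivity of the results to \(K\) is to repeat the calculation over a grid of trial values (a "ridge trace"). The following is a minimal sketch which reuses the %CMOM matrix and regressor list from the example above; the loop variable KVALUE and the grid of values are purely illustrative:
*
* Ridge trace: redo the ridge calculation for several trial values of K
*
compute xy=%xsubmat(%cmom,1,%nreg,%nreg+1,%nreg+1)
dofor kvalue = 0.01 0.05 0.10 0.50 1.00
   compute xx=%xsubmat(%cmom,1,%nreg,1,%nreg)+kvalue*%identity(%nreg)
   linreg(create,lastreg,covmat=inv(xx),coeffs=inv(xx)*xy)
end dofor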
Mixed Estimation
We need to construct the matrices \(\bf{R}\), \(\bf{r}\) and \(\bf{V}\) in some fashion. \(\bf{r}\) is often zero, and \(\bf{V}\) is almost always diagonal and usually a scalar matrix (a constant times the identity), which simplifies the calculations considerably.
Since we do not know \(\sigma^2\), in general, we must use an estimate \(s^2\) of it. We do a preliminary LINREG to get the estimate \(s\) of \(\sigma\), then use matrix commands to compute \({\bf{X'}}{\kern 1pt} {\bf{X}} + s^2{\bf{R'V}}^{ - 1} {\bf{R}}\) and \({\bf{X'}}{\kern 1pt} {\bf{y}} + s^2 {\bf{R'V}}^{ - 1} {\bf{r}}\).
This is quite a bit more complicated than ridge estimation. You can use the procedure @MIXED to do this.
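If you want to see the calculation done directly, the following is a minimal sketch which follows the same pattern as the ridge example. The series names (Y, X1-X4), the choice of prior (\({\bf{R}}={\bf{I}}\), \({\bf{r}}=0\), \({\bf{V}}=\tau^2{\bf{I}}\)) and the tightness value TAU are purely illustrative; in practice, BIGR, LITTLER and BIGV would be built from your actual non-sample information:
cmoment
# constant x1 x2 x3 x4 y
*
* Least squares from the moment matrix; this sets %NREG and %SEESQ (the estimate of sigma squared)
*
linreg(cmoment) y
# constant x1 x2 x3 x4
compute s2=%seesq
compute xx=%xsubmat(%cmom,1,%nreg,1,%nreg)
compute xy=%xsubmat(%cmom,1,%nreg,%nreg+1,%nreg+1)
*
* Illustrative prior r = R beta + v with R=I, r=0 and V=tau^2*I
*
compute tau=1.0
compute bigr=%identity(%nreg)
compute littler=%zeros(%nreg,1)
compute bigv=(tau^2)*%identity(%nreg)
*
* Mixed estimator: inv(X'X + s2*R'inv(V)R)*(X'y + s2*R'inv(V)r)
*
compute xxm=xx+s2*tr(bigr)*inv(bigv)*bigr
compute xym=xy+s2*tr(bigr)*inv(bigv)*littler
linreg(create,lastreg,covmat=inv(xxm),coeffs=inv(xxm)*xym)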
SHILLER.RPF provides an example of mixed estimation, applying the Shiller smoothness prior to the estimation of a distributed lag. Mixed estimation is used internally to estimate VARs with a prior.
Copyright © 2025 Thomas A. Doan