PLS Instruction
PLS( options ) depvar start end resids
# list of explanatory variables (in Regression Format)
PLS estimates a linear regression using penalized least squares (LASSO or ridge regression). The estimation is done for a single (assumed known) value of the penalty factor, so, in practice, PLS will usually be used inside some type of loop over possible values. An example is provided in LASSO.RPF.
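For instance, a minimal sketch of such a loop (the series names, number of regressors and grid of LAMBDA values here are hypothetical, not taken from LASSO.RPF) might look like:

* Search a grid of LAMBDA values, saving the coefficient vector
* from each penalized fit for later comparison.
dec vect[vect] betas(20)
do i=1,20
   compute lambda=0.1*i
   pls(noprint,penalize=l1,lambda=lambda) y
   # constant x1 x2 x3
   compute betas(i)=%beta
end do i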
Parameters

depvar      dependent variable
start, end  Range to use in estimation. If you have not set a SMPL, this
            defaults to the largest common range of all the variables
            involved.
resids      series for residuals (optional)
Options
[PRINT]/NOPRINT
This controls the printing of the regression output. Note that PLS computes only point estimates; it does not produce standard errors or a covariance matrix.
TITLE="title for output" [Depends upon estimation method]
This option allows you to supply your own title to label the resulting output.
PENALIZE=[L1]/L2
LAMBDA=tuning parameter [0.0]
The PENALIZE option chooses between the L1 penalty (LASSO estimation) and the L2 penalty (ridge regression). LAMBDA is the tuning parameter which controls how much weight is put on the penalty function versus the sum of squared residuals. Note that PLS does the calculation for only a single value of LAMBDA.
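Up to the program's internal scaling conventions, the function being minimized has the familiar penalized least squares form (with b denoting the coefficient vector):

  sum_t ( y(t) - X(t)'b )^2 + LAMBDA * sum_i |b(i)|      (PENALIZE=L1)
  sum_t ( y(t) - X(t)'b )^2 + LAMBDA * sum_i b(i)^2      (PENALIZE=L2)

With the default LAMBDA=0.0, either choice reduces to ordinary least squares.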
SMPL=Standard SMPL option [unused]
Omits from the estimation any observations where the SMPL series or expression is zero.
SPREAD=Standard SPREAD option [unused]
WEIGHT=Standard WEIGHT option [unused]
Use SPREAD for weighted least squares and WEIGHT to provide different weights for each observation.
SHUFFLE=SERIES[INTEGER] with entry remapping [unused]
EQUATION=Equation to estimate [unused]
Use the EQUATION option to estimate a previously defined equation. If you use it, omit the supplementary card.
DEFINE=equation to define [unused]
FRML=formula to define [unused]
These define an equation and formula, respectively, using the results of the estimation. You can use the equation/formula for forecasting or other purposes.
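As a sketch of how DEFINE might be used (the equation name, series names, dates and LAMBDA value here are hypothetical), you can save the penalized fit as an equation and then forecast with it:

* Save the fit as an equation, then use it to generate forecasts.
pls(penalize=l1,lambda=0.5,define=lassoeq) y
# constant x1 x2 x3
uforecast(equation=lassoeq) yhat 2020:1 2024:12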
CMOM/[NOCMOM]
Use the CMOM option together with the CMOMENT instruction (executed prior to PLS). PLS takes the required cross products of the variables from the array created by CMOMENT. By computing the cross products just once, you can reduce the computation involved in the repeated regressions when you need to search over the LAMBDA parameter.
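A minimal sketch of that workflow (with hypothetical series names and grid of LAMBDA values):

* Compute the cross-product matrix once. The dependent variable is
* included in the CMOM list so that its cross products are available.
cmom
# constant x1 x2 x3 y
* Reuse the cross products while searching over LAMBDA.
do i=1,50
   pls(cmom,noprint,penalize=l1,lambda=0.05*i) y
   # constant x1 x2 x3
end do i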
ENTRIES=number of entries to use from supplementary card [all]
This allows you to control how many of the entries on the supplementary card are processed. This can be useful in repetitive analysis, where, for example, you may want to add an additional entry on each pass through a loop.
ITERATIONS=iteration limit [1000]
CVCRIT=convergence limit [.00001]
ITERATIONS sets the maximum number of iterations and CVCRIT the convergence criterion. PLS estimates using Cyclic Coordinate Descent, which works for both the L1 and L2 penalties (the L1 penalty has a non-differentiable objective function). Each iteration does quite a bit less work than variable metric methods like BFGS, which is why the default iteration limit is as high as 1000.
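For intuition (a sketch only, using the generic objective shown under PENALIZE and ignoring the program's exact scaling), the coordinate update for coefficient j under the L1 penalty is the soft-thresholding rule

  b(j) <- S( sum_t x_j(t)*r_j(t), LAMBDA/2 ) / sum_t x_j(t)^2,  where S(z,c)=sign(z)*max(|z|-c,0)

and r_j(t) is the residual computed with variable j excluded. Each sweep updates the coefficients one at a time, which is why an individual iteration is cheap.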
INITIAL=VECTOR of initial guess values [unused]
In particularly large problems, you can use this to start the estimation from the converged results for a similar value of LAMBDA, which can speed up the calculation.
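A minimal sketch of that warm-start idea (hypothetical names; LAMBDAS is assumed to be a VECTOR of candidate values ordered so that adjacent entries are close):

* Fit at the first LAMBDA, then feed each fit's %BETA in as the
* initial guess for the next one.
pls(noprint,penalize=l1,lambda=lambdas(1)) y
# constant x1 x2 x3
do i=2,%rows(lambdas)
   compute b0=%beta
   pls(noprint,penalize=l1,lambda=lambdas(i),initial=b0) y
   # constant x1 x2 x3
end do i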
PW=VECTOR of parameter weights [equal weights]
SCALES=VECTOR of explanatory variable scales [unused]
These are related options for adjusting the way the penalty is applied to the different coefficients. Either option needs to have dimension equal to the number of explanatory variables in the model. PW provides a direct multiplier for the penalty applied to each coefficient, so it can be zero for a parameter that you don't want to be penalized. SCALES provides a value which deflates the corresponding explanatory variable (thus scaling up its coefficient).
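For example (a hypothetical four-regressor setup), to leave the CONSTANT, entered as the first regressor, unpenalized:

dec vect pwts(4)
compute pwts=%const(1.0)
* A zero parameter weight means no penalty on the first coefficient.
compute pwts(1)=0.0
pls(penalize=l1,lambda=0.5,pw=pwts) y
# constant x1 x2 x3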
Variables Defined

%BETA       coefficient VECTOR
%FUNCVAL    final value of the function being optimized (REAL)
%MEAN       mean of dependent variable (REAL)
%NFREE      number of free parameters (INTEGER)
%NOBS       number of observations (INTEGER)
%NREG       number of regressors (INTEGER)
%RESIDS     SERIES containing the residuals
%VARIANCE   variance of dependent variable (REAL)
Examples
This is from Stock and Watson, 4th edition (SW4CHAP14LS.RPF). It does an L1 penalty estimation for a regression in a "big data" set with well over 1000 regressors (which is why the ITERS option is so large).
pls(noprint,penalize=l1,lambda=lambda,cmom,scales=scale,iters=10000) testscore
# x cubes interact
This is from LASSO.RPF. It does an L2 penalty estimation, showing the output.
pls(penalize=l2,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
Copyright © 2025 Thomas A. Doan