Examples / LASSO.RPF
LASSO.RPF uses PLS to estimate a distributed lag model using LASSO. This uses the same data set and basic model as in DISTRIBLAG.RPF and several other examples. Because this is a time series model using lags, the obvious choice for the training and test samples is the early and late parts of the sample: in this case, the training sample ends at 2003:12, while the test sample picks up from there.
compute EndTraining=2003:12
compute StartTest=EndTraining+1
compute EndTest=%allocend()
This does least squares over the training sample and computes the out-of-sample mean square error:
linreg longrate * EndTraining
# shortrate{0 to 12} constant
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>linregmse
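For reference, what SSTATS(MEAN) is computing here is the out-of-sample mean square error over the test span,
\[MSE = \frac{1}{T_{test}} \sum_{t=StartTest}^{EndTest} (y_t - \hat{y}_t)^2\]
where \(\hat{y}_t\) is the fitted value generated by PRJ.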
The least squares output shows the types of issues common to distributed lags on highly correlated data: most of the coefficients have high standard errors and there are many coefficients with negative signs. LASSO is one possible approach to producing a more workable estimate for this.
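In its standard form (up to the exact scaling convention that PLS uses for \(\lambda\)), the L1-penalized (LASSO) estimator minimizes
\[\sum_t (y_t - x_t'\beta)^2 + \lambda \sum_i |\beta_i|\]
so the penalty pulls the coefficients towards zero, trading some in-sample fit for (one hopes) better out-of-sample behavior.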
Unlike many other applications of LASSO, this has a set of regressors whose properties are well understood: they are all lags of the same interest rate and thus have very similar means and variances. This uses a cross product matrix which doesn't scale any of the variables (it only subtracts the means). It then uses @PLSGrid to generate a grid of test values for \(\lambda\).
cmom(center) * EndTraining
# shortrate{0 to 12} longrate
*
* Compute a grid with 16 grid points per order of magnitude
*
@PLSGrid(penalty=l1,R2Guess=.7,PerOrder=16,$
yy=%cmom(%ncmom,%ncmom),xx=%cmom(1,1)) testlambdas
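Assuming the grid is built log-linearly (as "per order of magnitude" suggests), PERORDER=16 means that successive values of \(\lambda\) differ by a constant ratio,
\[\lambda_{j+1} = 10^{1/16}\,\lambda_j \approx 1.155\,\lambda_j\]
so the grid is equally spaced on the \(\log_{10}\) scale used in the scatter graphs below.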
This computes the L1-penalized estimates for each test value of \(\lambda\) and saves the mean square error of each over the test sample.
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l1,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
This uses the %MININDEX function to locate the position of the minimum mean squared error and redoes the estimation using it, this time including the output. Note that this provides point estimates only, with no standard errors. Note also that the effect of the L1 penalty is to zero out most of the coefficients. This is then followed by a SCATTER graph of the mean square errors as a function of the \(\lambda\) values (done using a log 10 scale on the horizontal axis). The MSE starts out almost flat at the low end: when the penalty is low, you get, in effect, least squares. As you can see, least squares has a much higher MSE than the optimal value. This is due to the high degree of collinearity in the data, which produces least squares estimates that are very imprecise and give rise to poor out-of-sample behavior. The MSE also ends up absolutely flat at the high end: because the L1 penalty, unlike the sum of squares, has a non-zero directional derivative moving away from zero, a high enough setting for \(\lambda\) simply zeroes the coefficients out.
compute bestvalue=%minindex(testmse)
pls(penalty=l1,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L1 Penalty")
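The zeroing-out behavior is easiest to see in a simplified one-regressor case with \(x'x=1\): there (again, up to the scaling of \(\lambda\)) the L1-penalized estimate is the least squares estimate soft-thresholded at \(\lambda/2\),
\[\hat{\beta} = \mathrm{sign}(\hat{\beta}_{LS})\max\left(|\hat{\beta}_{LS}| - \lambda/2,\,0\right)\]
which is exactly zero once \(\lambda\) is large enough.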
The next part estimates the model including the CONSTANT to demonstrate use of the PW option to give zero weight to the penalty on the CONSTANT.
pls(penalty=l1,lambda=5000.0,iters=1000,pw=%ones(1,13)~~0.0,print) longrate * EndTraining
# shortrate{0 to 12} constant
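With the PW option, the penalty generalizes to a weighted sum,
\[\lambda \sum_{i=1}^{14} w_i |\beta_i|\]
and here the weight vector %ONES(1,13)~~0.0 assigns weight one to the thirteen SHORTRATE lag coefficients and weight zero to the CONSTANT, leaving the intercept unpenalized.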
It finally does a parallel analysis using the L2 penalty (ridge regression), this time using a full correlation matrix that includes both the independent and the dependent variables. Note that, unlike the earlier analysis, this does not treat the coefficients on the lags identically, since each lag has its own variance used in the standardization. Standardizing the dependent variable eliminates the dependence of the \(\lambda\) values on the number of observations. As above, the estimation at the optimal value is redone with output. In contrast to the L1 penalty, the effect of the L2 penalty here is to force the lag coefficients to be very similar. With both approaches, the sum of the coefficients is roughly 0.4, but with a very different pattern. Despite that, the MSE for the two is quite similar.
The graph of the MSE is quite a bit smoother because, while the L2 penalty pushes coefficients towards zero, it does not strongly favor zero over a slightly non-zero value.
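Under the L2 penalty, the penalized objective is smooth in \(\beta\) and (again up to the scaling convention for \(\lambda\)) has the familiar ridge closed form
\[\hat{\beta} = (X'X + \lambda I)^{-1} X'y\]
which shrinks coefficients towards zero but, in general, never sets them exactly to zero; hence the smoother MSE curve.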
cmom(corr) * EndTraining
# shortrate{0 to 12} longrate
@PLSGrid(penalty=l2,R2Guess=.7,PerOrder=16,yy=1.0,xx=1.0) testlambdas
*
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l2,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
compute bestvalue=%minindex(testmse)
pls(penalty=l2,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L2 Penalty")
Full Program
open data haversample.rat
calendar(m) 1947
data(format=rats) 1947:1 2007:4 fltg ftb3
set shortrate = ftb3
set longrate = fltg
*
compute EndTraining=2003:12
compute StartTest=EndTraining+1
compute EndTest=%allocend()
*
* Estimate by least squares
*
linreg longrate * EndTraining
# shortrate{0 to 12} constant
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>linregmse
*
* Compute cross product matrix with means subtracted. This is not
* standardized since all the explanatory variables have similar
* properties (as lags of a single variable).
*
cmom(center) * EndTraining
# shortrate{0 to 12} longrate
*
* Compute a grid with 16 grid points per order of magnitude
*
@PLSGrid(penalty=l1,R2Guess=.7,PerOrder=16,$
yy=%cmom(%ncmom,%ncmom),xx=%cmom(1,1)) testlambdas
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l1,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
*
* Redo with a print option on for the best MSE value for lambda
*
compute bestvalue=%minindex(testmse)
pls(penalty=l1,cmom,lambda=testlambdas(bestvalue),print) longrate
# shortrate{0 to 12}
*
* Draw scatter graph (with log scale on horizontal axis)
*
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L1 Penalty")
*
* Estimate model with the CONSTANT using the PW option to give zero
* weight to penalty on CONSTANT.
*
pls(penalty=l1,lambda=5000.0,iters=1000,pw=%ones(1,13)~~0.0,print) longrate * EndTraining
# shortrate{0 to 12} constant
*
* Using a full matrix of correlations and L2 penalty
*
cmom(corr) * EndTraining
# shortrate{0 to 12} longrate
@PLSGrid(penalty=l2,R2Guess=.7,PerOrder=16,yy=1.0,xx=1.0) testlambdas
*
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l2,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
compute bestvalue=%minindex(testmse)
pls(penalty=l2,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L2 Penalty")
Output
Linear Regression - Estimation by Least Squares
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Degrees of Freedom 658
Centered R^2 0.8611418
R-Bar^2 0.8583984
Uncentered R^2 0.9776658
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.0150869130
Sum of Squared Residuals 678.00414808
Regression F(13,658) 313.8960
Significance Level of F 0.0000000
Log Likelihood -956.5154
Durbin-Watson Statistic 0.0657
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. SHORTRATE 0.470494318 0.096499075 4.87564 0.00000136
2. SHORTRATE{1} -0.121580121 0.160554534 -0.75725 0.44917043
3. SHORTRATE{2} 0.072770909 0.167171777 0.43531 0.66348294
4. SHORTRATE{3} -0.093059146 0.167179440 -0.55664 0.57796112
5. SHORTRATE{4} 0.010270826 0.167522790 0.06131 0.95113093
6. SHORTRATE{5} 0.059922996 0.168249197 0.35616 0.72183775
7. SHORTRATE{6} 0.099440987 0.171246031 0.58069 0.56164772
8. SHORTRATE{7} -0.025524964 0.168205131 -0.15175 0.87943141
9. SHORTRATE{8} 0.060812809 0.167477421 0.36311 0.71663896
10. SHORTRATE{9} -0.074227601 0.167136520 -0.44411 0.65710647
11. SHORTRATE{10} 0.106124729 0.167135105 0.63496 0.52567280
12. SHORTRATE{11} -0.112077516 0.160524257 -0.69820 0.48530082
13. SHORTRATE{12} 0.434086597 0.096452378 4.50053 0.00000802
14. Constant 1.802719405 0.078631765 22.92610 0.00000000
Penalized Least Squares - Estimation by L1/LASSO
L1(LASSO) Penalty with Lambda=4520.498
Convergence in 268 Iterations. Final criterion was 0.0000100 <= 0.0000100
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Centered R^2 0.6647438
R-Bar^2 0.6581202
Uncentered R^2 0.9460767
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.5772695744
Sum of Squared Residuals 1636.9587862
Log Likelihood -1252.6799
Durbin-Watson Statistic 0.0176
Variable Coeff
**********************************************
1. SHORTRATE 0.1822072566
2. SHORTRATE{1} 0.0000000000
3. SHORTRATE{2} 0.0000000000
4. SHORTRATE{3} 0.0000000000
5. SHORTRATE{4} 0.0000000000
6. SHORTRATE{5} 0.0453268098
7. SHORTRATE{6} 0.0000000000
8. SHORTRATE{7} 0.0000000000
9. SHORTRATE{8} 0.0358632792
10. SHORTRATE{9} 0.0000000000
11. SHORTRATE{10} 0.0272298019
12. SHORTRATE{11} 0.0000000000
13. SHORTRATE{12} 0.1729951187
14. Constant 3.8822058399
Penalized Least Squares - Estimation by L1/LASSO
L1(LASSO) Penalty with Lambda=5000
Convergence in 905 Iterations. Final criterion was 0.0000100 <= 0.0000100
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Centered R^2 0.6209959
R-Bar^2 0.6135080
Uncentered R^2 0.9390402
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.6770248266
Sum of Squared Residuals 1850.5672731
Log Likelihood -1293.8911
Durbin-Watson Statistic 0.0155
Variable Coeff
**********************************************
1. SHORTRATE 0.1620165600
2. SHORTRATE{1} 0.0000000000
3. SHORTRATE{2} 0.0000000000
4. SHORTRATE{3} 0.0000000000
5. SHORTRATE{4} 0.0000000000
6. SHORTRATE{5} 0.0406855688
7. SHORTRATE{6} 0.0000000000
8. SHORTRATE{7} 0.0000000000
9. SHORTRATE{8} 0.0348204921
10. SHORTRATE{9} 0.0000000000
11. SHORTRATE{10} 0.0288426017
12. SHORTRATE{11} 0.0000000000
13. SHORTRATE{12} 0.1523836968
14. Constant 4.1023776764
Penalized Least Squares - Estimation by L2/Ridge
L2(Ridge) Penalty with Lambda=11.73608
Convergence in 7 Iterations. Final criterion was 0.0000079 <= 0.0000100
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Centered R^2 0.6433453
R-Bar^2 0.6362989
Uncentered R^2 0.9426349
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.6268274832
Sum of Squared Residuals 1741.4415203
Log Likelihood -1273.4693
Durbin-Watson Statistic 0.0198
Variable Coeff
**********************************************
1. SHORTRATE 0.0348160270
2. SHORTRATE{1} 0.0344526172
3. SHORTRATE{2} 0.0341079737
4. SHORTRATE{3} 0.0339050361
5. SHORTRATE{4} 0.0338596695
6. SHORTRATE{5} 0.0338367992
7. SHORTRATE{6} 0.0337428812
8. SHORTRATE{7} 0.0338197201
9. SHORTRATE{8} 0.0340030602
10. SHORTRATE{9} 0.0341397595
11. SHORTRATE{10} 0.0343071613
12. SHORTRATE{11} 0.0344934151
13. SHORTRATE{12} 0.0347172948
14. Constant 3.9770116414
Graphs
[Graph: MSE for L1 Penalty]
[Graph: MSE for L2 Penalty]
Copyright © 2025 Thomas A. Doan