Examples / LASSO.RPF
LASSO.RPF uses PLS to estimate a distributed lag model using LASSO. This uses the same data set and basic model as in DISTRIBLAG.RPF and several other examples. Because this is a time series model using lags, the obvious choice for the training and test samples is the early and late parts of the sample: in this case, the training sample ends at 2003:12, while the test sample picks up from there.
compute EndTraining=2003:12
compute StartTest=EndTraining+1
compute EndTest=%allocend()
This does least squares over the training sample and computes the out-of-sample mean square error:
linreg longrate * EndTraining
# shortrate{0 to 12} constant
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>linregmse
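For reference, what SSTATS(MEAN) is computing here is the out-of-sample mean square error over the test span,
\[MSE = \frac{1}{T_{test}} \sum_{t=StartTest}^{EndTest} (y_t - \hat{y}_t)^2\]
where \(\hat{y}_t\) is the fitted value generated by PRJ.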
The least squares output shows the types of issues common to distributed lags on highly correlated data: most of the coefficients have high standard errors and there are many coefficients with negative signs. LASSO is one possible approach to producing a more workable estimate for this.
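In its standard form (up to the exact scaling convention that PLS uses for \(\lambda\)), the L1-penalized (LASSO) estimator minimizes
\[\sum_t (y_t - x_t'\beta)^2 + \lambda \sum_i |\beta_i|\]
so the penalty pulls the coefficients towards zero, trading some in-sample fit for (one hopes) better out-of-sample behavior.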
Unlike many other applications of LASSO, this has a set of regressors whose properties are well understood: they are all lags of the same interest rate and thus have very similar means and variances. This uses a cross product matrix which doesn't scale any of the variables (it only subtracts the means). It then uses @PLSGrid to generate a grid of test values for \(\lambda\).
cmom(center) * EndTraining
# shortrate{0 to 12} longrate
*
* Compute a grid with 16 grid points per order of magnitude
*
@PLSGrid(penalty=l1,R2Guess=.7,PerOrder=16,$
yy=%cmom(%ncmom,%ncmom),xx=%cmom(1,1)) testlambdas
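Assuming the grid is built log-linearly (as "per order of magnitude" suggests), PERORDER=16 means that successive values of \(\lambda\) differ by a constant ratio,
\[\lambda_{j+1} = 10^{1/16}\,\lambda_j \approx 1.155\,\lambda_j\]
so the grid is equally spaced on the \(\log_{10}\) scale used in the scatter graphs below.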
This computes the L1-penalized estimates for each test value of \(\lambda\) and saves the mean square error of each over the test sample.
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l1,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
This uses the %MININDEX function to locate the position of the minimum mean squared error and redoes the estimation using it, this time including the output. Note that this provides point estimates only, with no standard errors. Note also that the effect of the L1 penalty is to zero out most of the coefficients. This is then followed by a SCATTER graph of the mean square errors as a function of the \(\lambda\) values (done using a log 10 scale on the horizontal axis). The MSE starts out almost flat at the low end: when the penalty is low, you get, in effect, least squares. As you can see, least squares has a much higher MSE than the optimal value. This is due to the high degree of collinearity in the data, which produces least squares estimates that are very imprecise and give rise to poor out-of-sample behavior. The MSE also ends up absolutely flat at the high end: because the L1 penalty, unlike the sum of squares, has a non-zero directional derivative moving away from zero, a high enough setting for \(\lambda\) simply zeroes the coefficients out.
compute bestvalue=%minindex(testmse)
pls(penalty=l1,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L1 Penalty")
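The zeroing-out behavior is easiest to see in a simplified one-regressor case with \(x'x=1\): there (again, up to the scaling of \(\lambda\)) the L1-penalized estimate is the least squares estimate soft-thresholded at \(\lambda/2\),
\[\hat{\beta} = \mathrm{sign}(\hat{\beta}_{LS})\max\left(|\hat{\beta}_{LS}| - \lambda/2,\,0\right)\]
which is exactly zero once \(\lambda\) is large enough.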
The next part estimates the model including the CONSTANT to demonstrate use of the PW option to give zero weight to the penalty on the CONSTANT.
pls(penalty=l1,lambda=5000.0,iters=1000,pw=%ones(1,13)~~0.0,print) longrate * EndTraining
# shortrate{0 to 12} constant
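With the PW option, the penalty generalizes to a weighted sum,
\[\lambda \sum_{i=1}^{14} w_i |\beta_i|\]
and here the weight vector %ONES(1,13)~~0.0 assigns weight one to the thirteen SHORTRATE lag coefficients and weight zero to the CONSTANT, leaving the intercept unpenalized.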
It finally does a parallel analysis using the L2 penalty (ridge regression), this time using a full correlation matrix that includes both the independent and the dependent variables. Note that, unlike the earlier analysis, this does not treat the coefficients on the lags identically, since each lag has its own variance used in the standardization. Standardizing the dependent variable eliminates the dependence of the \(\lambda\) values on the number of observations. As above, the estimation at the optimal value is redone with output. In contrast to the L1 penalty, the effect of the L2 penalty here is to force the lag coefficients to be very similar. With both approaches, the sum of the coefficients is roughly 0.4, but with a very different pattern. Despite that, the MSE for the two is quite similar.
The graph of the MSE is quite a bit smoother because, while the L2 penalty pushes coefficients towards zero, it does not strongly favor zero over a slightly non-zero value.
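Under the L2 penalty, the penalized objective is smooth in \(\beta\) and (again up to the scaling convention for \(\lambda\)) has the familiar ridge closed form
\[\hat{\beta} = (X'X + \lambda I)^{-1} X'y\]
which shrinks coefficients towards zero but, in general, never sets them exactly to zero; hence the smoother MSE curve.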
cmom(corr) * EndTraining
# shortrate{0 to 12} longrate
@PLSGrid(penalty=l2,R2Guess=.7,PerOrder=16,yy=1.0,xx=1.0) testlambdas
*
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l2,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
compute bestvalue=%minindex(testmse)
pls(penalty=l2,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L2 Penalty")
Full Program
open data haversample.rat
calendar(m) 1947
data(format=rats) 1947:1 2007:4 fltg ftb3
set shortrate = ftb3
set longrate = fltg
*
compute EndTraining=2003:12
compute StartTest=EndTraining+1
compute EndTest=%allocend()
*
* Estimate by least squares
*
linreg longrate * EndTraining
# shortrate{0 to 12} constant
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>linregmse
*
* Compute cross product matrix with means subtracted. This is not
* standardized since all the explanatory variables have similar
* properties (as lags of a single variable).
*
cmom(center) * EndTraining
# shortrate{0 to 12} longrate
*
* Compute a grid with 16 grid points per order of magnitude
*
@PLSGrid(penalty=l1,R2Guess=.7,PerOrder=16,$
yy=%cmom(%ncmom,%ncmom),xx=%cmom(1,1)) testlambdas
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l1,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
*
* Redo with a print option on for the best MSE value for lambda
*
compute bestvalue=%minindex(testmse)
pls(penalty=l1,cmom,lambda=testlambdas(bestvalue),print) longrate
# shortrate{0 to 12}
*
* Draw scatter graph (with log scale on horizontal axis)
*
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L1 Penalty")
*
* Estimate model with the CONSTANT using the PW option to give zero
* weight to penalty on CONSTANT.
*
pls(penalty=l1,lambda=5000.0,iters=1000,pw=%ones(1,13)~~0.0,print) longrate * EndTraining
# shortrate{0 to 12} constant
*
* Using a full matrix of correlations and L2 penalty
*
cmom(corr) * EndTraining
# shortrate{0 to 12} longrate
@PLSGrid(penalty=l2,R2Guess=.7,PerOrder=16,yy=1.0,xx=1.0) testlambdas
*
dec vect testmse(%size(testlambdas))
ewise testmse(i)=0.0
dofor [real] lambda = testlambdas
pls(penalty=l2,cmom,lambda=lambda,noprint) longrate
# shortrate{0 to 12}
prj fitted StartTest EndTest
sstats(mean) StartTest EndTest (longrate-fitted)^2>>testmse(%doforpass)
end dofor lambda
compute bestvalue=%minindex(testmse)
pls(penalty=l2,cmom,lambda=testlambdas(bestvalue),iters=1000,print) longrate
# shortrate{0 to 12}
scatter(x=testlambdas,y=testmse,hlog=10,style=lines,footer="MSE for L2 Penalty")
Output
Linear Regression - Estimation by Least Squares
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Degrees of Freedom 658
Centered R^2 0.8611418
R-Bar^2 0.8583984
Uncentered R^2 0.9776658
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.0150869130
Sum of Squared Residuals 678.00414808
Regression F(13,658) 313.8960
Significance Level of F 0.0000000
Log Likelihood -956.5154
Durbin-Watson Statistic 0.0657
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. SHORTRATE 0.470494318 0.096499075 4.87564 0.00000136
2. SHORTRATE{1} -0.121580121 0.160554534 -0.75725 0.44917043
3. SHORTRATE{2} 0.072770909 0.167171777 0.43531 0.66348294
4. SHORTRATE{3} -0.093059146 0.167179440 -0.55664 0.57796112
5. SHORTRATE{4} 0.010270826 0.167522790 0.06131 0.95113093
6. SHORTRATE{5} 0.059922996 0.168249197 0.35616 0.72183775
7. SHORTRATE{6} 0.099440987 0.171246031 0.58069 0.56164772
8. SHORTRATE{7} -0.025524964 0.168205131 -0.15175 0.87943141
9. SHORTRATE{8} 0.060812809 0.167477421 0.36311 0.71663896
10. SHORTRATE{9} -0.074227601 0.167136520 -0.44411 0.65710647
11. SHORTRATE{10} 0.106124729 0.167135105 0.63496 0.52567280
12. SHORTRATE{11} -0.112077516 0.160524257 -0.69820 0.48530082
13. SHORTRATE{12} 0.434086597 0.096452378 4.50053 0.00000802
14. Constant 1.802719405 0.078631765 22.92610 0.00000000
Penalized Least Squares - Estimation by L1/LASSO
L1(LASSO) Penalty with Lambda=4520.498
Convergence in 268 Iterations. Final criterion was 0.0000100 <= 0.0000100
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Centered R^2 0.6647438
R-Bar^2 0.6581202
Uncentered R^2 0.9460767
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.5772695744
Sum of Squared Residuals 1636.9587862
Log Likelihood -1252.6799
Durbin-Watson Statistic 0.0176
Variable Coeff
**********************************************
1. SHORTRATE 0.1822072566
2. SHORTRATE{1} 0.0000000000
3. SHORTRATE{2} 0.0000000000
4. SHORTRATE{3} 0.0000000000
5. SHORTRATE{4} 0.0000000000
6. SHORTRATE{5} 0.0453268098
7. SHORTRATE{6} 0.0000000000
8. SHORTRATE{7} 0.0000000000
9. SHORTRATE{8} 0.0358632792
10. SHORTRATE{9} 0.0000000000
11. SHORTRATE{10} 0.0272298019
12. SHORTRATE{11} 0.0000000000
13. SHORTRATE{12} 0.1729951187
14. Constant 3.8822058399
Penalized Least Squares - Estimation by L1/LASSO
L1(LASSO) Penalty with Lambda=5000
Convergence in 905 Iterations. Final criterion was 0.0000100 <= 0.0000100
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Centered R^2 0.6209959
R-Bar^2 0.6135080
Uncentered R^2 0.9390402
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.6770248266
Sum of Squared Residuals 1850.5672731
Log Likelihood -1293.8911
Durbin-Watson Statistic 0.0155
Variable Coeff
**********************************************
1. SHORTRATE 0.1620165600
2. SHORTRATE{1} 0.0000000000
3. SHORTRATE{2} 0.0000000000
4. SHORTRATE{3} 0.0000000000
5. SHORTRATE{4} 0.0000000000
6. SHORTRATE{5} 0.0406855688
7. SHORTRATE{6} 0.0000000000
8. SHORTRATE{7} 0.0000000000
9. SHORTRATE{8} 0.0348204921
10. SHORTRATE{9} 0.0000000000
11. SHORTRATE{10} 0.0288426017
12. SHORTRATE{11} 0.0000000000
13. SHORTRATE{12} 0.1523836968
14. Constant 4.1023776764
Penalized Least Squares - Estimation by L2/Ridge
L2(Ridge) Penalty with Lambda=11.73608
Convergence in 7 Iterations. Final criterion was 0.0000079 <= 0.0000100
Dependent Variable LONGRATE
Monthly Data From 1948:01 To 2003:12
Usable Observations 672
Centered R^2 0.6433453
R-Bar^2 0.6362989
Uncentered R^2 0.9426349
Mean of Dependent Variable 6.1569791667
Std Error of Dependent Variable 2.6975482514
Standard Error of Estimate 1.6268274832
Sum of Squared Residuals 1741.4415203
Log Likelihood -1273.4693
Durbin-Watson Statistic 0.0198
Variable Coeff
**********************************************
1. SHORTRATE 0.0348160270
2. SHORTRATE{1} 0.0344526172
3. SHORTRATE{2} 0.0341079737
4. SHORTRATE{3} 0.0339050361
5. SHORTRATE{4} 0.0338596695
6. SHORTRATE{5} 0.0338367992
7. SHORTRATE{6} 0.0337428812
8. SHORTRATE{7} 0.0338197201
9. SHORTRATE{8} 0.0340030602
10. SHORTRATE{9} 0.0341397595
11. SHORTRATE{10} 0.0343071613
12. SHORTRATE{11} 0.0344934151
13. SHORTRATE{12} 0.0347172948
14. Constant 3.9770116414
Graphs
[Graph: MSE for L1 Penalty]
[Graph: MSE for L2 Penalty]
Copyright © 2025 Thomas A. Doan