VARLAG.RPF is an example of lag length selection in a VAR.
This shows several different ways to approach the choice of lag length in a VAR:
1. It does a formal test of one lag length vs another longer one.
2. It uses the @VARLagSelect procedure to show an automatic choice mechanism.
3. It shows the inner workings of the @VARLagSelect procedure in case you need something more flexible than the procedure itself.
Note that all of these require that the various estimations use the same range of data (typically the range permitted by the longest lag length considered). If you don't do that, the results will be invalid, since the statistics being compared would be computed from different samples. (Once you have chosen the lag length, you can use a different range for further analysis.)
This uses a quarterly data set with a five-variable system comprising real GDP (GDPH), the unemployment rate (LR), gross private fixed investment (IH), the implicit price deflator (DGNP) and M2 (FM2). All source variables are seasonally adjusted. We transform all but the unemployment rate to logs.
open data haversample.rat
calendar(q) 1959
data(format=rats) 1959:1 2006:4 dgnp fm2 gdph ih lr
*
set loggdp = log(gdph)
set logm2 = log(fm2)
set logp = log(dgnp)
set logi = log(ih)
set unemp = lr
We will be allowing up to eight lags, so the earliest common starting point is 1961:1, which we put into the ESTART variable.
compute estart=1961:1
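The 1961:1 start follows from the data beginning in 1959:1 and the eight-lag maximum: the first eight quarters are used up as lags. As a minimal sketch, the date could be derived instead of typed in. MAXLAG is a hypothetical variable name that is not part of the example program, and the sketch assumes that adding an integer to a date moves forward by that many periods.

```rats
* Hypothetical alternative to hard-coding the start date.
* MAXLAG is an illustrative name; 1959:1 is the first entry of the data.
compute maxlag=8
compute estart=1959:1+maxlag
* Show the date that ESTART corresponds to (expected to be 1961:1)
display "Common estimation start:" %datelabel(estart)
```

Deriving the date this way keeps the estimation range consistent if the maximum lag is changed later.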
The formal test is of four lags vs the alternative of eight.
This sets up and estimates the VAR with eight lags, saving the residuals into the VECT[SERIES] RESIDS8, the log determinant into LOGDETU, the number of regressors per equation (%NREG) into MCORR (the \(c\) to be used in the test statistic) and the total number of regressors in the system (%NREGSYSTEM) into NTOTAL.
system(model=usamodel)
variables loggdp unemp logi logp logm2
lags 1 to 8
det constant
end(system)
estimate(noprint,resids=resids8) estart *
compute logdetu=%logdet
compute mcorr=%nreg,ntotal=%nregsystem
This sets up and estimates the same model with four lags, putting the residuals into RESIDS4, the log determinant into LOGDETR and the number of restrictions (the difference between NTOTAL and the %NREGSYSTEM from the shorter lag model) into NRESTR. Note that you never have to work out the number of restrictions by hand, since the %NREGSYSTEM variable counts the coefficients for you.
system(model=usamodel)
variables loggdp unemp logi logp logm2
lags 1 to 4
det constant
end(system)
estimate(noprint,resids=resids4) 1961:1 *
compute logdetr=%logdet
compute nrestr=ntotal-%nregsystem
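As a check on the counting, the values can be worked out by hand. With \(n=5\) variables, \(p\) lags and a constant, each equation has \(np+1\) regressors. The symbols \(n\), \(p\) and \(k_p\) are notation introduced here for illustration, not names used by the program.

```latex
\begin{aligned}
c = \texttt{MCORR} &= 5 \times 8 + 1 = 41 \\
k_8 = \texttt{NTOTAL} &= 5 \times 41 = 205 \\
k_4 &= 5 \times (5 \times 4 + 1) = 105 \\
\texttt{NRESTR} &= k_8 - k_4 = 100
\end{aligned}
```

This agrees with the 100 degrees of freedom shown in the output.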
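This first calculates the test statistic from formula (1), \((T-c)\left(\log|\Sigma_R|-\log|\Sigma_U|\right)\), where \(\Sigma_R\) and \(\Sigma_U\) are the residual covariance matrices of the four-lag and eight-lag models, and uses CDF to display its significance as a \(\chi ^2\).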
cdf(title="Test of 8 vs 4 Lags") chisqr $
(%nobs-mcorr)*(logdetr-logdetu) nrestr
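Plugging in the numbers makes the calculation concrete. This sketch assumes \(T=184\) usable observations (1961:1 through 2006:4) and \(c=41\) regressors per equation in the eight-lag model. It takes the log determinants as printed, to six decimals, in the RATIO output.

```latex
\begin{aligned}
(T-c)\left(\log|\Sigma_R| - \log|\Sigma_U|\right)
  &= (184-41) \times \bigl(-43.972107 - (-44.952513)\bigr) \\
  &= 143 \times 0.980406 \\
  &\approx 140.198
\end{aligned}
```

This matches the reported Chi-Squared(100) value up to rounding in the printed log determinants.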
and this does the same calculation using RATIO:
ratio(degrees=nrestr,mcorr=mcorr,title="Test of 8 vs 4 Lags")
# resids8
# resids4
The results should (and do) match to any reasonable number of significant digits.
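If the statistic is needed later in the program rather than only displayed, it can be stored in variables. The following is a minimal sketch. TESTSTAT and PVALUE are illustrative names that are not part of VARLAG.RPF, and %CHISQR is used the same way as in the sequential tests later in the example.

```rats
* Sketch: keep the statistic and its significance level for later use.
* TESTSTAT and PVALUE are illustrative names.
* Intended to run right after the four-lag ESTIMATE, so that %NOBS
* refers to that estimation.
compute teststat=(%nobs-mcorr)*(logdetr-logdetu)
compute pvalue=%chisqr(teststat,nrestr)
display "Statistic" teststat "Significance" pvalue
```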
@VARLagSelect can be used to automate selection of the lag length in a standard VAR (CONSTANT as the only deterministic variable, same lags for all variables in all equations). Note that different methods can (and often do) produce different results, so this needs to be used with some caution. The procedure requires you to pick a maximum lag length. Be careful not to make this too high: long lags cost degrees of freedom and also reduce the number of usable data points. In this case, we use the same 8 as was used in the formal test. It would rarely make sense to use more than two years' worth of lags, and with a shorter data set, not even that.
@VARLagSelect gives a choice of four different methods for lag selection: three are information criteria; the other is general-to-specific lag pruning. However, the last is rarely used for VARs.
This does the lag selection using the (corrected) AIC (Akaike Information Criterion), labeled AICC in the output. This creates a table which shows the chosen criterion for the different lag lengths and stars the one which gives the minimum value. (The information criterion is minimized for the "best" selection.)
@varlagselect(lags=8,crit=aic)
# loggdp unemp logi logp logm2
Note that the lag length test and the AIC come to completely opposite conclusions. The test rejects four lags in favor of eight (significance level 0.005), while the AIC chooses two lags and rather strongly prefers four lags over eight. This is because of the large number of parameters covered by a single lag in a model this size: with five variables, each additional lag adds 25 coefficients to the system. The information criteria will usually choose a relatively small model, while “general-to-specific”, a common choice for univariate models, often chooses a rather large model (often the longest lag allowed).
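To see how sensitive the choice is to the criterion, the same call can be repeated with a different CRIT value. This is only a sketch: it assumes that the option values are spelled SBC and HQ, so check the procedure's own documentation for the exact names.

```rats
* Same selection using the Schwarz criterion (option spelling assumed)
@varlagselect(lags=8,crit=sbc)
# loggdp unemp logi logp logm2
* And using the Hannan-Quinn criterion (option spelling assumed)
@varlagselect(lags=8,crit=hq)
# loggdp unemp logi logp logm2
```

For this data set, the SBC column of the hand-built table in the last part of the example also selects two lags.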
Finally, the program shows the workings of the @VARLagSelect procedure, so that you can modify it for selecting lags in a somewhat different model. It shows columns for the AIC and SBC, and also sequential likelihood ratio tests. This uses standardized versions of the information criteria (that is, divided by T) and a version of AIC with a small-sample correction.
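Written out, the two criteria as the program's COMPUTE instructions calculate them are given below. Here \(L\) is the log likelihood (%LOGL), \(T\) the number of observations (%NOBS), \(n\) the number of variables (%NVAR) and \(k\) the number of coefficients in the system (%NREGSYSTEM). These symbols are notation introduced here for illustration.

```latex
\begin{aligned}
\mathrm{SBC} &= -\frac{2L}{T} + \frac{k \log T}{T} \\
\mathrm{AIC} &= -\frac{2L}{T} + \frac{2\,k\,n}{nT - k - 1}
\end{aligned}
```

As a rough check against the output, consider moving from two lags (\(k=55\)) to three (\(k=80\)), assuming \(T=184\) and using the printed LR statistic \(2(L_3-L_2)=52.9180\):

```latex
\begin{aligned}
\Delta\mathrm{AIC}_{2\to 3} &= -\frac{52.9180}{184} + \left(\frac{800}{839} - \frac{550}{864}\right)
  \approx -0.2876 + 0.3169 = 0.0293 \\
\Delta\mathrm{SBC}_{2\to 3} &= -\frac{52.9180}{184} + \frac{25 \log 184}{184}
  \approx -0.2876 + 0.7086 = 0.4210
\end{aligned}
```

Both agree, up to rounding, with the differences between the lag 3 and lag 2 rows of the output table (0.0294 for AIC and 0.4209 for SBC). The SBC penalty for the extra 25 coefficients is more than twice the AIC penalty, which is why SBC tends to favor shorter lags.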
This makes heavy use of the REPORT instruction to format the output. It tags the minimizing values in both the SBC and AIC columns (which both pick 2). The variables LASTLL and LL are used for doing the sequential tests. (LASTLL is the log likelihood from the previous model, LL for the current one).
report(action=define,$
hlabel=||"Lags","AIC","SBC","LR Test","P-Value"||)
dec real lastll
do lags=1,8
   system(model=usamodel)
   variables loggdp unemp logi logp logm2
   lags 1 to lags
   det constant
   end(system)
   estimate(noprint) 1961:1 *
   compute ll =%logl
   compute sbc=-2.0*ll/%nobs+%nregsystem*log(%nobs)/%nobs
   compute aic=-2.0*ll/%nobs+$
      %nregsystem*2.0*%nvar/(%nvar*%nobs-%nregsystem-1)
   report(row=new,atcol=1,align=decimal) lags aic sbc
   if lags>1
      report(row=current,atcol=4) 2*(ll-lastll) $
         %chisqr(2*(ll-lastll),%nvar*%nvar)
   compute lastll=ll
end do lags
report(action=format,atcol=2,tocol=2,special=onestar,$
tag=min,align=decimal)
report(action=format,atcol=3,tocol=3,special=onestar,$
tag=min,align=decimal)
report(action=format,atcol=2,tocol=3,width=8)
report(action=format,atcol=4,tocol=5,picture="*.####")
report(action=show)
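As an illustration of the kind of modification the hand-coded loop allows, the sketch below adds a linear trend to the deterministic part and tracks only the SBC-minimizing lag length. It is an assumption-laden example and not part of VARLAG.RPF: TREND, TRENDMODEL, BESTSBC and BESTLAG are illustrative names, and the REPORT formatting is dropped to keep it short.

```rats
* Sketch only: lag selection by SBC with a linear trend added.
* TREND, TRENDMODEL, BESTSBC and BESTLAG are illustrative names.
set trend = t
compute bestsbc=0.0,bestlag=0
do lags=1,8
   system(model=trendmodel)
   variables loggdp unemp logi logp logm2
   lags 1 to lags
   det constant trend
   end(system)
   * Same estimation range for every lag length
   estimate(noprint) 1961:1 *
   compute sbc=-2.0*%logl/%nobs+%nregsystem*log(%nobs)/%nobs
   * Keep the lag length with the smallest SBC so far
   if sbc<bestsbc.or.lags==1
      compute bestsbc=sbc,bestlag=lags
end do lags
display "SBC selects" bestlag "lags"
```

Because %NREGSYSTEM is taken from each estimation, the penalty term automatically accounts for the extra trend coefficient in every equation.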
Full Program
open data haversample.rat
calendar(q) 1959
data(format=rats) 1959:1 2006:4 dgnp fm2 gdph ih lr
*
set loggdp = log(gdph)
set logm2 = log(fm2)
set logp = log(dgnp)
set logi = log(ih)
set unemp = lr
*
system(model=usamodel)
variables loggdp unemp logi logp logm2
lags 1 to 8
det constant
end(system)
estimate(noprint,resids=resids8) 1961:1 *
compute logdetu=%logdet
compute mcorr=%nreg,ntotal=%nregsystem
*
system(model=usamodel)
variables loggdp unemp logi logp logm2
lags 1 to 4
det constant
end(system)
estimate(noprint,resids=resids4) 1961:1 *
compute logdetr=%logdet
*
* Using RATIO
*
compute nrestr=ntotal-%nregsystem
ratio(degrees=nrestr,mcorr=mcorr,title="Test of 8 vs 4 Lags")
# resids8
# resids4
*
* Using calculated statistic
*
cdf(title="Test of 8 vs 4 Lags") chisqr $
(%nobs-mcorr)*(logdetr-logdetu) nrestr
*
* Using @VARLagSelect procedure
*
@varlagselect(lags=8,crit=aic)
# loggdp unemp logi logp logm2
*
* This shows the workings of VARLagSelect, showing columns for both AIC
* and SBC, and also shows sequential LR tests. Note that this uses a
* version of AIC with small-sample correction.
*
report(action=define,$
hlabel=||"Lags","AIC","SBC","LR Test","P-Value"||)
dec real lastll
do lags=1,8
   system(model=usamodel)
   variables loggdp unemp logi logp logm2
   lags 1 to lags
   det constant
   end(system)
   estimate(noprint) 1961:1 *
   compute ll =%logl
   compute sbc=-2.0*ll/%nobs+%nregsystem*log(%nobs)/%nobs
   compute aic=-2.0*ll/%nobs+$
      %nregsystem*2.0*%nvar/(%nvar*%nobs-%nregsystem-1)
   report(row=new,atcol=1,align=decimal) lags aic sbc
   if lags>1
      report(row=current,atcol=4) 2*(ll-lastll) $
         %chisqr(2*(ll-lastll),%nvar*%nvar)
   compute lastll=ll
end do lags
report(action=format,atcol=2,tocol=2,special=onestar,$
tag=min,align=decimal)
report(action=format,atcol=3,tocol=3,special=onestar,$
tag=min,align=decimal)
report(action=format,atcol=2,tocol=3,width=8)
report(action=format,atcol=4,tocol=5,picture="*.####")
report(action=show)
Output
Test of 8 vs 4 Lags
Chi-Squared(100)= 140.198046 with Significance Level 0.00497668
Test of 8 vs 4 Lags
Log Determinants are -44.952513 -43.972107
Chi-Squared(100)= 140.198046 with Significance Level 0.00497668
VAR Lag Selection
Lags     AICC
   0   -3.569639
   1  -26.274506
   2  -28.568582*
   3  -28.539238
   4  -28.492796
   5  -28.392054
   6  -28.270686
   7  -28.085545
   8  -27.891979
Lags    AIC       SBC      LR Test   P-Value
   1  -26.2745  -25.7617
   2  -28.5686* -27.6463*  477.1473   0.0000
   3  -28.5392  -27.2254    52.9180   0.0009
   4  -28.4928  -26.8068    53.3541   0.0008
   5  -28.3921  -26.3552    47.2856   0.0045
   6  -28.2707  -25.9065    47.7982   0.0039
   7  -28.0855  -25.4197    40.8089   0.0240
   8  -27.8920  -24.9530    44.5020   0.0095
Copyright © 2025 Thomas A. Doan