Ensemble and Combination Models' Methodologies
Hi Tom,
Suppose there are several models (say two, A and B), each generating forecasts with PI's. There are at least a couple of ways to join, merge or put together the forecasts, i.e. Ensemble or Combination.
For both of these, the aggregated point forecast could be the mean of the point forecasts, as a simple average, e.g. (0.5*forecastA + 0.5*forecastB). But how do I aggregate or mix the forecast distributions, and hence generate the aggregated/mixed PI's, in RATS?
Enders (2014), AETS 4th edn, pp. 109-112, discusses combining forecasts, but not with regard to PI's.
Amarjit
Re: Ensemble and Combination Models' Methodologies
To get PI's, you would need a great deal more information, which often isn't available. You would not only need the variances of each set of forecasts, but also the covariances among them. And if you had that level of detail, then you wouldn't be taking simple averages, but would be taking weighted averages.
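For two models, the point about needing the covariance as well as the variances can be written out as below. This is a standard textbook result rather than anything specific to RATS; the symbols $e_A$, $e_B$ (forecast errors), $\sigma_A^2$, $\sigma_B^2$, $\sigma_{AB}$ (their variances and covariance) and $w$ (weight on model A) are notation introduced here for illustration.

```latex
\begin{align*}
e_c &= w\,e_A + (1-w)\,e_B \\
\operatorname{Var}(e_c) &= w^2\sigma_A^2 + (1-w)^2\sigma_B^2 + 2w(1-w)\,\sigma_{AB} \\
w^{*} &= \frac{\sigma_B^2 - \sigma_{AB}}{\sigma_A^2 + \sigma_B^2 - 2\sigma_{AB}}
\end{align*}
```

The variance-minimizing weight $w^{*}$ equals 0.5 only when $\sigma_A^2 = \sigma_B^2$, which is why having the full covariance information would normally lead to a weighted rather than a simple average.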
Re: Ensemble and Combination Models' Methodologies
Thanks.
I can follow Enders (2014) AETS 4thEdn p109-112, and the RATS code https://estima.com/textbooks/enders_4/enders4p111.rpf
I think this is referred to as Combination Method: taking a weighted average of the forecasts.
And thereafter account for the covariances between the forecast error distributions of the n individual models to calculate the Combination PI's, i.e. the Combined quantile forecasts.
TomDoan wrote: You would not only need the variances of each set of forecasts, but also the covariances among them.
But which variances are these? Are they the squares of STDERRS from the FORECAST instruction?
How to calculate the covariances? Is this using VCV or CMOM on the list of forecast error (fn(t) - actual(t)) series?
Re: Ensemble and Combination Models' Methodologies
No. If you have the information, you would compute the complete covariance matrix with VCV applied to the forecast errors. You can't compute the variances one way and the covariances another as you could end up with a non-positive definite matrix.
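The warning about mixing sources can be illustrated with a small sketch. It is written in plain Python rather than RATS, the error values are made up, and the divisor n - 1 is an assumption (software may divide by n instead). It compares a covariance matrix estimated entirely from the forecast errors with one whose variances are taken from somewhere else.

```python
import math

# Hypothetical one-step forecast errors for two models (illustrative numbers only).
errors_a = [0.8, -1.2, 2.3, -1.9]
errors_b = [1.0, -1.0, 2.0, -2.0]

def cov(x, y):
    """Sample covariance with divisor n - 1 (an assumption; software may use n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def min_eigenvalue_2x2(v11, v12, v22):
    """Smallest eigenvalue of the symmetric matrix [[v11, v12], [v12, v22]]."""
    centre = 0.5 * (v11 + v22)
    radius = math.sqrt((0.5 * (v11 - v22)) ** 2 + v12 ** 2)
    return centre - radius

# Consistent estimate: variances and covariance all come from the same error series.
v_aa = cov(errors_a, errors_a)
v_ab = cov(errors_a, errors_b)
v_bb = cov(errors_b, errors_b)
consistent_min = min_eigenvalue_2x2(v_aa, v_ab, v_bb)

# Inconsistent estimate: same covariance, but smaller variances taken from elsewhere.
mixed_min = min_eigenvalue_2x2(0.5 * v_aa, v_ab, 0.5 * v_bb)

print("consistent matrix positive definite:", consistent_min > 0)
print("mixed matrix positive definite:", mixed_min > 0)
```

With highly correlated errors, even a modest mismatch between the two sources is enough to push an eigenvalue below zero, and a negative "variance" for some linear combination then becomes possible.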
Re: Ensemble and Combination Models' Methodologies
From the example above, the covariance/correlation matrix applied to the forecast errors is
Code: Select all
* covariance/correlation matrix
vcv(matrix=v)
# herrors
I can also include in UFORECAST if needed.
Code: Select all
stderrs=stderrs(s)
As a check, the 'usual way' without the hash and %keys
Code: Select all
do time=2000:3,2012:4
   boxjenk(noprint,constant,define=ar7,ar=7) spread * time-1
   boxjenk(noprint,constant,define=ar6,ar=6) spread * time-1
   boxjenk(noprint,constant,define=ar2,ar=2) spread * time-1
   boxjenk(noprint,constant,define=ar127,ar=||1,2,7||) spread * time-1
   boxjenk(noprint,constant,define=ar1ma1,ar=1,ma=1) spread * time-1
   boxjenk(noprint,constant,define=ar2ma1,ar=2,ma=1) spread * time-1
   boxjenk(noprint,constant,define=ar2ma17,ar=2,ma=||1,7||) spread * time-1
   *
   uforecast(equation=ar7,errors=errors_ar7,stderrs=stderrs_ar7,static) fore_ar7 time time
   uforecast(equation=ar6,errors=errors_ar6,stderrs=stderrs_ar6,static) fore_ar6 time time
   uforecast(equation=ar2,errors=errors_ar2,stderrs=stderrs_ar2,static) fore_ar2 time time
   uforecast(equation=ar127,errors=errors_ar127,stderrs=stderrs_ar127,static) fore_ar127 time time
   uforecast(equation=ar1ma1,errors=errors_ar1ma1,stderrs=stderrs_ar1ma1,static) fore_ar1ma1 time time
   uforecast(equation=ar2ma1,errors=errors_ar2ma1,stderrs=stderrs_ar2ma1,static) fore_ar2ma1 time time
   uforecast(equation=ar2ma17,errors=errors_ar2ma17,stderrs=stderrs_ar2ma17,static) fore_ar2ma17 time time
end do
* covariance/correlation matrix
vcv(matrix=v)
# errors_ar7 errors_ar6 errors_ar2 errors_ar127 errors_ar1ma1 errors_ar2ma1 errors_ar2ma17
compute qbar=%cvtocorr(%sigma)
From the covariance/correlation matrix I can extract the variances on the diagonal but how to extract the covariances below the diagonal and correlations above the diagonal?
Code: Select all
comp d=%xdiag(v)
Thus, how to proceed to calculate the Combination PI's?
Re: Ensemble and Combination Models' Methodologies
I'm not sure what the correlations have to do with anything, but you have an estimate of the covariance matrix of the forecast errors. The variance of a linear combination of those is x'Vx where V is the covariance matrix.
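The x'Vx expression follows directly from the definition of variance. In the sketch below, $e$ is the vector of the $n$ models' forecast errors at a given period (assumed to have mean zero), $V$ is its covariance matrix and $x$ is the $n \times 1$ vector of combination weights; the notation is introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}(x'e) &= \operatorname{E}\!\left[x'e\,e'x\right] = x'\,\operatorname{E}\!\left[e\,e'\right]x = x'Vx \\
x'Vx &= \sum_{i=1}^{n} x_i^2\,V_{ii} + 2\sum_{i<j} x_i x_j\,V_{ij}
\end{align*}
```

The result is a single number: the diagonal of $V$ enters through the squared weights and every off-diagonal covariance enters twice.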
Re: Ensemble and Combination Models' Methodologies
TomDoan wrote: I'm not sure what the correlations have to do with anything
Yes, agreed, %SIGMA and the MATRIX option via the VCV instruction both give a variance/covariance matrix.
TomDoan wrote: The variance of a linear combination of those is x'Vx where V is the covariance matrix.
I think this is the correct way round, as per https://estima.com/tour/rats_basics__functions.shtml, resulting in a 50*50 non-symmetric matrix
Code: Select all
* create an array from the entries of data series
make xerrors 2000:03 2012:04
# herrors
disp xerrors
* display variance/covariance matrix
disp v
* linear combination x'vx
comp lc_xerrors = xerrors * v * tr(xerrors)
disp lc_xerrors
Re: Ensemble and Combination Models' Methodologies
No. That's not at all correct. The code you were working with was creating an optimal (in sample) linear combination of the forecasts to create a single forecast, which will have a scalar variance.
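The difference between the two calculations shows up in the dimensions alone. The following is a minimal sketch in plain Python (not RATS), with a made-up 4-period, 2-model error matrix, a made-up covariance matrix and equal weights standing in for the real 50 periods, 7 models and estimated weights.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

# Hypothetical forecast errors: T = 4 periods (rows) by n = 2 models (columns).
errors = [[0.8, 1.0], [-1.2, -1.0], [2.3, 2.0], [-1.9, -2.0]]

# Hypothetical n-by-n covariance matrix of the forecast errors.
v = [[3.66, 3.4667], [3.4667, 3.3333]]

# Combination weights as an n-by-1 column vector (equal weights for illustration).
weights = [[0.5], [0.5]]

# Errors on both sides: a T-by-T matrix, which is not a variance of anything useful.
wrong = matmul(matmul(errors, v), transpose(errors))

# Weights on both sides: a 1-by-1 result, the variance of the combined forecast error.
right = matmul(matmul(transpose(weights), v), weights)

print(len(wrong), "x", len(wrong[0]))   # dimensions when the data matrix is used
print(len(right), "x", len(right[0]))   # dimensions when the weight vector is used
```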
Re: Ensemble and Combination Models' Methodologies
v is a 7*7 variance/covariance matrix i.e. there are 7 models, each of the 7 models having 50 one-step ahead OOS forecasts.
For a scalar variance x should be a 7*1 vector, thus x must be the optimal weights. Correct?
Re: Ensemble and Combination Models' Methodologies
Right. That's computing a single linear combination for the forecasts that applies at each time period. (It computes it by minimizing the sum of squared errors across the training sample.)
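A minimal sketch of that kind of calculation, in plain Python rather than RATS: it minimizes the sum of squared combined errors subject only to the weights summing to one, which has a closed-form solution. The error values are made up, the cross-product matrix is left uncentred, and no non-negativity restriction is imposed; all three are assumptions of this sketch and may differ from what the thread's code does.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sum_sq(weights, errors):
    """Sum of squared errors of the weighted combination across periods."""
    return sum(sum(w * e for w, e in zip(weights, row)) ** 2 for row in errors)

# Hypothetical forecast errors: one row per period, one column per model.
errors = [
    [0.8, 1.0, 0.5],
    [-1.2, -1.0, -1.5],
    [2.3, 2.0, 2.6],
    [-1.9, -2.0, -1.4],
    [0.4, 0.1, 0.9],
    [-0.6, -0.3, -0.2],
]
n = len(errors[0])

# Uncentred cross-product matrix M = E'E (assumption: no mean correction).
m = [[sum(row[i] * row[j] for row in errors) for j in range(n)] for i in range(n)]

# Minimizing w'Mw subject to sum(w) = 1 gives w proportional to M^{-1} 1.
raw = solve(m, [1.0] * n)
weights = [r / sum(raw) for r in raw]

equal = [1.0 / n] * n
print("weights:", weights)
print("SSE optimal:", sum_sq(weights, errors), "SSE equal:", sum_sq(equal, errors))
```

Because the same weights are applied at every period, the resulting variance estimate is a single number for the whole sample. Without a non-negativity restriction some weights can come out negative when the errors are highly correlated.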
Re: Ensemble and Combination Models' Methodologies
Code: Select all
* fweights
disp fweights
* weights via LQPROG
disp x
* combined forecasts using fweights
set fc_fweights 2000:03 2012:04 = (fweights(1)*fore(eqnkeys(1))) + $
(fweights(2)*fore(eqnkeys(2))) + $
(fweights(3)*fore(eqnkeys(3))) + $
(fweights(4)*fore(eqnkeys(4))) + $
(fweights(5)*fore(eqnkeys(5))) + $
(fweights(6)*fore(eqnkeys(6))) + $
(fweights(7)*fore(eqnkeys(7)))
prin 2000:03 2012:04 fc_fweights
* combined forecasts using LQPROG
set fc_LQPROG 2000:03 2012:04 = (x(1)*fore(eqnkeys(1))) + $
(x(2)*fore(eqnkeys(2))) + $
(x(3)*fore(eqnkeys(3))) + $
(x(4)*fore(eqnkeys(4))) + $
(x(5)*fore(eqnkeys(5))) + $
(x(6)*fore(eqnkeys(6))) + $
(x(7)*fore(eqnkeys(7)))
prin 2000:03 2012:04 fc_LQPROG
* display variance/covariance matrix
disp v
* scalar variance fweights linear combination fweights'vfweights
comp var_fweights = tr(fweights) * v * fweights
disp var_fweights
comp sigma_fweights = %sqrt(var_fweights)
disp sigma_fweights
* scalar variance LQPROG linear combination x'vx
comp var_LQPROG = tr(x) * v * x
disp var_LQPROG
comp sigma_LQPROG = %sqrt(var_LQPROG)
disp sigma_LQPROG
The combined forecasts vary across the 50 time steps, but the square root of the combined forecast variance should be changing as well, as STDERRS from UFORECAST does.
How can I calculate the combined forecast SE's for each time period?
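For reference, a single scalar standard deviation such as sigma_fweights implies bands of constant width around the combined forecast. A minimal sketch in plain Python (not RATS), assuming normally distributed combined errors; the forecast values and sigma below are made up and the function name is hypothetical.

```python
from statistics import NormalDist

def prediction_interval(point_forecasts, sigma, coverage=0.95):
    """Symmetric normal-theory bands around each point forecast.

    Assumes one constant standard deviation for every period.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return [(f - z * sigma, f + z * sigma) for f in point_forecasts]

# Hypothetical combined point forecasts and a hypothetical scalar sigma.
combined = [1.20, 0.95, 1.10]
sigma = 0.40

bands = prediction_interval(combined, sigma)
for forecast, (lower, upper) in zip(combined, bands):
    print(forecast, lower, upper)
```

Every interval has the same width here, which is exactly the limitation raised in the question above.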
Re: Ensemble and Combination Models' Methodologies
First of all, those strike me as being trivially different: none of these are so accurate that a 1% difference in standard errors should be much to bother about. But the other point is that you are trying to combine calculations which are done with very different attitudes towards model stability. Your forecasts and forecast errors are being generated using rolling regressions, but the "optimal" combinations of forecasts are being computed using a single optimization across all those forecast errors (which were generated assuming a single model isn't adequate). The problem is that there is simply not enough information to do separate optimizations at each data point: you have, after all, only one actual observation at each entry, and this is generating seven separate forecasts for it. You would need to make heroic assumptions to generate a series of separate full rank covariance matrices at each data point, and the results probably wouldn't be much different in the end.
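To make "heroic assumptions" concrete, one possible example (not a method proposed in this thread) is to hold the correlation matrix of the forecast errors fixed over time and let only the individual standard errors change, so that the covariance matrix at period t is D(t) R D(t), with D(t) the diagonal matrix of that period's standard errors. The sketch below is plain Python, not RATS; the function name, weights, correlations and standard errors are all hypothetical.

```python
import math

def combined_stderr(weights, stderrs_t, corr):
    """sqrt(w' D R D w) for one period.

    Heroic assumption: the correlation matrix R is constant over time and
    only the per-model standard errors (the diagonal of D) vary by period.
    """
    n = len(weights)
    variance = sum(
        weights[i] * weights[j] * stderrs_t[i] * stderrs_t[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(variance)

# Hypothetical inputs: two models, equal weights, errors 0.95 correlated.
weights = [0.5, 0.5]
corr = [[1.0, 0.95], [0.95, 1.0]]

# Hypothetical per-period standard errors, one pair (model A, model B) per period.
stderrs_by_period = [(0.40, 0.42), (0.55, 0.60), (0.70, 0.78)]

combined = [combined_stderr(weights, s, corr) for s in stderrs_by_period]
print(combined)
```

With correlations this high the combined standard error stays close to the individual ones, consistent with the remark that the results probably would not be much different in the end.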
Re: Ensemble and Combination Models' Methodologies
Plotting the 2 combination forecasts against the 7 individual model forecasts shows that the combinations are smoother, as one would expect.
The RMSE's OOS are:
ar7: 0.42758414
ar6: 0.42056211
ar2: 0.40841082
ar127: 0.40844838
ar1ma1: 0.40697306
ar2ma1: 0.41691831
ar2ma17: 0.40659699
fc_fweights: 0.40857999
fc_LQPROG: 0.40350209
The aim is to have combination models applicable for:
- one-step ahead forecasts
- multi-step ahead forecasts
mean forecasts and PI's:
- not only for OOS 1 to 50 points (as in this example)
- but also for TOOS (true-out-of-sample) i.e. from the 51st point onwards.
To simplify the task assume just 2 models A and B "equally" weighted, OOS and TOOS:
Assume normality, so the mean forecast is (0.5*forecastA + 0.5*forecastB). Also assume independence of the forecast distributions of A and B, i.e. covariance=0 at each forecast point. Is the standard error of the combination then not just the square root of the sum of the squared STDERRS of models A and B, each multiplied by its squared weight: sqrt(0.5^2*STDERRSA^2 + 0.5^2*STDERRSB^2)? If so, it is easy to generate the PI's. Or is that unrealistic?
Re: Ensemble and Combination Models' Methodologies
Didn't you just compute a covariance matrix of the forecast errors? Aren't they quite far from being "independent"? (I would think they might be >.90 correlated, aren't they?) So any calculation as if they are independent would be highly misleading. Note also that the weights for short-term forecasts are likely to be different from the weights for long-term forecasts, probably by quite a bit. (Simpler models often dominate in accuracy for longer range forecasts).
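The size of the distortion from assuming independence can be seen in the two-model, equal-weight case. Below, $\sigma_A$ and $\sigma_B$ are the two forecast standard errors and $\rho$ is the correlation between the forecast errors; the symbols and the illustrative value $\rho = 0.95$ are introduced here, not taken from the estimates in the thread.

```latex
\begin{align*}
\operatorname{Var}\!\left(0.5\,e_A + 0.5\,e_B\right)
  &= 0.25\,\sigma_A^2 + 0.25\,\sigma_B^2 + 0.5\,\rho\,\sigma_A\sigma_B \\
\text{with } \sigma_A = \sigma_B = \sigma:\quad
\operatorname{sd} &= \sigma\sqrt{\tfrac{1+\rho}{2}} \\
\rho = 0 &\;\Rightarrow\; \operatorname{sd} \approx 0.707\,\sigma \\
\rho = 0.95 &\;\Rightarrow\; \operatorname{sd} \approx 0.987\,\sigma
\end{align*}
```

Treating highly correlated forecasts as independent would therefore understate the standard error by roughly 28% in this example, and the PI's would be correspondingly too narrow.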
Re: Ensemble and Combination Models' Methodologies
Yes, the forecast errors are highly correlated: 0.94798 to 0.99933 (and obviously 1). But I do not want to ‘fit’ to the OOS forecasts; and then there’s TOOS forecasts where forecast errors cannot be calculated - which is why I'd like to use the n component models forecast STDERRS.
As you say: The problem is that there is simply not enough information as I need to calculate covariance matrices at each data point OOS.
Back to the question: how to calculate Combination forecast PI's (equally weighted or optimized), OOS and TOOS taking into account the forecast errors, without 'fitting' to the OOS forecasts?