## Evaluating Distributional Forecasts

Questions and discussions on Time Series Analysis

### Evaluating Distributional Forecasts

Hi Tom,

I am interested in evaluating density forecast accuracy.

I can evaluate distributional forecasts using the Continuous Ranked Probability Score (CRPS): https://otexts.com/fpp3/distaccuracy.html

An interesting paper:
Wagner Piazza Gaglianone & Gabriel Jaqueline Terra Moura Marins (2017), "Evaluation of Exchange Rate Point and Density Forecasts: an application to Brazil," International Journal of Forecasting, 33(3), July-September 2017, pp. 707-728.
Working paper: https://www.bcb.gov.br/pec/wps/ingl/wps446.pdf

I would like to compute, in RATS, the following accuracy measures listed in B2 - Full-density forecast, especially
(a) the Probability Integral Transform (PIT)
(b) the Log Predictive Density Score (LPDS)
and any others of interest, similar to CRPS, for the whole forecast distribution.

If I generate fan charts for 3 simple models, assuming the distribution of forecasts is normal, as below, how do I calculate the PITs (which are usually plotted as histograms and compared with U(0,1): the closer to uniform, the better the density forecast) and the LPDS for each of the 3 models?

thanks,
Amarjit

```
*===============================
* read in data
OPEN DATA "/Users/Shared/RATS/Examples/g10xrate.xls"
DATA(FORMAT=XLS,NOLABELS,ORG=COLUMNS,TOP=2) 1 6237 USXJPN USXFRA USXSUI USXNLD USXUK $
   USXBEL USXGER USXSWE USXCAN USXITA
*
*===============================
* choose series
set y = usxjpn
compute steps=25
compute istart=6000
compute iend=6237-steps
* (1) rwwd (rw without drift) or naive model
boxjenk(diffs=1,define=foreeq_rwwd) y istart iend
uforecast(equation=foreeq_rwwd,from=(iend+1),steps=steps+1,stderrs=stderrs_rwwd) yhat_rwwd
set l95_rwwd iend+1 iend+steps+1 = yhat_rwwd+%invnormal(.025)*stderrs_rwwd
set u95_rwwd iend+1 iend+steps+1 = yhat_rwwd+%invnormal(.975)*stderrs_rwwd
* (2) rwd (rw with drift) model
boxjenk(diffs=1,constant,define=foreeq_rwd) y istart iend
uforecast(equation=foreeq_rwd,from=(iend+1),steps=steps+1,stderrs=stderrs_rwd) yhat_rwd
set l95_rwd iend+1 iend+steps+1 = yhat_rwd+%invnormal(.025)*stderrs_rwd
set u95_rwd iend+1 iend+steps+1 = yhat_rwd+%invnormal(.975)*stderrs_rwd
* (3) mean/constant model
boxjenk(diffs=0,constant,define=foreeq_mean) y istart iend
uforecast(equation=foreeq_mean,from=(iend+1),steps=steps+1,stderrs=stderrs_mean) yhat_mean
set l95_mean iend+1 iend+steps+1 = yhat_mean+%invnormal(.025)*stderrs_mean
set u95_mean iend+1 iend+steps+1 = yhat_mean+%invnormal(.975)*stderrs_mean
```
ac_1

Posts: 201
Joined: Thu Apr 15, 2010 6:30 am
Location: London, UK

### Re: Evaluating Distributional Forecasts

Isn't LPDS just the sum of the log densities of the observed data? Use %LOGDENSITY(predicted_variance(t),y(t)-predicted_mean(t)) for the summand at time t.

%CDF evaluates the standard Normal distribution function, which is what you need for the PIT. It doesn't incorporate a non-unit variance the way %LOGDENSITY does, so you have to convert to a z-score first: %CDF((y(t)-predicted_mean(t))/sqrt(predicted_variance(t)))
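Outside RATS, the two calculations can be sketched in Python (the data here are hypothetical; the Normal log density and CDF stand in for %LOGDENSITY and %CDF):

```python
import math

def log_density(variance, error):
    # Normal log density of `error` under N(0, variance),
    # mirroring RATS %LOGDENSITY(variance, error)
    return -0.5 * (math.log(2 * math.pi * variance) + error ** 2 / variance)

def pit(y, mean, stderr):
    # Probability integral transform: the Normal CDF of the z-score,
    # mirroring %CDF((y - mean)/stderr)
    z = (y - mean) / stderr
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical observed values, predicted means, and standard errors
y    = [1.2, 0.8, 1.5]
yhat = [1.0, 1.0, 1.0]
serr = [0.5, 0.5, 0.5]

scores = [log_density(s ** 2, a - f) for a, f, s in zip(y, yhat, serr)]
lpds   = sum(scores)   # sum of the log predictive densities
pits   = [pit(a, f, s) for a, f, s in zip(y, yhat, serr)]
```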
TomDoan

Posts: 7236
Joined: Wed Nov 01, 2006 5:36 pm

### Re: Evaluating Distributional Forecasts

Thanks. Here's an attempt to calculate the LPDS and PIT for the multi-step-ahead dynamic forecasts from the RWWD model. Correct?

```
* LPDS (a higher score implies a better model OOS)
do t = iend+1, iend+steps
   comp scores_rwwd = %LOGDENSITY(stderrs_rwwd(t)^2,y(t)-yhat_rwwd(t))
   disp scores_rwwd
end do t
sstats(mean) 1 steps scores_rwwd>>lpds_rwwd
disp 'lpds_rwwd' lpds_rwwd
* PIT
do t = iend+1, iend+steps
   comp zscores_rwwd = %CDF((y(t)-yhat_rwwd(t))/stderrs_rwwd(t))
   disp zscores_rwwd
end do t
sstats(mean) 1 steps zscores_rwwd>>pit_rwwd
disp 'pit_rwwd' pit_rwwd
```
ac_1


### Re: Evaluating Distributional Forecasts

Why are you using two DO loops rather than two SET instructions? Neither one of those is saving the calculations into different slots in a time series. You can use this for the LPDS:

```
set scores_rwwd iend+1 iend+steps = %LOGDENSITY(stderrs_rwwd^2,y-yhat_rwwd)
sstats(mean) iend+1 iend+steps scores_rwwd>>lpds_rwwd
disp 'lpds_rwwd' lpds_rwwd
```

Note, BTW, that the LPDS measure is likely to favor forecasting procedures which get a better value of the variance over those which get a better value of the mean.
TomDoan


### Re: Evaluating Distributional Forecasts

I want to compare various models e.g. RWD vs AR(p) vs ARIMA-SBC, (and others), w.r.t. evaluating distributional forecasts, assuming the distribution of forecasts is normal.

For a particular series, can
- the Log Predictive Density Score (LPDS)
- the Continuous Ranked Probability Score (CRPS)
simply be calculated for the preceding (or any) specifications, the way @UFORERRORS computes the loss measures for point forecasts?

Are there other measures/tests useful to have in a toolbox to evaluate distributional forecasts?

In addition, I would like to plot PIT histograms and compare them with U(0,1). Would that just be plotting the z-scores, e.g. @histogram zscores_rwwd? And at what height on the vertical axis should the horizontal reference line be drawn across the histogram?
ac_1


### Re: Evaluating Distributional Forecasts

All of those can be applied to any forecast procedure as long as you provide a full distribution for the forecasts.

For Normally distributed random variables the PIT is a monotonic transformation of the z-score.
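Since a well-calibrated set of PITs should look U(0,1), the horizontal reference line on a count histogram sits at the expected count per bin. A Python sketch with hypothetical z-scores:

```python
import math

def norm_cdf(z):
    # standard Normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical forecast z-scores, (y - yhat)/stderr
zscores = [-1.3, -0.4, 0.2, 0.9, 1.7, -0.1, 0.5, -2.0, 1.1, 0.3]
pits = [norm_cdf(z) for z in zscores]

# Bin the PITs into 5 equal-width bins on [0, 1]
bins = 5
counts = [0] * bins
for p in pits:
    counts[min(int(p * bins), bins - 1)] += 1

# Under a uniform PIT, each bin's expected count is len(pits)/bins --
# that is the height of the reference line on a count histogram
# (equivalently, height 1.0 on a density-scaled histogram)
expected = len(pits) / bins
```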
TomDoan


### Re: Evaluating Distributional Forecasts

For LPDS (the more positive the better, correct? Noting your point about favouring models that get a better value of the variance, I am getting less negative LPDS scores for multi-step-ahead forecasts and more negative ones for recursive one-step-ahead forecasts -- is that plausible?) and CRPS (which appears reasonable), is there a minimum sample size for the length of the OOS forecasts?

Also, I am not certain how to plot the PIT histograms.
ac_1


### Re: Evaluating Distributional Forecasts

I'm not sure papers that throw everything but the kitchen sink at a subject are worthy of emulation. The PIT is a monotonic transformation of the z-score. That means that a forecasting procedure whose actual errors are twice as bad but which also reports a standard error twice as high will give identical PITs. All the PIT really helps you do is see whether a Normal distribution is a reasonable approximation to the distribution of the forecast errors, not whether the forecasts themselves are actually reasonable.
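A quick numerical check of that invariance (hypothetical errors; doubling the errors together with the standard errors leaves the PITs unchanged):

```python
import math

def pit(error, stderr):
    # PIT of a Normal forecast error: Phi(error / stderr)
    return 0.5 * (1 + math.erf((error / stderr) / math.sqrt(2)))

# Hypothetical forecast errors and standard errors
errors  = [0.2, -0.5, 1.1]
stderrs = [0.5, 0.5, 0.5]

base    = [pit(e, s) for e, s in zip(errors, stderrs)]
# Errors twice as bad, but with standard errors twice as high:
doubled = [pit(2 * e, 2 * s) for e, s in zip(errors, stderrs)]
# base and doubled are identical, so a PIT histogram cannot tell
# the two procedures apart
```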
TomDoan


### Re: Evaluating Distributional Forecasts

Calculating LPDS for more series/models, both multi-step and one-step, the values also appear reasonable but can be large and negative; however, they are not always in agreement with the point-forecast loss measures.
ac_1


### Re: Evaluating Distributional Forecasts

Just some clarifications:

A fan chart is defined as a stream/sequence of density forecasts: an interval forecast associated with a distribution at each point in time, over the whole forecast distribution. That is to say, assuming a density (e.g. normal, t, etc.), an interval forecast can be constructed as yhat +/- critvalue*stderrs.
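In Python terms (with hypothetical numbers), the interval construction I mean is:

```python
from statistics import NormalDist

# Hypothetical point forecast and forecast standard error
yhat, stderr = 100.0, 2.0
level = 0.95

# critvalue for a central `level` interval under a Normal density
crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
lower = yhat - crit * stderr
upper = yhat + crit * stderr
```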

(i) From generating a cloud of simulations (MC or bootstrap) there are 2 ways to calculate fan charts:
(a) inferring that the distribution of future observations is normal - the standard-normal method
(b) assuming non-normality - the fractile method
If there is no cloud of simulations, i.e. when generating forecasts in the usual manner, is there a way to calculate PIs assuming that the distribution of future observations is not normal?
Also, is it "good form" to include the 0.0 and 1.0 fractiles, i.e. 100% (MAX)?

(ii) Likewise, assuming a normal density (i.e. no cloud of simulations), what is the maximum critvalue for a PI = yhat +/- critvalue*stderrs that is considered "good form"?

(iii) With UFORECAST, the shocks are N(0, residual variance) draws for SIMULATE, and resampled residuals from the fitted model for BOOTSTRAP. The dynamic forecasts are generated using a DIFFERENT shock at each forecast step ahead over the forecast period. Correct?

(iv) Thus far, generally speaking, when I compare analytical forecasts with forecasts from simulations (MC or bootstrap), they tend to be fairly close (though not identical) w.r.t. loss measures and are not seed dependent; but they can differ w.r.t. directional accuracy and be seed dependent. Is this expected, and if so, why?
ac_1


### Re: Evaluating Distributional Forecasts

ac_1 wrote:Calculating LPDS for more series/models, both multi-step and one-step, the values also appear reasonable but can be large and negative; however, they are not always in agreement with the point-forecast loss measures.

1. The large and negative values depend upon the scale of the data (as they do with log likelihoods). If you scale your data up by a factor of 100, the densities go down and the log density goes down as well. If you scale your data down by a factor of 100, the densities go up, and the log density correspondingly goes up. The value (positive or negative) of the log density tells you nothing---it's only useful as a comparison among different models.
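The scale effect is easy to verify numerically: with a Normal density, scaling the data by a factor c shifts the log density by -log(c), whatever the error (Python sketch, hypothetical numbers):

```python
import math

def log_density(variance, error):
    # Normal log density, as in %LOGDENSITY(variance, error)
    return -0.5 * (math.log(2 * math.pi * variance) + error ** 2 / variance)

# Hypothetical forecast error and variance
base = log_density(1.0, 0.3)

# Scale the data up by 100: the error scales by 100,
# the variance by 100^2, and the log density drops by log(100)
scaled_up = log_density(100.0 ** 2, 100.0 * 0.3)

shift = base - scaled_up   # equals log(100), independent of the error
```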

2. Of course they're not always in agreement---they are using a completely different measure of accuracy. A forecast procedure which more accurately estimates the out-of-sample variance will probably dominate a procedure which (somewhat) more accurately forecasts the out-of-sample data.
TomDoan


### Re: Evaluating Distributional Forecasts

ac_1 wrote:Just some clarifications:

A fan chart is defined as a stream/sequence of density forecasts: an interval forecast associated with a distribution at each point in time, over the whole forecast distribution. That is to say, assuming a density (e.g. normal, t, etc.), an interval forecast can be constructed as yhat +/- critvalue*stderrs.

(i) From generating a cloud of simulations (MC or bootstrap) there are 2 ways to calculate fan charts:
(a) inferring that the distribution of future observations is normal - the standard-normal method
(b) assuming non-normality - the fractile method
If there is no cloud of simulations, i.e. when generating forecasts in the usual manner, is there a way to calculate PIs assuming that the distribution of future observations is not normal?
Also, is it "good form" to include the 0.0 and 1.0 fractiles, i.e. 100% (MAX)?

Not without a presumed distribution. And no, generally you would not include the max and min.

ac_1 wrote:(ii) Likewise, assuming a normal density (i.e. no cloud of simulations), what is the maximum critvalue for a PI = yhat +/- critvalue*stderrs that is considered "good form"?

1 or 2 or 1.96 are all common choices.

ac_1 wrote:(iii) With UFORECAST, the shocks are N(0, residual variance) draws for SIMULATE, and resampled residuals from the fitted model for BOOTSTRAP. The dynamic forecasts are generated using a DIFFERENT shock at each forecast step ahead over the forecast period. Correct?

Yes. The shocks (whether random or bootstrapped) are drawn independently. (The bootstraps are done with replacement, so some of those can be identical, but not by design.)
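A sketch of the distinction in Python (hypothetical residuals; SIMULATE-style Normal draws vs BOOTSTRAP-style resampling with replacement):

```python
import random

random.seed(7)
# Hypothetical residuals from a fitted model
residuals = [0.3, -0.8, 0.1, 0.5, -0.2]
steps = 4

# SIMULATE-style shocks: independent N(0, residual variance) draws
var = sum(r * r for r in residuals) / len(residuals)
mc_shocks = [random.gauss(0.0, var ** 0.5) for _ in range(steps)]

# BOOTSTRAP-style shocks: residuals resampled independently with
# replacement, so the same residual can recur at different steps
boot_shocks = [random.choice(residuals) for _ in range(steps)]
```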

ac_1 wrote:(iv) Thus far, generally speaking, when I compare analytical forecasts with forecasts from simulations (MC or bootstrap), they tend to be fairly close (though not identical) w.r.t. loss measures and are not seed dependent; but they can differ w.r.t. directional accuracy and be seed dependent. Is this expected, and if so, why?

TomDoan


### Re: Evaluating Distributional Forecasts

Thanks!

IMHO: forecasting "direction" for financial series is more important than "levels"; for macro and business series, "direction" and "levels" are of equal importance.
ac_1


### Re: Evaluating Distributional Forecasts

I'm not sure how you are defining "directional accuracy" for the analytical forecasts. If you are looking at the mean only, then that is probably 100% up or 100% down which is highly misleading. If the point forecast is +.05 with a standard deviation of .50, then if you use a Normal distribution, the probability of that being positive is about .54 and of being negative is .46. That will probably be similar to what you get if you do simulations and count the number that are up vs down.
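The numbers in that example (hypothetical, as in the post) work out as:

```python
import math

def norm_cdf(z):
    # standard Normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 0.05, 0.50                # point forecast and standard deviation
p_down = norm_cdf((0 - mean) / sd)   # P(outcome < 0), about .46
p_up   = 1 - p_down                  # P(outcome > 0), about .54
```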
TomDoan


### Re: Evaluating Distributional Forecasts

TomDoan wrote:I'm not sure how you are defining "directional accuracy" for the analytical forecasts. If you are looking at the mean only, then that is probably 100% up or 100% down which is highly misleading. If the point forecast is +.05 with a standard deviation of .50, then if you use a Normal distribution, the probability of that being positive is about .54 and of being negative is .46. That will probably be similar to what you get if you do simulations and count the number that are up vs down.

"Directional accuracy" is defined as the percentage of matching sign changes in the means, ranging from 0% to 100%. The probabilities calculated come from cdf((0-0.05)/0.5)=0.46 and 1-0.46=0.54. I am unclear as to the point you are making.
ac_1

