Evaluating Distributional Forecasts


by ac_1 » Sat Jun 25, 2022 2:11 pm

Hi Tom,

I am interested in evaluating density forecast accuracy.

I can evaluate distributional forecasts with the Continuous Ranked Probability Score (CRPS): https://otexts.com/fpp3/distaccuracy.html

An interesting paper:
Wagner Piazza Gaglianone & Jaqueline Terra Moura Marins (2017), "Evaluation of Exchange Rate Point and Density Forecasts: an application to Brazil," International Journal of Forecasting, 33(3), 707-728.
Working paper: https://www.bcb.gov.br/pec/wps/ingl/wps446.pdf

I would like to compute, in RATS, the accuracy measures listed in the paper's B2 - Full-density forecast, especially
(a) the Probability Integral Transform (PIT)
(b) the Log Predictive Density Score (LPDS)
and any others of interest, similar to CRPS, that assess the whole forecast distribution.

If I generate fan charts for 3 simple models, assuming the distribution of forecasts is normal, as below, how do I calculate the PITs (which are usually plotted as histograms and compared with U(0,1): the closer to uniform, the better the density forecast) and the LPDS for each of the 3 models?

thanks,
Amarjit

Code:
*===============================
* read in data
OPEN DATA "/Users/Shared/RATS/Examples/g10xrate.xls"
DATA(FORMAT=XLS,NOLABELS,ORG=COLUMNS,TOP=2) 1 6237 USXJPN USXFRA USXSUI USXNLD USXUK $
USXBEL USXGER USXSWE USXCAN USXITA
*
*
*===============================
* choose series
set y = usxjpn

* estimation sample runs istart to iend; entries after iend are held out
compute steps=25
compute istart=6000
compute iend=6237-steps

* (1) rwwd (rw without drift) or naive model
boxjenk(diffs=1,define=foreeq_rwwd) y istart iend
uforecast(equation=foreeq_rwwd,from=(iend+1),steps=steps+1,stderrs=stderrs_rwwd) yhat_rwwd
set l95_rwwd iend+1 iend+steps+1 = yhat_rwwd+%invnormal(.025)*stderrs_rwwd
set u95_rwwd iend+1 iend+steps+1 = yhat_rwwd+%invnormal(.975)*stderrs_rwwd

* (2) rwd (rw with drift) model
boxjenk(diffs=1,constant,define=foreeq_rwd) y istart iend
uforecast(equation=foreeq_rwd,from=(iend+1),steps=steps+1,stderrs=stderrs_rwd) yhat_rwd
set l95_rwd iend+1 iend+steps+1 = yhat_rwd+%invnormal(.025)*stderrs_rwd
set u95_rwd iend+1 iend+steps+1 = yhat_rwd+%invnormal(.975)*stderrs_rwd

* (3) mean/constant model
boxjenk(diffs=0,constant,define=foreeq_mean) y istart iend
uforecast(equation=foreeq_mean,from=(iend+1),steps=steps+1,stderrs=stderrs_mean) yhat_mean
set l95_mean iend+1 iend+steps+1 = yhat_mean+%invnormal(.025)*stderrs_mean
set u95_mean iend+1 iend+steps+1 = yhat_mean+%invnormal(.975)*stderrs_mean
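
To draw the fan-chart style plot from these series, a GRAPH along the following lines should work (a minimal sketch: the header text is a placeholder and style options are omitted):

Code:
* actuals, point forecast and 95% band for the rwwd model
graph(header="RWWD: point forecast with 95% band") 4
# y istart iend+steps+1
# yhat_rwwd iend+1 iend+steps+1
# l95_rwwd iend+1 iend+steps+1
# u95_rwwd iend+1 iend+steps+1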

Re: Evaluating Distributional Forecasts

by TomDoan » Mon Jun 27, 2022 3:07 pm

Isn't LPDS just the sum of the log densities of the observed data? Use %LOGDENSITY(predicted_variance(t),y(t)-predicted_mean(t)) for the summand at time t.

%CDF evaluates the standard Normal distribution function, which is what you need to compute the PIT. It doesn't incorporate a non-unit variance the way %LOGDENSITY does, so you have to convert to a z-score first: %CDF((y(t)-predicted_mean(t))/sqrt(predicted_variance(t)))

Re: Evaluating Distributional Forecasts

by ac_1 » Tue Jun 28, 2022 5:35 am

Thanks. Here's an attempt to calculate the LPDS and PIT for the multi-step-ahead dynamic forecasts from the RWWD model. Is this correct?

Code:
* LPDS (a higher score implies a better model OOS)
do t = iend+1, iend+steps
   comp scores_rwwd = %LOGDENSITY(stderrs_rwwd(t)^2,y(t)-yhat_rwwd(t))
   disp scores_rwwd
end do t
sstats(mean) 1 steps scores_rwwd>>lpds_rwwd
disp 'lpds_rwwd' lpds_rwwd

* PIT
do t = iend+1, iend+steps
   comp zscores_rwwd = %CDF((y(t)-yhat_rwwd(t))/stderrs_rwwd(t))
   disp zscores_rwwd
end do t
sstats(mean) 1 steps zscores_rwwd>>pit_rwwd
disp 'pit_rwwd' pit_rwwd

Re: Evaluating Distributional Forecasts

by TomDoan » Tue Jun 28, 2022 9:02 am

Why are you using two DO loops rather than two SET instructions? Neither one of those is saving the calculations into different slots in a time series. You can use this for the LPDS:

set scores_rwwd iend+1 iend+steps = %LOGDENSITY(stderrs_rwwd^2,y-yhat_rwwd)
sstats(mean) iend+1 iend+steps scores_rwwd>>lpds_rwwd
disp 'lpds_rwwd' lpds_rwwd
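
The PIT series follows the same pattern, applying %CDF to the z-scores (a sketch under the same Normal assumption, reusing your series names):

set zscores_rwwd iend+1 iend+steps = %cdf((y-yhat_rwwd)/stderrs_rwwd)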

Note, BTW, that the LPDS measure is likely to favor forecasting procedures which get a better value for the variance rather than a better value for the mean.

Re: Evaluating Distributional Forecasts

by ac_1 » Wed Jun 29, 2022 2:02 am

I want to compare various models, e.g. RWD vs. AR(p) vs. ARIMA-SBC (and others), with respect to evaluating distributional forecasts, assuming the distribution of forecasts is normal.

For a particular series, can
- the Log Predictive Density Score (LPDS)
- the Continuous Ranked Probability Score (CRPS)
simply be calculated for any of the preceding specifications, the way the point-forecast loss measures come from @UForeErrors?

Are there other measures/tests useful to have in a toolbox to evaluate distributional forecasts?

In addition, I would like to plot PIT histograms and compare them with U(0,1). Would that just be plotting the z-scores, e.g. @histogram zscores_rwwd? And where on the vertical axis should the horizontal reference line be drawn across the histogram?

Re: Evaluating Distributional Forecasts

by TomDoan » Wed Jun 29, 2022 10:14 am

All of those can be applied to any forecast procedure as long as you provide a full distribution for the forecasts.

For Normally distributed random variables the PIT is a monotonic transformation of the z-score.
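
If you want the CRPS without simulating, it also has a closed form under a Normal predictive density (the Gneiting-Raftery formula): CRPS = sigma*(z*(2*Phi(z)-1)+2*phi(z)-1/sqrt(pi)), with z the standardized forecast error. A sketch in RATS, reusing the series names from the earlier posts (lower average CRPS is better):

set z_rwwd iend+1 iend+steps = (y-yhat_rwwd)/stderrs_rwwd
* %CDF is the standard Normal cdf, %DENSITY the standard Normal density
set crps_rwwd iend+1 iend+steps = stderrs_rwwd*$
   (z_rwwd*(2*%cdf(z_rwwd)-1)+2*%density(z_rwwd)-1.0/sqrt(%pi))
sstats(mean) iend+1 iend+steps crps_rwwd>>crpsbar_rwwd
disp 'crps_rwwd (mean)' crpsbar_rwwd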

Re: Evaluating Distributional Forecasts

by ac_1 » Wed Jun 29, 2022 11:13 am

For the LPDS (the more positive the better, correct? Noting your point about favouring models that get a better value for the variance: I am getting less negative LPDS scores for multi-step-ahead forecasts, but more negative scores for recursive one-step-ahead forecasts. Is that plausible?) and the CRPS (those values appear reasonable), is there a minimum sample size for the length of the OOS forecasts?

I am also not certain how to plot the PIT histograms.

Re: Evaluating Distributional Forecasts

by TomDoan » Wed Jun 29, 2022 1:28 pm

I'm not sure papers that throw everything but the kitchen sink at a subject are worthy of emulation. The PIT is a monotonic transformation of the z-score. That means that a forecasting procedure whose actual errors are twice as bad, but which also reports a standard error twice as high, will give identical PITs. All the PIT really helps you do is see whether a Normal distribution is a reasonable approximation to the distribution of the forecast errors, not whether the forecasts themselves are actually reasonable.
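
A quick numeric check of that invariance (the 0.30 error and 0.50 standard error are made-up values; doubling both leaves the PIT unchanged):

disp %cdf(0.30/0.50) %cdf(0.60/1.00)

Both print about 0.726.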

Re: Evaluating Distributional Forecasts

by ac_1 » Sat Jul 02, 2022 3:54 am

Calculating LPDS for more series and models, both multi-step and one-step, the values also appear reasonable, though they can be large and negative; however, they are not always in agreement with the point-forecast loss measures.

Re: Evaluating Distributional Forecasts

by ac_1 » Sun Jul 03, 2022 3:06 am

Just some clarifications:

A fan chart is defined as a stream/sequence of density forecasts: an interval forecast associated with a distribution at each point in time, over the whole forecast horizon. That is to say, assuming a density (e.g. normal, t, etc.), an interval forecast can be constructed as yhat +/- critvalue*stderrs.

(i) From generating a cloud of simulations (MC or bootstrap) there are 2 ways to calculate fan charts:
(a) inferring that the distribution of future observations is normal: the standard normal method
(b) assuming non-normality: the fractile method
If there is no cloud of simulations, i.e. forecasts are generated in the usual analytical manner, is there a way to calculate PIs without assuming that the distribution of future observations is normal?
Also, is it "good form" to include the 0.0 and 1.0 fractiles, i.e. the minimum and maximum?

(ii) Likewise, assuming a normal density, i.e. no cloud of simulations, what is the maximum critvalue for a PI = yhat +/- critvalue*stderrs that is considered "good form"?

(iii) With UFORECAST, the shocks are N(0, residual variance) draws for SIMULATE, and resampled residuals from the fitted model for BOOTSTRAP. The dynamic forecasts are generated using a DIFFERENT shock at each forecast step ahead over the forecast period. Correct?

(iv) Thus far, generally speaking, when I compare analytical forecasts with forecasts from simulations (MC or bootstrap), they tend to be fairly close (though not identical) w.r.t. loss measures, and are not seed dependent; but they can differ w.r.t. directional accuracy and be seed dependent. Is this expected, and if so why?

Re: Evaluating Distributional Forecasts

by TomDoan » Sun Jul 03, 2022 9:08 am

ac_1 wrote:Calculating LPDS for more series and models, both multi-step and one-step, the values also appear reasonable, though they can be large and negative; however, they are not always in agreement with the point-forecast loss measures.


1. The "large and negative" depends upon the scale of the data (as it does with log likelihoods). If you scale your data up by a factor of 100, the densities go down and the log density goes down as well. If you scale your data down by a factor of 100, the densities go up, and the log density correspondingly goes up. The value (positive or negative) of the log density tells you nothing by itself; it's only useful as a comparison among different models.
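
A quick illustration with made-up numbers (error 0.3 with standard error 0.5, then both scaled up by 100): the log density drops by exactly log(100), about 4.605:

disp %logdensity(0.5^2,0.3) %logdensity(50.0^2,30.0)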

2. Of course they're not always in agreement; they are using a completely different measure of accuracy. A forecast procedure which more accurately estimates the out-of-sample variance will probably dominate a procedure which (somewhat) more accurately forecasts the out-of-sample data.

Re: Evaluating Distributional Forecasts

by TomDoan » Sun Jul 03, 2022 9:16 am

ac_1 wrote:Just some clarifications:

A fan chart is defined as a stream/sequence of density forecasts: an interval forecast associated with a distribution at each point in time, over the whole forecast horizon. That is to say, assuming a density (e.g. normal, t, etc.), an interval forecast can be constructed as yhat +/- critvalue*stderrs.

(i) From generating a cloud of simulations (MC or bootstrap) there are 2 ways to calculate fan charts:
(a) inferring that the distribution of future observations is normal: the standard normal method
(b) assuming non-normality: the fractile method
If there is no cloud of simulations, i.e. forecasts are generated in the usual analytical manner, is there a way to calculate PIs without assuming that the distribution of future observations is normal?
Also, is it "good form" to include the 0.0 and 1.0 fractiles, i.e. the minimum and maximum?


Not without a presumed distribution. And no, generally you would not include the max and min.

ac_1 wrote:(ii) Likewise, assuming a normal density, i.e. no cloud of simulations, what is the maximum critvalue for a PI = yhat +/- critvalue*stderrs that is considered "good form"?


1 or 2 or 1.96 are all common choices.

ac_1 wrote:(iii) With UFORECAST, the shocks are N(0, residual variance) draws for SIMULATE, and resampled residuals from the fitted model for BOOTSTRAP. The dynamic forecasts are generated using a DIFFERENT shock at each forecast step ahead over the forecast period. Correct?


Yes. The shocks (whether random or bootstrapped) are drawn independently. (The bootstraps are done with replacement, so some of those can be identical, but not by design).

ac_1 wrote:(iv) Thus far, generally speaking, when I compare analytical forecasts with forecasts from simulations (MC or bootstrap), they tend to be fairly close (though not identical) w.r.t. loss measures, and are not seed dependent; but they can differ w.r.t. directional accuracy and be seed dependent. Is this expected, and if so why?


I'm sure you've already asked about this, but the analytical forecasts are not even trying to estimate the "direction".

Re: Evaluating Distributional Forecasts

by ac_1 » Mon Jul 04, 2022 2:33 am

Thanks!


TomDoan wrote:I'm sure you've already asked about this, but the analytical forecasts are not even trying to estimate the "direction".

IMHO: forecasting "direction" is more important than "levels" for financial series, whereas for macro and business series "direction" and "levels" are of equal importance.

Re: Evaluating Distributional Forecasts

by TomDoan » Mon Jul 04, 2022 12:58 pm

I'm not sure how you are defining "directional accuracy" for the analytical forecasts. If you are looking at the mean only, then that is probably 100% up or 100% down, which is highly misleading. If the point forecast is +.05 with a standard deviation of .50, then if you use a Normal distribution, the probability of that being positive is about .54 and of being negative is .46. That will probably be similar to what you get if you do simulations and count the number that are up vs. down.
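
Those probabilities are a one-line check (using the illustrative .05 and .50 above):

disp %cdf(0.05/0.50) 1-%cdf(0.05/0.50)

which displays roughly 0.54 and 0.46.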

Re: Evaluating Distributional Forecasts

by ac_1 » Tue Jul 05, 2022 3:07 am

TomDoan wrote:I'm not sure how you are defining "directional accuracy" for the analytical forecasts. If you are looking at the mean only, then that is probably 100% up or 100% down, which is highly misleading. If the point forecast is +.05 with a standard deviation of .50, then if you use a Normal distribution, the probability of that being positive is about .54 and of being negative is .46. That will probably be similar to what you get if you do simulations and count the number that are up vs. down.


"directional accuracy" is defined as identical sign changes in the means in terms of ranging from 0% to 100%. The probabilities calculated are from cdf((0-0.05)/0.5)=0.46, and 1-0.46=0.54. I am unclear as to the point you are making :?:
