Performance Statistics
Standard Errors of Forecast
If you use PRJ to compute forecasts, you can use the STDERRS option to produce a series of the standard errors of projection. The following example estimates the model through 1993:12, forecasts through 1994:12, and creates the series FORE, UPPER and LOWER with the forecasts and upper and lower bounds (using 1.96 standard errors), transformed back into levels from logs:
linreg(define=saleseq) logrsales * 1993:12
# constant time
prj(stderrs=stderrs) logfore 1994:1 1994:12
set upper 1994:1 1994:12 = exp(logfore+1.96*stderrs)
set lower 1994:1 1994:12 = exp(logfore-1.96*stderrs)
set fore 1994:1 1994:12 = exp(logfore)
The standard errors are given by the formula
\begin{equation} s\sqrt{1 + x_t \left( \mathbf{X}'\mathbf{X} \right)^{-1} x'_t} \end{equation}
where \(s\) is the standard error of the regression, \(x_t\) is the row of regressors at period \(t\), and \(\mathbf{X}'\mathbf{X}\) is the cross-product matrix of the regressors. This is a special formula which applies only in static forecasting situations. The variance of projection that it computes consists of two parts: one due to the error in the regression equation itself (which would be present even if the coefficients were known with certainty), and one due to the sampling error of the coefficients. In a dynamic forecast, neither of these has such a simple form: because forecasts are fed forward, the forecast errors from the early steps accumulate, and the forecasts at higher steps depend upon the estimated coefficients in a highly non-linear way. How these are computed for a dynamic model is discussed more fully in the description of the instruction ERRORS.
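As a quick check of the formula, here is a minimal sketch (assuming TIME is a simple trend equal to the entry number, so the regressor row at 1994:1 is \((1,t)\)). After the LINREG above, %SEESQ holds \(s^2\) and %XX holds \(\left(\mathbf{X}'\mathbf{X}\right)^{-1}\), so the projection standard error for a single period can be reproduced directly:
compute tt = 1994:1                  ;* entry number (the TIME value) at 1994:1
compute qf = %xx(1,1)+2.0*%xx(1,2)*tt+%xx(2,2)*tt^2
compute sproj = sqrt(%seesq*(1.0+qf))
disp "Projection standard error at 1994:1 =" sproj
This should match the value of STDERRS at 1994:1 produced by PRJ.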
UFORECAST and FORECAST also have STDERRS options, but these only include the part of the standard error which is produced by the regression equation, so they are computed as if the coefficients were known exactly.
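For instance, a sketch of the single-equation version, reusing the SALESEQ equation defined above (the other series names here are just illustrative); note that these bounds ignore the coefficient uncertainty:
uforecast(equation=saleseq,stderrs=sdu) logufore 1994:1 1994:12
set upperu 1994:1 1994:12 = exp(logufore+1.96*sdu)
set loweru 1994:1 1994:12 = exp(logufore-1.96*sdu)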
Because FORECAST is designed to handle more than one equation, its STDERRS option returns a VECTOR[SERIES]. The following pulls out the forecasts and standard errors for the second equation in the model:
forecast(model=tmodel,steps=5,results=forecasts,stderrs=stderrs)
set lower 37 41 = forecasts(2)+stderrs(2)*%invnormal(.025)
set upper 37 41 = forecasts(2)+stderrs(2)*%invnormal(.975)
One-Step Forecast Errors
By using the STATIC and ERRORS options on FORECAST or UFORECAST, you can easily compute and save a series of forecast errors (the differences between the forecasted value and the actual value) for each period in the forecast range. For a dynamic model, remember that these are one-step forecasts. As with RESULTS and STDERRS, the ERRORS option for FORECAST returns a VECT[SERIES].
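A minimal sketch using the sales equation from above (the series names are assumptions): this produces static (one-step) forecasts over the evaluation range, saves the forecast errors into EPS, and computes their root mean square:
uforecast(equation=saleseq,static,errors=eps) onestep 1994:1 1994:12
sstats(mean) 1994:1 1994:12 eps^2>>mse
disp "One-step RMSE =" sqrt(mse)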
@UFOREERRORS Procedure
@UForeErrors is a procedure for analyzing forecast errors for a single variable. You provide it with the series of actual values and forecasts and it computes a variety of statistics on the errors.
@UForeErrors( options ) actual forecast start end
actual is the series of actual values, forecast is the series of forecasts, start to end is the range of forecasts to analyze (by default, the range of the forecast series).
It computes and defines the following statistics: the mean forecast error (%%FERRMEAN), mean absolute error (%%FERRMAE), root mean square error (%%FERRMSE), mean percentage error (%%FERRMPE), mean absolute percentage error (%%FERRMAPE) and root mean square percentage error (%%FERRMSPE).
The last three are defined only if the actual values are positive throughout the test range. They are also all expressed as decimals, not true percentages; that is, they'll report a value of 0.11 for an 11% error.
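For example, continuing the sales example (assuming RSALES is the series of actual sales in levels and FORE holds the forecasts computed earlier):
@UForeErrors rsales fore 1994:1 1994:12
disp "MAE =" %%ferrmae "RMSE =" %%ferrmse "MAPE =" %%ferrmape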
THEIL, the Theil U, and Other Performance Statistics
For larger models, or for multiple-step forecasts, you can use the instruction THEIL to produce forecast performance statistics. You use it to compute a series of forecasts within the data range. The forecasts are compared with the actual values, and a variety of statistics are compiled from the forecast errors: the mean error, the mean absolute error, the root mean square error, and Theil's U statistic. The last is the ratio of the root mean square error for the model to the root mean square error for a "no change" forecast, a convenient measure because it is independent of the scale of the variable. The statistics are compiled separately for each forecast horizon; that is, if you do 12 forecast steps, you'll get separate information for each variable and for each forecast step from 1 to 12.
THEIL is usually applied to ARIMA models or VARs. Its use is a standard part of the methodology for forecasting with VARs.
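A sketch of the typical pattern for a VAR (the model name and dates are assumptions): estimate the model, set up THEIL, feed it a sequence of forecast origins, then dump the compiled statistics:
estimate(noprint) * 1992:12
theil(setup,model=varmodel,steps=12,to=1994:12)
do base=1993:1,1993:12
   theil base
end do base
theil(dump)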
Comparing Forecasts
Forecasts can be compared informally by using the performance statistics described in the previous section. However, these are subject to sampling error, and so, while one forecast procedure may produce a better RMSE than another in a particular sample, it may not be clear whether that difference is really significant.
Among the procedures that have been proposed for a formal test is that of Diebold and Mariano (1995). This is based upon a loss function, which could be the squared error, but could also be tailored to the situation, such as an error in the sign, if calling the direction right is more important than the accuracy of the forecast value.
The calculation is simple to perform in RATS once you've generated the sets of forecasts. Compute the series of differences between the loss functions for the two forecasts. Regress this on a CONSTANT, requesting robust standard errors with the appropriate number of lags (if you're comparing \(K\)-step forecasts, this would generally be \(K-1\)). The null is that the intercept is zero. Typically, the alternative is that one set of forecasts is better than the other (not just that they're different), so a one-tailed test is appropriate. Note that the series of forecasts should all be for the same forecasting horizon; don't mix 1- to \(H\)-step forecasts in a single test.
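A sketch of that recipe (ACTUAL, F1 and F2 are assumed names for the actual series and the two sets of \(H\)-step forecasts, with \(H=4\) purely for illustration); the t-statistic on CONSTANT is the Diebold-Mariano statistic, to be compared with a one-tailed Normal critical value:
compute h = 4
set lossdiff = (actual-f1)^2-(actual-f2)^2
linreg(robusterrors,lags=h-1) lossdiff
# constant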
We have provided the procedure @DMARIANO, which directly handles this test for the two most important loss functions: squared errors and absolute errors. Note that the default for the procedure's LWINDOW option is TRUNCATED, which follows the recommendation of the authors. However, this could produce a non-positive standard error; that is unlikely in practice, but possible.
For instance, consider the rolling regressions example, which generated one set of one-step forecasts (into the series RHAT). Suppose the alternative is the "naive" forecast of the previous period's value, and that our loss function is the squared error. The Diebold-Mariano test can be implemented by
set naive 1992:1 1994:12 = logrsales{1}
@dmariano logrsales naive rhat
Limitations of the Diebold–Mariano Test
The Diebold-Mariano test should not be applied to situations where the competing models are nested: for example, an AR(1) versus an AR(2), or a no-change forecast versus any ARIMA(p,1,q). In the example above, the models (the no-change forecast and the regression model) would have been nested if the lagged dependent variable were included. When the models nest, the population forecast errors are asymptotically perfectly correlated under the null, and the Diebold-Mariano statistic fails to have an asymptotic Normal distribution. An alternative testing procedure for those situations is the Clark–McCracken test (Clark and McCracken, 2001), which is implemented in the @ClarkForeTest procedure.