REGARIMA.RPF
REGARIMA.RPF is an example of estimation of a RegARIMA model (linear regression with ARIMA errors), done using BOXJENK with the GLS option.
This analyzes a series of (raw) household appliance sales. (This is an example file for Census X12). Because sales are higher on certain days of the week (Friday, Saturday and Sunday), there is a “trading day” effect in the data. We also need to account for a predictable “leap year” effect, since a February in a leap year has one more day on which to sell appliances than a February in any other year.
The following generates the shift regressors: the trading day regressors (the difference between the count of each of the other days of the week and the count of Sundays in the month) and the leap year dummy. %TRADEDAY(t,i) returns the number of times day i of the week occurs in entry t; the first argument is the entry number and the second codes the day of the week from 1=Monday to 7=Sunday. LPYR, which is zero for all months other than February, is 0.75 in a leap-year February and -0.25 in other Februaries, so it is centered to sum to 0 over a full four-year cycle.
dec vect[series] td(6)
do i=1,6
set td(i) = %tradeday(t,i)-%tradeday(t,7)
end do i
set lpyr = %if(%period(t)==2,-.25+(%clock(%year(t),4)==4),0.0)
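For readers who want to check the regressor values outside RATS, the following is a minimal Python sketch of the same calendar arithmetic. It is not part of the example program: the function names are hypothetical, and it assumes that the standard library's `calendar.monthrange` counts weekdays the same way %TRADEDAY does for monthly data.

```python
import calendar

def weekday_counts(year, month):
    """Count how often each weekday occurs in a month (index 0=Monday ... 6=Sunday)."""
    first_weekday, n_days = calendar.monthrange(year, month)  # first_weekday: 0=Monday
    counts = [n_days // 7] * 7           # every weekday occurs at least this often
    for k in range(n_days % 7):          # leftover days start at the month's first weekday
        counts[(first_weekday + k) % 7] += 1
    return counts

def trading_day_regressors(year, month):
    """Six regressors: count of Monday..Saturday minus the count of Sundays."""
    counts = weekday_counts(year, month)
    return [counts[i] - counts[6] for i in range(6)]

def leap_year_regressor(year, month):
    """0.75 in a leap-year February, -0.25 in other Februaries, 0 in all other months."""
    if month != 2:
        return 0.0
    # Same every-fourth-year rule as the RATS formula; adequate for the 1972-1988 sample.
    return -0.25 + (1.0 if year % 4 == 0 else 0.0)

print(weekday_counts(1980, 2))
print(trading_day_regressors(1980, 2))
print([leap_year_regressor(y, 2) for y in range(1977, 1981)])
```

February 1980 began on a Friday and had 29 days, so Friday is the only weekday that occurs five times; the Friday regressor is 1 and the other five are 0. The four February values of the leap year regressor over 1977 to 1980 are -0.25, -0.25, -0.25 and 0.75, which sum to zero.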
The log of the series is analyzed, so, for instance, the coefficient on the LPYR dummy estimates the (approximate) proportional difference in sales due to February 29.
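The reading of the LPYR coefficient can be spelled out as a short derivation. The notation here is introduced for illustration only: S denotes sales in a February and β_LPYR the LPYR coefficient, with all other calendar effects held fixed.

```latex
\begin{aligned}
\log S^{\text{leap}}_{\text{Feb}} - \log S^{\text{non-leap}}_{\text{Feb}}
  &= \beta_{\text{LPYR}}\,\bigl(0.75 - (-0.25)\bigr) = \beta_{\text{LPYR}}, \\
\frac{S^{\text{leap}}_{\text{Feb}}}{S^{\text{non-leap}}_{\text{Feb}}} - 1
  &= e^{\beta_{\text{LPYR}}} - 1 \;\approx\; \beta_{\text{LPYR}}
  \quad \text{for small } \beta_{\text{LPYR}}.
\end{aligned}
```

With the final estimate of roughly 0.032 reported in the Output section, this is a difference of about 3.3 percent. That is close to the 29/28 - 1 ≈ 3.6 percent that sales strictly proportional to the number of days would imply.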
This regresses log sales on a constant and the shift regressors. (With no ARMA terms, this could also have been done using LINREG.)
boxjenk(gls) logsales
# constant td lpyr
This identifies the noise model by applying @BJIDENT to the residuals. The "airline" model, an ARIMA(0,1,1)x(0,1,1), seems appropriate as a first guess. (We show only the graph for 1 regular and 1 seasonal difference below, as the other combinations are clearly non-stationary.)
@bjident(diffs=1,sdiffs=1) %resids
boxjenk(gls,diffs=1,sdiffs=1,ma=1,sma=1) logsales
# constant td lpyr
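Written out, the model estimated at this step has the following form. The symbols are chosen here for illustration, and the plus signs in the moving average polynomials are an assumption about the sign convention behind the reported MA{1} and SMA{12} coefficients.

```latex
\begin{aligned}
y_t &= x_t'\beta + z_t, \\
(1 - B)(1 - B^{12})\, z_t &= (1 + \theta B)(1 + \Theta B^{12})\, \varepsilon_t,
\end{aligned}
```

Here y_t is log sales, x_t holds the constant, the six trading day regressors and LPYR, B is the backshift operator, and ε_t is white noise. Differencing removes the constant from the regression, which is consistent with the Constant row of zeros in the Output section. The two differences together use up 1 + 12 = 13 observations, which matches the drop from 192 to 179 usable observations and the 1973:08 start of the estimation range.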
This re-estimates the model, dropping the apparently unnecessary MA term and adding a check for additive outliers (none are found). The ADJUST option returns the adjustment series (the fitted values of the regression part).
boxjenk(gls,diffs=1,sdiffs=1,sma=1,outlier=ao,$
adjust=tdadjust) logsales
# constant td lpyr
This gives the calendar-adjusted series, which is graphed against the actual log series over a single year. (The series are so similar that, over a longer span, the differences between the two are almost impossible to see). Note that this is not the seasonally adjusted series. The point of the calendar adjustment is to eliminate the much more minor, but predictable, effects of the day-of-the-week composition of the months.
set adjusted = logsales-tdadjust
graph(key=below,klabels=||"Actual","Calendar Adjusted"||) 2
# logsales 1980:1 1980:12
# adjusted 1980:1 1980:12
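To give a sense of the size of the adjustment, here is a minimal Python sketch that rebuilds the adjustment for a single month from the coefficients of the final estimation in the Output section. It assumes that the ADJUST series is simply the regressors times their estimated coefficients, as the description above suggests. The function name is hypothetical, and the weekday counts come from the Python standard library rather than from RATS.

```python
import calendar
import math

# Coefficients copied from the final estimation in the Output section (TD(1)..TD(6), LPYR).
TD_COEFFS = [-0.004196510, 0.004534309, 0.000622904,
             -0.000390495, 0.008686343, 0.010143736]
LPYR_COEFF = 0.032035734

def calendar_adjustment(year, month):
    """Regression part (in logs) for one month; the constant is zero after differencing."""
    first_weekday, n_days = calendar.monthrange(year, month)  # first_weekday: 0=Monday
    counts = [n_days // 7] * 7
    for k in range(n_days % 7):
        counts[(first_weekday + k) % 7] += 1
    td = [counts[i] - counts[6] for i in range(6)]            # Monday..Saturday minus Sunday
    lpyr = (-0.25 + (1.0 if year % 4 == 0 else 0.0)) if month == 2 else 0.0
    return sum(b * x for b, x in zip(TD_COEFFS, td)) + LPYR_COEFF * lpyr

adj = calendar_adjustment(1980, 2)
print(round(adj, 6))                        # adjustment in logs
print(round(100 * (math.exp(adj) - 1), 2))  # same effect as a percentage of sales
```

For February 1980 (one extra Friday, plus the leap day), the adjustment is about 0.0327 in logs, a little over 3 percent of sales. Subtracting it from the log series, as in the SET instruction above, removes that predictable calendar effect.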
Full Program
cal(m) 1972:7
open data houseapplsales.xls
data(format=xls,org=columns) 1972:07 1988:06 houseapplsales
*
* Generate trading day regressors. These are the differences between the
* counts of each day of the week falling in a month (Monday=1, ...,
* Saturday=6) from the number of Sundays.
*
dec vect[series] td(6)
do i=1,6
set td(i) = %tradeday(t,i)-%tradeday(t,7)
end do i
*
* Generate a leap year shift for Februaries. This sums to zero across a
* four year cycle.
*
set lpyr = %if(%period(t)==2,-.25+(%clock(%year(t),4)==4),0.0)
*
set logsales = log(houseapplsales)
graph(footer="Log Appliance Sales")
# logsales
*
* Regress log sales on 1 and the shifts. (This could also have been done
* using LINREG).
*
boxjenk(gls) logsales
# constant td lpyr
*
* Identify the noise model. The "airline" model of an
* ARIMA(0,1,1)x(0,1,1) seems appropriate.
*
@bjident(diffs=1,sdiffs=1) %resids
boxjenk(gls,diffs=1,sdiffs=1,ma=1,sma=1) logsales
# constant td lpyr
*
* Drop the apparently unnecessary MA term, check for additive outliers,
* and generate the trading day adjustment series based on the regression.
*
boxjenk(gls,diffs=1,sdiffs=1,sma=1,outlier=ao,$
adjust=tdadjust) logsales
# constant td lpyr
*
* This gives the "calendar" adjusted series.
*
set adjusted = logsales-tdadjust
graph(key=below,klabels=||"Actual","Calendar Adjusted"||) 2
# logsales 1980:1 1980:12
# adjusted 1980:1 1980:12
Output
Box-Jenkins - Estimation by ML Gauss-Newton
Convergence in 0 Iterations. Final criterion was 0.0000000 <= 0.0000100
Dependent Variable LOGSALES
Monthly Data From 1972:07 To 1988:06
Usable Observations 192
Degrees of Freedom 184
Centered R^2 0.0044198
R-Bar^2 -0.0334555
Uncentered R^2 0.9951883
Mean of Dependent Variable 7.0429065120
Std Error of Dependent Variable 0.4920971657
Standard Error of Estimate 0.5002611327
Sum of Squared Residuals 46.048060967
Regression F(7,184) 0.1167
Significance Level of F 0.9971687
Log Likelihood -135.3665
Durbin-Watson Statistic 0.1413
Q(36-0) 3584.0910
Significance Level of Q 0.0000000
    Variable                         Coeff    Std Error     T-Stat      Signif
************************************************************************************
1.  Constant                   7.042850902  0.036106999  195.05501  0.00000000
2.  TD(1)                      0.005682544  0.098831572    0.05750  0.95421148
3.  TD(2)                      0.002878716  0.100050639    0.02877  0.97707714
4.  TD(3)                     -0.000659920  0.101029913   -0.00653  0.99479539
5.  TD(4)                      0.016611643  0.100480097    0.16532  0.86887147
6.  TD(5)                     -0.010677062  0.100062782   -0.10670  0.91514035
7.  TD(6)                      0.015112164  0.100707262    0.15006  0.88088139
8.  LPYR                       0.240358049  0.292187931    0.82261  0.41179199
Box-Jenkins - Estimation by ML Gauss-Newton
Convergence in 11 Iterations. Final criterion was 0.0000035 <= 0.0000100
Dependent Variable LOGSALES, differenced 1 Regular/1 Seasonal
Monthly Data From 1973:08 To 1988:06
Usable Observations 179
Degrees of Freedom 169
Centered R^2 0.9947903
R-Bar^2 0.9945128
Uncentered R^2 0.9999776
Mean of Dependent Variable 7.0949536798
Std Error of Dependent Variable 0.4677399149
Standard Error of Estimate 0.0346480792
Sum of Squared Residuals 0.2028827067
Log Likelihood 350.5886
Durbin-Watson Statistic 1.9648
Q(36-2) 32.9810
Significance Level of Q 0.5174174
    Variable                         Coeff    Std Error     T-Stat      Signif
************************************************************************************
1.  Constant                   0.000000000  0.000000000    0.00000  0.00000000
2.  TD(1)                     -0.004227420  0.003508270   -1.20499  0.22989287
3.  TD(2)                      0.004523265  0.003596193    1.25779  0.21020189
4.  TD(3)                      0.000751935  0.003596012    0.20910  0.83461996
5.  TD(4)                     -0.000415928  0.003623235   -0.11479  0.90874412
6.  TD(5)                      0.008550153  0.003581837    2.38709  0.01808690
7.  TD(6)                      0.010302555  0.003655870    2.81809  0.00540760
8.  LPYR                       0.032423034  0.012072065    2.68579  0.00795769
9.  MA{1}                     -0.029723269  0.075781383   -0.39222  0.69538699
10. SMA{12}                   -0.579565358  0.067677734   -8.56360  0.00000000
Forward Addition pass 1
Largest t-statistic is AO(1974:06)= -2.431 < 3.934 in abs value
Box-Jenkins - Estimation by ML Gauss-Newton
Convergence in 6 Iterations. Final criterion was 0.0000030 <= 0.0000100
Dependent Variable LOGSALES, differenced 1 Regular/1 Seasonal
Monthly Data From 1973:08 To 1988:06
Usable Observations 179
Degrees of Freedom 170
Centered R^2 0.9947872
R-Bar^2 0.9945419
Uncentered R^2 0.9999776
Mean of Dependent Variable 7.0949536798
Std Error of Dependent Variable 0.4677399149
Standard Error of Estimate 0.0345562911
Sum of Squared Residuals 0.2030033333
Log Likelihood 350.5342
Durbin-Watson Statistic 2.0144
Q(36-1) 31.9984
Significance Level of Q 0.6137834
    Variable                         Coeff    Std Error     T-Stat      Signif
************************************************************************************
1.  Constant                   0.000000000  0.000000000    0.00000  0.00000000
2.  TD(1)                     -0.004196510  0.003441630   -1.21934  0.22440511
3.  TD(2)                      0.004534309  0.003525496    1.28615  0.20014040
4.  TD(3)                      0.000622904  0.003499000    0.17802  0.85891649
5.  TD(4)                     -0.000390495  0.003545632   -0.11013  0.91243294
6.  TD(5)                      0.008686343  0.003507793    2.47630  0.01425400
7.  TD(6)                      0.010143736  0.003567492    2.84338  0.00501134
8.  LPYR                       0.032035734  0.011859245    2.70133  0.00760588
9.  SMA{12}                   -0.579720194  0.067426445   -8.59782  0.00000000
Graphs
Copyright © 2025 Thomas A. Doan