RATS 10.1

REGARIMA.RPF is an example of estimation of a RegARIMA model (linear regression with ARIMA errors), done using BOXJENK with the GLS option.

 

This analyzes a series of (raw) household appliance sales. (This is an example file for Census X12.) Because sales are higher on certain days of the week (Friday, Saturday and Sunday), there is a “trading day” effect in the data. We also need to account for a predictable “leap year” effect, since a February in a leap year has one more day on which to sell appliances than a February in any other year.

 

The following generates the shift regressors: the trading day counts (expressed as differences between each of the other days of the week and Sunday) and the leap year dummy. %TRADEDAY(t,i) returns the number of times that day i of the week occurs in entry t; the first argument is the entry number and the second is the day of the week, coded 1=Monday to 7=Sunday. LPYR, which is zero for all months other than February, is centered to sum to zero over a full four-year cycle: it is 0.75 in a leap-year February and -0.25 in each of the other three.

 

dec vect[series] td(6)
do i=1,6
   set td(i) = %tradeday(t,i)-%tradeday(t,7)
end do i
set lpyr = %if(%period(t)==2,-.25+(%clock(%year(t),4)==4),0.0)
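
As a cross-check on the calendar arithmetic, the following Python sketch builds the same regressors outside RATS. It is only an illustration: the function names are hypothetical, the standard `calendar` module stands in for %TRADEDAY, and the leap year test uses the same simple "divisible by 4" rule as the %CLOCK expression above (adequate for the 1972 to 1988 sample).

```python
import calendar

def weekday_counts(year, month):
    """Count how often each weekday occurs in a month.

    Returns 7 counts, index 0 = Monday ... index 6 = Sunday.
    """
    first_weekday, ndays = calendar.monthrange(year, month)  # Monday == 0
    counts = [0] * 7
    for day in range(ndays):
        counts[(first_weekday + day) % 7] += 1
    return counts

def trading_day_regressors(year, month):
    """Six contrasts: (count of weekday i) - (count of Sundays), i = Mon..Sat."""
    counts = weekday_counts(year, month)
    return [counts[i] - counts[6] for i in range(6)]

def leap_year_shift(year, month):
    """0.75 in a leap-year February, -0.25 in other Februaries, 0 otherwise."""
    if month != 2:
        return 0.0
    return -0.25 + (1.0 if year % 4 == 0 else 0.0)

# February 1980 has 29 days, so exactly one weekday occurs five times
print(trading_day_regressors(1980, 2))
# The February shifts cancel over any four consecutive years
print(sum(leap_year_shift(y, 2) for y in range(1977, 1981)))
```

A month with exactly 28 days has four of every weekday, so all six trading day contrasts are zero; only the days beyond the 28th move the regressors.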

 

The log of the series is analyzed, so, for instance, the coefficient on the LPYR dummy estimates the (approximate) percentage difference due to February 29.
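
To make the log interpretation concrete, this small Python sketch converts the LPYR coefficient reported for the final estimation in the Output section (about 0.032) into a percentage effect. The variable names are illustrative only. Because LPYR is 0.75 in a leap-year February and -0.25 in other Februaries, the two values differ by exactly one, so the coefficient is itself the log difference.

```python
import math

lpyr_coeff = 0.032035734  # LPYR coefficient from the final BOXJENK output

# LPYR is 0.75 in a leap-year February and -0.25 in other Februaries
log_difference = lpyr_coeff * (0.75 - (-0.25))

# Convert the log difference into an exact percentage change
pct_effect = 100.0 * (math.exp(log_difference) - 1.0)

# Benchmark: one extra day on top of 28, if sales were spread evenly over days
pct_extra_day = 100.0 * (29.0 / 28.0 - 1.0)

print(f"estimated leap-day effect: {pct_effect:.2f}%")
print(f"pure day-count benchmark:  {pct_extra_day:.2f}%")
```

The estimate is of the same order as the simple 29/28 day-count benchmark, which is what one would expect if the extra day behaves like an average selling day.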

 

This regresses log sales on a constant and the shift regressors. (Since no ARIMA terms are included yet, this could also have been done using LINREG.)


boxjenk(gls) logsales
# constant td lpyr
 

This identifies the noise model by applying @BJIDENT to the residuals. The "airline" model, an ARIMA(0,1,1)x(0,1,1), seems appropriate as a first guess. (Only the graph for 1 regular and 1 seasonal difference is shown, as the others show clear non-stationarity.)


@bjident(diffs=1,sdiffs=1) %resids
boxjenk(gls,diffs=1,sdiffs=1,ma=1,sma=1) logsales
# constant td lpyr
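
Written out, the model fitted by this second BOXJENK can be sketched as the equation below. This is an interpretation rather than something stated in the example. It assumes the sign convention in which moving average coefficients enter with a plus sign; under that assumption, the SMA{12} estimate of about -0.58 corresponds to a seasonal factor of roughly (1 - 0.58 L^12). The symbols are chosen here for illustration: y_t is log sales, x_t is the vector of regressors (constant, TD(1) to TD(6), LPYR), L is the lag operator, theta is the MA{1} coefficient and Theta is the SMA{12} coefficient.

```latex
(1-L)\,(1-L^{12})\,\bigl(y_t - x_t'\beta\bigr)
  = (1+\theta L)\,(1+\Theta L^{12})\,\varepsilon_t
```

Because (1-L)(1-L^{12}) applied to a constant is zero, the constant drops out of the differenced model, which is why the Constant row shows a zero coefficient and a zero standard error in the differenced estimations in the Output section.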
 

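This re-estimates the model, dropping the apparently unnecessary non-seasonal MA term (its t-statistic is only -0.39) and adding a check for additive outliers. None are found: the largest t-statistic, -2.431 for 1974:06, is below the 3.934 critical value in absolute value. The ADJUST option returns the adjustment series (the fitted values of the regression part).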

 

boxjenk(gls,diffs=1,sdiffs=1,sma=1,outlier=ao,$
  adjust=tdadjust) logsales
# constant td lpyr

 

This gives the calendar-adjusted series, which is graphed against the actual log series over a single year. (The series are so similar that, over a longer span, the differences between the two are almost impossible to see.) Note that this is not the seasonally adjusted series: the point of the calendar adjustment is to eliminate the much more minor, but predictable, effects of the day-of-the-week composition of the months.

 

set adjusted = logsales-tdadjust
graph(key=below,klabels=||"Actual","Calendar Adjusted"||) 2
# logsales 1980:1 1980:12
# adjusted 1980:1 1980:12
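
As a rough illustration of what the adjustment series contains, the Python sketch below rebuilds the regression part for each month of 1980 by hand, using the coefficients from the final estimation in the Output section. It assumes that the ADJUST series is the coefficient-weighted sum of the trading day and leap year regressors (the constant has a zero coefficient after differencing). The helper name is hypothetical and this is not the RATS implementation.

```python
import calendar

# Coefficients from the final BOXJENK output: TD(1)..TD(6), then LPYR
TD_COEFFS = [-0.004196510, 0.004534309, 0.000622904,
             -0.000390495, 0.008686343, 0.010143736]
LPYR_COEFF = 0.032035734

def calendar_adjustment(year, month):
    """Regression part for one month: sum of coefficient * regressor."""
    first_weekday, ndays = calendar.monthrange(year, month)  # Monday == 0
    counts = [0] * 7
    for day in range(ndays):
        counts[(first_weekday + day) % 7] += 1
    td = [counts[i] - counts[6] for i in range(6)]  # weekday minus Sunday counts
    lpyr = 0.0
    if month == 2:
        # Same "divisible by 4" leap year rule as the RATS expression
        lpyr = -0.25 + (1.0 if year % 4 == 0 else 0.0)
    return sum(b * x for b, x in zip(TD_COEFFS, td)) + LPYR_COEFF * lpyr

for month in range(1, 13):
    print(f"1980:{month:02d}  {calendar_adjustment(1980, month): .6f}")
```

Subtracting values of this kind from the log series is what produces the calendar-adjusted series; they are small relative to the level of log sales (about 7), which is why the two lines in the graph are hard to tell apart.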

 

Full Program

 

cal(m) 1972:7
open data houseapplsales.xls
data(format=xls,org=columns) 1972:07 1988:06 houseapplsales
*
* Generate trading day regressors. These are the differences between the
* counts of each day of the week falling in a month (Monday=1, ...,
* Saturday=6) from the number of Sundays.
*
dec vect[series] td(6)
do i=1,6
   set td(i) = %tradeday(t,i)-%tradeday(t,7)
end do i
*
* Generate a leap year shift for Februaries. This sums to zero across a
* four year cycle.
*
set lpyr = %if(%period(t)==2,-.25+(%clock(%year(t),4)==4),0.0)
*
set logsales = log(houseapplsales)
graph(footer="Log Appliance Sales")
# logsales
*
* Regress log sales on 1 and the shifts. (This could also have been done
* using LINREG).
*
boxjenk(gls) logsales
# constant td lpyr
*
* Identify the noise model. The "airline" model of an
* ARIMA(0,1,1)x(0,1,1) seems appropriate.
*
@bjident(diffs=1,sdiffs=1) %resids
boxjenk(gls,diffs=1,sdiffs=1,ma=1,sma=1) logsales
# constant td lpyr
*
* Drop the apparently unnecessary MA term, check for additive outliers,
* and generate the trading day adjustment series based on the regression.
*
boxjenk(gls,diffs=1,sdiffs=1,sma=1,outlier=ao,$
  adjust=tdadjust) logsales
# constant td lpyr
*
* This gives the "calendar" adjusted series.
*
set adjusted = logsales-tdadjust
graph(key=below,klabels=||"Actual","Calendar Adjusted"||) 2
# logsales 1980:1 1980:12
# adjusted 1980:1 1980:12

 

 

Output
 

Box-Jenkins - Estimation by ML Gauss-Newton
Convergence in     0 Iterations. Final criterion was  0.0000000 <=  0.0000100

Dependent Variable LOGSALES
Monthly Data From 1972:07 To 1988:06
Usable Observations                       192
Degrees of Freedom                        184
Centered R^2                        0.0044198
R-Bar^2                            -0.0334555
Uncentered R^2                      0.9951883
Mean of Dependent Variable       7.0429065120
Std Error of Dependent Variable  0.4920971657
Standard Error of Estimate       0.5002611327
Sum of Squared Residuals         46.048060967
Regression F(7,184)                    0.1167
Significance Level of F             0.9971687
Log Likelihood                      -135.3665
Durbin-Watson Statistic                0.1413
Q(36-0)                             3584.0910
Significance Level of Q             0.0000000

    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                      7.042850902  0.036106999    195.05501  0.00000000
2.  TD(1)                         0.005682544  0.098831572      0.05750  0.95421148
3.  TD(2)                         0.002878716  0.100050639      0.02877  0.97707714
4.  TD(3)                        -0.000659920  0.101029913     -0.00653  0.99479539
5.  TD(4)                         0.016611643  0.100480097      0.16532  0.86887147
6.  TD(5)                        -0.010677062  0.100062782     -0.10670  0.91514035
7.  TD(6)                         0.015112164  0.100707262      0.15006  0.88088139
8.  LPYR                          0.240358049  0.292187931      0.82261  0.41179199

 

 

Box-Jenkins - Estimation by ML Gauss-Newton
Convergence in    11 Iterations. Final criterion was  0.0000035 <=  0.0000100

Dependent Variable LOGSALES, differenced 1 Regular/1 Seasonal
Monthly Data From 1973:08 To 1988:06
Usable Observations                       179
Degrees of Freedom                        169
Centered R^2                        0.9947903
R-Bar^2                             0.9945128
Uncentered R^2                      0.9999776
Mean of Dependent Variable       7.0949536798
Std Error of Dependent Variable  0.4677399149
Standard Error of Estimate       0.0346480792
Sum of Squared Residuals         0.2028827067
Log Likelihood                       350.5886
Durbin-Watson Statistic                1.9648
Q(36-2)                               32.9810
Significance Level of Q             0.5174174

    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                      0.000000000  0.000000000      0.00000  0.00000000
2.  TD(1)                        -0.004227420  0.003508270     -1.20499  0.22989287
3.  TD(2)                         0.004523265  0.003596193      1.25779  0.21020189
4.  TD(3)                         0.000751935  0.003596012      0.20910  0.83461996
5.  TD(4)                        -0.000415928  0.003623235     -0.11479  0.90874412
6.  TD(5)                         0.008550153  0.003581837      2.38709  0.01808690
7.  TD(6)                         0.010302555  0.003655870      2.81809  0.00540760
8.  LPYR                          0.032423034  0.012072065      2.68579  0.00795769
9.  MA{1}                        -0.029723269  0.075781383     -0.39222  0.69538699
10. SMA{12}                      -0.579565358  0.067677734     -8.56360  0.00000000

 

 

Forward Addition pass 1
Largest t-statistic is AO(1974:06)=   -2.431 <  3.934 in abs value

Box-Jenkins - Estimation by ML Gauss-Newton
Convergence in     6 Iterations. Final criterion was  0.0000030 <=  0.0000100

Dependent Variable LOGSALES, differenced 1 Regular/1 Seasonal
Monthly Data From 1973:08 To 1988:06
Usable Observations                       179
Degrees of Freedom                        170
Centered R^2                        0.9947872
R-Bar^2                             0.9945419
Uncentered R^2                      0.9999776
Mean of Dependent Variable       7.0949536798
Std Error of Dependent Variable  0.4677399149
Standard Error of Estimate       0.0345562911
Sum of Squared Residuals         0.2030033333
Log Likelihood                       350.5342
Durbin-Watson Statistic                2.0144
Q(36-1)                               31.9984
Significance Level of Q             0.6137834

    Variable                        Coeff      Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                      0.000000000  0.000000000      0.00000  0.00000000
2.  TD(1)                        -0.004196510  0.003441630     -1.21934  0.22440511
3.  TD(2)                         0.004534309  0.003525496      1.28615  0.20014040
4.  TD(3)                         0.000622904  0.003499000      0.17802  0.85891649
5.  TD(4)                        -0.000390495  0.003545632     -0.11013  0.91243294
6.  TD(5)                         0.008686343  0.003507793      2.47630  0.01425400
7.  TD(6)                         0.010143736  0.003567492      2.84338  0.00501134
8.  LPYR                          0.032035734  0.011859245      2.70133  0.00760588
9.  SMA{12}                      -0.579720194  0.067426445     -8.59782  0.00000000

 

 



Copyright © 2025 Thomas A. Doan