BOXJENK Instruction

BOXJENK( options ) depvar start end residuals

# series numeratorlags denominatorlags delay

(if using the INPUTS option: one supplementary card for each input in a transfer function or intervention model)

BOXJENK( options ) depvar start end residuals

# additional explanatory variables in Regression Format

(if using the REGRESSORS or GLS options)

BOXJENK estimates ARIMA, seasonal ARIMA, transfer function and intervention models.

Wizard

ARIMA estimation is available by using the Time Series>Box-Jenkins (ARIMA) Models Wizard.

Parameters

depvar

dependent variable

start, end

Estimation range. If you use these to set the range (rather than using the default range), and are not using the maximum likelihood method, you must set start to allow for:

•the autoregressive and seasonal autoregressive lags in the dependent variable.

•the required lags for input variables. The total number of lags of actual data required is the sum of the highest AR lag, the highest seasonal AR lag and the highest lag in the input numerator.

You do not have to allow for lags in the moving average part or any denominator lags in the inputs.

If you haven’t set a SMPL, the range defaults to the maximum range over which all of these lags are defined.

residuals

(Optional) The residuals are automatically saved in the series %RESIDS. You can supply a series name for this parameter if you also want to store the residuals in a different series.

Options

Standard Regression Options

Standard Non-Linear Estimation Options

SPAN=seasonal span [CALENDAR seasonal]

AR=number of autoregressive parameters [0]

MA=number of moving average parameters [0]

SAR=number of seasonal autoregressive parameters [0]

SMA=number of seasonal moving average parameters [0]

DIFFS=number of regular differencings [0]

SDIFFS=number of seasonal differencings [0]

These jointly specify the ARMA part of the model. See “Technical Information” for details on model parameterization. For frequencies defined in terms of the number of periods per year, SPAN defaults to the CALENDAR seasonal (for example, SPAN=12 for monthly data). For frequencies like weekly and daily, where there is no clear definition of a span, SPAN defaults to 1.

BOXJENK supports any combination of lags for the AR, MA, SAR, and SMA options:

•For N consecutive lags (all lags from 1 through N) for a given parameter, use the format AR=N, MA=N, etc.

•For non-consecutive lags, use ||list of lags||. If you are listing more than one lag, separate them by commas. For example, use AR=||3|| for an AR parameter at lag 3 only, while AR=||1,3|| gives parameters on lags 1 and 3. You can also use a VECTOR of INTEGERs.

CONSTANT/[NOCONSTANT]

DEMEAN/[NODEMEAN]

CONSTANT includes an intercept (constant) term in the model as an estimated parameter (by default, there is none). DEMEAN offers an alternative for handling series with non-zero means. It removes the mean from the dependent variable (after differencing, if required), prior to estimating the model. The mean removal is done using an internal copy—the actual series itself is not affected. See "Selecting and Estimating Models" for more on how the constant is handled by BOXJENK.
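
As a rough illustration of the order of operations described for DEMEAN (difference first, then remove the mean from a working copy), here is a minimal sketch. It is written in Python rather than RATS, the helper names `difference` and `demeaned_copy` are hypothetical, and it is not a description of how BOXJENK is implemented internally.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the first `lag` entries are lost."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def demeaned_copy(series, diffs=0, sdiffs=0, span=1):
    """Difference a copy of `series`, then subtract the mean of the result.

    The input list is never modified, mirroring the "internal copy" idea.
    """
    work = list(series)
    for _ in range(diffs):
        work = difference(work, 1)       # regular differencing
    for _ in range(sdiffs):
        work = difference(work, span)    # seasonal differencing
    mean = sum(work) / len(work)
    return [v - mean for v in work], mean


y = [10.0, 12.0, 15.0, 19.0, 24.0]
z, m = demeaned_copy(y, diffs=1)
print(m)  # 3.5, the mean of the first differences
print(z)  # [-1.5, -0.5, 0.5, 1.5]
```

The original list `y` is unchanged after the call, which is the point of working on a copy.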

INPUTS=number of inputs for transfer function/intervention [0]

APPLYDIFFERENCES/[NOAPPLYDIFFERENCES]

INPUTS sets the number of transfer function inputs or intervention model dummy variable series. For each input, you include a supplementary card describing the polynomial associated with it. See "Supplementary Cards". If you use APPLYDIFFERENCES, RATS applies the differencing operators (DIFFS and SDIFFS options) to all the inputs. Although APPLYDIFFERENCES is not the default, you will usually use it with your inputs.

METHOD=[GAUSS]/BFGS/SIMPLEX/GENETIC/ANNEALING/GA/INITIAL/EVALUATE

ITERATIONS=iteration limit [100]

SUBITERATIONS=subiteration limit [30]

CVCRIT=convergence limit [.00001]

TRACE/[NOTRACE]

BOXJENK uses non-linear optimization. METHOD sets the estimation method to be used, with Gauss-Newton being the default choice. INITIAL is the initial guess algorithm—the same one used by the RATS instruction INITIAL. EVALUATE simply evaluates the model given the initial parameter values (which you can input using the INITIAL option).

ITERATIONS sets the maximum number of iterations, SUBITERATIONS sets the maximum number of subiterations, and CVCRIT sets the convergence criterion. TRACE prints the intermediate results.

PMETHOD=GAUSS/BFGS/SIMPLEX/GENETIC/ANNEALING/GA/INITIAL

PITERS=number of PMETHOD iterations to perform [none]

Use PMETHOD and PITERS if you want to use a preliminary estimation method to refine your initial parameter values before switching to one of the other estimation methods—BOXJENK will automatically switch to the "METHOD" choice after completing the "preliminary" iterations requested using PMETHOD and PITERS.

SMPL=Standard SMPL option [unused]

You can supply a series or a formula that can be evaluated across entry numbers. Entries for which the series or formula is zero or “false” will be skipped, while entries that are non-zero or “true” will be included in the operation. You must use the MAXL option with this.

MAXL/[NOMAXL]

Estimates the model using maximum likelihood estimation rather than conditional least squares. MAXL has the advantage of being able to handle missing values within the estimation range.

INITIAL=vector of initial guesses [unused]

The initial guess values used by RATS are usually adequate for well-specified models. However, if you are having trouble getting convergence, you can input your own guess values. See "Coefficient Order" for the proper placement. The option INITIAL=%BETA will start iteration from the point at which the previous BOXJENK left off, if you decide you simply want to let the process run for a few more iterations.

DEFINE=equation to define from result

If you intend to use the model for forecasting, you must use DEFINE to save the estimated equation. (The form used by BOXJENK is a structural model which isn't directly usable—the coefficients that apply to actual data are a non-linear function of the estimated parameters). RATS obtains the equation from the model by multiplying out the autoregressive and input denominator polynomials. For instance, RATS converts the model:

\({y_t} = \frac{{10}}{{1 - .3L}}{x_t} + \frac{{1 + .4L}}{{1 - .5L}}{u_t}\)

into the equation

\(\left( {1 - .3L} \right)\left( {1 - .5L} \right)\,{y_t} = 10\,\left( {1 - .5L} \right)\,{x_t} + \left( {1 - .3L} \right)\left( {1 + .4L} \right)\,{u_t}\)

\({y_t} = \;.8{y_{t - 1}} - \;.15{y_{t - 2}} + 10{x_t} - 5{x_{t - 1}} + {u_t} + \;.1{u_{t - 1}} - \;.12{u_{t - 2}}\)
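
The expansion above can be reproduced by multiplying the lag polynomials as coefficient lists. This is a small Python check of the arithmetic only (not RATS code); `polymul` is a hypothetical helper, and each list holds the coefficients on L^0, L^1, and so on.

```python
def polymul(a, b):
    """Multiply two lag polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


ar = [1.0, -0.5]    # 1 - .5L, noise AR polynomial
ma = [1.0, 0.4]     # 1 + .4L, noise MA polynomial
den = [1.0, -0.3]   # 1 - .3L, input denominator
num = [10.0]        # 10, input numerator

lhs = polymul(den, ar)      # polynomial multiplying y_t
x_poly = polymul(num, ar)   # polynomial multiplying x_t
u_poly = polymul(den, ma)   # polynomial multiplying u_t

# Moving the lagged y terms to the right-hand side flips their signs.
y_lags = [round(-c, 10) for c in lhs[1:]]
print(y_lags)                             # coefficients on y(t-1), y(t-2)
print([round(c, 10) for c in x_poly])     # coefficients on x(t), x(t-1)
print([round(c, 10) for c in u_poly])     # coefficients on u(t), u(t-1), u(t-2)
```

The three printed lists correspond to the y, x and u coefficients in the final equation.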

If you transformed your dependent variable or any input series with differencing operators prior to doing BOXJENK, or if they were residuals themselves from BOXJENK (prewhitened), then you must use MODIFY and VREPLACE on the equation to put it into a form directly usable by the forecasting instructions.

DERIVES=VECTOR[SERIES] for partial derivatives [unused]

This saves the series of partial derivatives of the residuals. The first series in the VECTOR will be the partials with respect to the first parameter displayed in the BOXJENK output, the second series will be the partials with respect to the second parameter, and so on.

SV=input variance of residuals [variance from standard calculations]

Combined with INITIAL and METHOD=EVALUATE, this gives the log likelihood of an ARIMA model at a specific value of the residual variance, which can be used as a step in a Gibbs sampling scheme. The point estimates of the coefficients themselves aren't affected by the residual variance, so this changes only the log likelihood from what you would get by using the standard concentrated variance estimate. This was added with version 9.1.

Options for "RegARIMA" Models

REGRESSORS/[NOREGRESSORS]

GLS/[NOGLS]

Use REGRESSORS or GLS if you want to include additional non-ARIMA variables using standard regression format, rather than the intervention/transfer function style inputs available using the INPUTS option. While similar, the two have a different emphasis. With GLS, it’s the mean equation represented by the explanatory variables which is the focus of the estimation; while with REGRESSORS, it's the ARIMA model itself. With GLS, the output is switched around so the explanatory variables are listed first. GLS forces use of maximum likelihood (i.e. same as the MAXL option) and also includes the behavior of the APPLYDIFFERENCES option.

MEANEQ=equation to define from the mean model only [unused]

This saves the equation created by the regressors only.

OUTLIER=[NONE]/AO/LS/TC/STANDARD/ALL

CRITICAL=critical (t-statistic) value [based on # of observations]

With any of the choices for OUTLIER other than NONE, BOXJENK does an automatic procedure for detecting and removing outliers. This can be used with or without the GLS option. If used without GLS, it operates like GLS with an empty set of base regressors, that is, it estimates dummy shifts to the mean of the dependent variable, using maximum likelihood. CRITICAL allows you to set the t-statistic value used for the automatic outlier detection threshold. See "Outliers" for more details.

ADJUST=series of RegARIMA adjustments [not used]

This saves a series containing the combined effects on the mean of all the regression coefficients, including input regressors and outliers, leaving out only the CONSTANT (if it’s included in your original set of regressors). This series (or a transformation of it) can be input into the X11 instruction as a set of preliminary adjustment factors.

Supplementary Cards

If using the REGRESSORS option, supply any additional regressors in regression format.

If using the INPUTS option, supply one supplementary card for each input. The lag polynomial form of an input is:

\(\frac{{\left( {{\omega _0} + {\omega _1}L + \ldots + {\omega _n}{L^n}} \right)}}{{\left( {1 - {\delta _1}L - \ldots - {\delta _m}{L^m}} \right)}}{X_{t - d}}\)

Please note the sign convention (+ in numerator, - in denominator).

series	The input series. For an intervention model, this will be some type of dummy variable.
numlags	The number of numerator lags: n in the formula above. You can also use ||list of lags|| or a VECTOR of INTEGERS for non-consecutive lags. Note, however, that there will always be an \(\omega _0\) parameter in the model.
denlags	The number of denominator lags: m in the formula above. Again, you can use ||list of lags||.
delay	The delay period for the series: d in the formula above. You can omit this if it is 0.
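
To see how the numerator, denominator and delay combine under this sign convention, the following Python sketch (not RATS code) computes the implied response weights of the input polynomial. The function name `impulse_weights` and the coefficient values are made up for illustration; the shape matches a card with 1 numerator lag, 1 denominator lag and a delay of 1.

```python
def impulse_weights(omega, delta, delay=0, horizon=8):
    """Weights v_0..v_{horizon-1} of omega(L) / delta(L) applied to X_{t-delay}.

    omega = [w0, w1, ..., wn]  numerator:   w0 + w1 L + ... + wn L^n
    delta = [d1, ..., dm]      denominator: 1 - d1 L - ... - dm L^m
    """
    v = []
    for j in range(horizon):
        k = j - delay
        # Numerator term enters with a plus sign once the delay has passed.
        value = omega[k] if 0 <= k < len(omega) else 0.0
        # Denominator terms feed back earlier weights (minus sign in delta(L)).
        for i, d in enumerate(delta, start=1):
            if j - i >= 0:
                value += d * v[j - i]
        v.append(value)
    return v


# Hypothetical coefficients: omega(L) = 2 + 1L, delta(L) = 1 - 0.5L, delay 1.
w = impulse_weights(omega=[2.0, 1.0], delta=[0.5], delay=1, horizon=6)
print(w)  # [0.0, 2.0, 2.0, 1.0, 0.5, 0.25]
```

The leading zero reflects the one-period delay, and the geometric decay after the numerator terms run out comes from the denominator lag.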

Variables Defined

Regression Variables

%CONVERGED	1 or 0. 1 indicates that the process converged. (INTEGER)
%CVCRIT	final convergence criterion (REAL). This will be equal to zero if the sub-iteration limit was reached on the last iteration.
%FUNCVAL	final value of the function being maximized. (REAL)
%ITERS	iterations completed (INTEGER)
%NARMA	total number of AR and MA terms in the equation (INTEGER)
%NFREE	number of free parameters, including the variance (if not input using SV) and the mean if extracted with DEMEAN (INTEGER)

Examples

boxjenk(diffs=1,ar=2,ma=2,constant) fygn3

Estimates an ARIMA(2,1,2) model.

boxjenk(ma=4,constant,maxl) caemp 1962:1 1993:4

Estimates an MA(4) model by maximum likelihood. (The default estimation procedure is conditional least squares.)

boxjenk(ar=1,ma=||12||,noconst) y1

Estimates an ARMA(1,1) where the MA is on lag 12 only. There is no constant in the model.

boxjenk(inputs=1,noconst) slitaly 1971:3 *

# attkit 1 1 1

Estimates a transfer function model with a (1,1,1) (1 numerator, 1 denominator, 1 period delay) polynomial on the ATTKIT variable.

boxjenk(sdiffs=1,ar=1,ma=1,sma=1,regressors,apply,maxl) deaths

# seatbelt

Estimates an ARIMA(1,0,1)x(0,1,1) model with a shift dummy for the SEATBELT variable.

boxjenk(diffs=1,sdiffs=1,ma=1,sma=1,outliers=ao) airline

Estimates an ARIMA(0,1,1)x(0,1,1) (the so-called "airline" model) with automatic detection of additive outliers.

Sample Output

This is the output from the DEATHS and SEATBELT example. Note that the Q statistic reads Q(27-3). The 27 is the number of autocorrelations used, and the 3 is the correction for the estimated ARMA parameters. This does not include CONSTANT or any other regressors (such as SEATBELT). The ARMA parameters are shown as AR{lag}, MA{lag}, SAR{lag} and SMA{lag}, where the last two are for the seasonal polynomial.

Box-Jenkins - Estimation by ML Gauss-Newton

Convergence in 16 Iterations. Final criterion was 0.0000028 <= 0.0000100

Dependent Variable DEATHS, differenced 0 Regular/1 Seasonal

Monthly Data From 1976:01 To 1984:12

Usable Observations 108

Degrees of Freedom 104

Centered R^2 0.7890975

R-Bar^2 0.7830138

Uncentered R^2 0.9945400

Mean of Dependent Variable 1559.6018519

Std Error of Dependent Variable 255.4379761

Standard Error of Estimate 118.9875597

Sum of Squared Residuals 1472436.0935

Log Likelihood -670.9869

Durbin-Watson Statistic 1.9629

Q(27-3) 14.1060

Significance Level of Q 0.9442196

Variable Coeff Std Error T-Stat Signif

************************************************************************************

1. AR{1} 0.9019352 0.1492277 6.04402 0.00000002

2. MA{1} -0.8186923 0.1887829 -4.33669 0.00003355

3. SMA{12} -0.6817955 0.0920913 -7.40347 0.00000000

4. SEATBELT -306.1953351 41.3739486 -7.40068 0.00000000
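
Some of the summary lines in this output can be cross-checked by hand. The Python sketch below (not RATS code) assumes that the Q significance level is a chi-squared upper-tail probability with 27-3 = 24 degrees of freedom, and that the standard error of estimate is the square root of the sum of squared residuals divided by the degrees of freedom. Both assumptions appear consistent with the numbers printed above, but they are inferred from the output rather than taken from the RATS documentation. The helper `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability for even df (closed-form Poisson sum)."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


n_autocorr, n_arma = 27, 3          # from the Q(27-3) label
q_stat = 14.1060                    # reported Q statistic
p = chi2_sf_even_df(q_stat, n_autocorr - n_arma)

ssr, dof = 1472436.0935, 104        # 108 observations less 4 coefficients
see = math.sqrt(ssr / dof)

print(round(p, 4))    # compare with "Significance Level of Q"
print(round(see, 4))  # compare with "Standard Error of Estimate"
```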

Missing Values

BOXJENK requires use of the MAXL option if there are any missing values within the data range.

Technical Information

The parameterization used for the ARIMA model in BOXJENK is:

\({\left( {1 - L} \right)^d}{\left( {1 - {L^s}} \right)^e}{y_t} = \alpha + \frac{{\left( {1 + {\theta _1}L\, + \, \ldots \, + \,{\theta _q}{L^q}} \right)\left( {1 + {\Theta _1}{L^s} + \, \ldots \, + {\Theta _l}{L^{sl}}} \right)}}{{\left( {1 - {\phi _1}L\, - \, \ldots \, - \,{\phi _p}{L^p}} \right)\left( {1 - {\Phi _1}{L^s} - \, \ldots \, - {\Phi _m}{L^{sm}}} \right)}}{u_t}\)

where:

\({y_t}\)	is the dependent variable
\({u_t}\)	is the series of residuals
p	is the number of autoregressive coefficients
\({{\phi _n}}\)	is the AR coefficient at lag n
q	is the number of moving average components
\({{\theta _n}}\)	is the MA coefficient at lag n
\(\alpha \)	is the optional constant
\({{L^n}}\)	is the lag (or backshift) operator at lag length n
d	is the number of differences
s	is the seasonal span
e	is the number of seasonal differences
l	is the number of seasonal MA components
\({{\Theta _n}}\)	is the seasonal MA coefficient at lag n
m	is the number of seasonal AR components
\({{\Phi _n}}\)	is the seasonal AR coefficient at lag n

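Note the sign convention on the coefficients: the MA polynomials use plus signs and the AR polynomials use minus signs. Also, note that the parameterization of the constant term is different from that used in some other software: here \(\alpha \) sits outside the ARMA ratio, so for a stationary model it is the mean of the (differenced) dependent variable rather than the intercept of the autoregressive equation.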
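
A small Python sketch (not RATS code) can make the sign convention and the role of the constant concrete for an ARMA(1,1) with no differencing. Rearranging the parameterization gives u_t = (y_t - α) - φ(y_{t-1} - α) - θu_{t-1}. The function name and the choice to start the recursion with a zero presample residual are assumptions made for illustration, not a description of BOXJENK's internals.

```python
def arma11_residuals(y, alpha, phi, theta):
    """Residuals u_t for y_t = alpha + (1 + theta L) / (1 - phi L) u_t.

    Conditional recursion: starts at t = 1 with the presample residual set to 0.
    """
    u_prev = 0.0
    resids = []
    for t in range(1, len(y)):
        u = (y[t] - alpha) - phi * (y[t - 1] - alpha) - theta * u_prev
        resids.append(u)
        u_prev = u
    return resids


# A series sitting exactly at alpha produces zero residuals: alpha is the mean.
flat = arma11_residuals([5.0, 5.0, 5.0, 5.0], alpha=5.0, phi=0.6, theta=0.3)
print(flat)  # [0.0, 0.0, 0.0]

# The intercept of the equivalent autoregressive form would be alpha * (1 - phi).
print(5.0 * (1 - 0.6))
```

Software that reports the intercept of the autoregressive form would show a value near 2 for this example, where this parameterization shows 5.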

If you are using the INPUTS option, polynomial terms of the form shown in “Supplementary Cards” are added to the right hand side of the above expression (one polynomial per input).

Algorithm

By default, BOXJENK uses the Gauss-Newton algorithm with numerical derivatives. The simplex, genetic, annealing and genetic annealing methods are also available with the METHOD and PMETHOD options. Simplex, in particular, can be helpful in improving initial parameter guesses for models that prove difficult to fit. Maximum likelihood estimation is done using a state-space representation. Ansley’s (1979) conversion of the full-sample likelihood into a least-squares problem is employed if you use Gauss-Newton. METHOD=BFGS, though, often provides better estimation performance.

For transfer function models, BOXJENK generates

\({Z_t} = \frac{{{X_t}}}{{\delta \left( L \right)}}\)

(when denominator lags are present) by solving \(\delta \left( L \right){Z_t} = {X_t}\), where \(\delta \left( L \right)\) is the input's denominator polynomial and presample values of X are set equal to the mean of the first twenty observations.
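
The recursion for the filtered input can be sketched as follows, in Python rather than RATS. The source only states how presample X is handled; starting Z at the steady state implied by a constant presample X is an assumption of this sketch, and `denominator_filter` is a hypothetical name.

```python
def denominator_filter(x, delta, presample_window=20):
    """Solve D(L) Z_t = X_t, where D(L) = 1 - d1 L - ... - dm L^m.

    delta = [d1, ..., dm] are the denominator coefficients.
    """
    window = x[:presample_window]
    x_mean = sum(window) / len(window)
    # Assumption: with X held at x_mean before the sample, Z starts at the
    # matching steady state x_mean / (1 - d1 - ... - dm).
    z_pre = x_mean / (1.0 - sum(delta))
    z = []
    for t in range(len(x)):
        value = x[t]
        for i, d in enumerate(delta, start=1):
            value += d * (z[t - i] if t - i >= 0 else z_pre)
        z.append(value)
    return z


# A constant input stays at its steady-state level: 3 / (1 - 0.5) = 6.
z_const = denominator_filter([3.0] * 25, delta=[0.5])
print(z_const[:3])
```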

Outliers

The OUTLIER and related options implement an automatic procedure for dealing with outliers in the data.

The choices for OUTLIER are NONE, AO, LS, TC, STANDARD, and ALL. AO locates additive outliers. For an outlier at entry t0, the resulting dummy would be 1 only at t0. LS detects level shifts, generating a dummy with -1’s from the beginning of the sample until t0 - 1. TC detects temporary changes. For a temporary change starting at t0, the dummy takes the value 1 at t0, then declines exponentially for data points beyond that. OUTLIER=AO, OUTLIER=LS and OUTLIER=TC select scans for only the indicated type of outlier. OUTLIER=STANDARD scans for AO and LS, OUTLIER=ALL does all three.
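
The three dummy shapes described above can be sketched in Python (not RATS code) as follows. Entries are 0-based here, the function names are hypothetical, the zeros after the level shift date are assumed, and the decay rate for the temporary change is an illustrative value because the text does not state the rate BOXJENK uses.

```python
def ao_dummy(n, t0):
    """Additive outlier: 1 at t0, 0 elsewhere."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def ls_dummy(n, t0):
    """Level shift: -1 before t0, 0 from t0 on (zeros after t0 are assumed)."""
    return [-1.0 if t < t0 else 0.0 for t in range(n)]


def tc_dummy(n, t0, decay=0.7):
    """Temporary change: 1 at t0, then decay ** (t - t0); 0 before t0.

    decay=0.7 is an assumed illustrative rate, not taken from the RATS manual.
    """
    return [decay ** (t - t0) if t >= t0 else 0.0 for t in range(n)]


print(ao_dummy(5, 2))        # [0.0, 0.0, 1.0, 0.0, 0.0]
print(ls_dummy(5, 2))        # [-1.0, -1.0, 0.0, 0.0, 0.0]
print(tc_dummy(4, 1, 0.5))   # [0.0, 1.0, 0.5, 0.25]
```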

The following procedure is repeated until no further outliers are detected:

1.Beginning with the last RegARIMA model (including previously accepted outliers), LM tests are performed for each of the requested types of outliers at all data points. If the largest t-stat exceeds the critical value, that shift dummy is added to the model, which is then re-estimated.

2.When there are no further outliers to be added, the list is then pruned by examining the t-stats from the full estimation using the same critical value.

Note that the first step uses a “robust” estimate of the standard error of the residuals, based upon the median absolute value. There are several ways to compute maximum likelihood estimates; BOXJENK uses Kalman filtering, Census X13–ARIMA-SEATS uses optimal backcasting. The two lead to identical values for the likelihood function and identical values for the sum of squares of the residuals, but not to identical sets of estimated residuals. As a result, there can be slight differences between the two programs' robust estimates of the standard error. In some cases, they can be large enough to cause the two programs to differ on whether a marginal t-stat is above or below the limit. (X13–ARIMA-SEATS tends to give a lower value for the standard error, and hence higher t-statistics.) This tends to correct itself in the backwards pruning steps.
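
A median-based scale estimate of this kind can be sketched in Python (not RATS code) as below. The scaling constant 1.4826 is the usual factor that makes a median absolute value comparable to a standard deviation for normal data; whether BOXJENK applies exactly this constant is an assumption, and `robust_sigma` is a hypothetical name.

```python
import statistics


def robust_sigma(residuals, scale=1.4826):
    """Median-absolute-value estimate of the residual standard error.

    scale=1.4826 is an assumed normal-consistency constant.
    """
    return scale * statistics.median(abs(r) for r in residuals)


resids = [0.5, -1.0, 0.2, -0.3, 25.0]   # one gross outlier
print(robust_sigma(resids))              # driven by the median, not by 25.0
print(statistics.pstdev(resids))         # ordinary estimate, inflated by 25.0
```

Because the median ignores the size of the largest residual, a single outlier does not inflate the scale used in its own t-statistic, which is why a robust estimate suits the detection step.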

Coefficient Order

For a standard ARIMA/transfer function model, the coefficients appear in the following order. Use this order when supplying values to the INITIAL option or when setting up hypothesis tests.

1.Constant

2.Autoregressive

3.Seasonal autoregressive

4.Moving average

5.Seasonal moving average

6.Inputs in order, numerator first

For RegARIMA models, the regressors are considered to be of principal interest, so they are moved to the start:

1.Regressors in order

2.Autoregressive

3.Seasonal autoregressive

4.Moving average

5.Seasonal moving average

Hypothesis Tests

You can use TEST and RESTRICT to test hypotheses on the coefficients. The coefficient order is provided just above. It's probably easiest to use the Statistics>Regression Tests Wizard to set those up.