Threshold Autoregressions

A standard autoregression (or, more generally, an ARMA model) can, with a sufficiently large number of lags and the right parameters, generate cycles of any frequency. However, any such cycle is symmetric, in the sense that the “up” phase must last as long as the “down” phase. Because of randomness, the observed cycles aren’t, in practice, identical, but on average the lengths will be the same. However, it’s a stylized fact that in (U.S.) GDP, the down phases are shorter than the up phases: recessions are shorter and steeper than expansions.

The threshold autoregression is an alternative model that can produce asymmetric cycles. Instead of a single autoregression, it uses two (or more) branches with some form of trigger which determines which branch applies. If that trigger is based on a lag of the series itself, the threshold autoregression remains a self-contained description of the process. Not only will the branches have different coefficients, but they might even have completely different lag structures. The general form (written here for two branches) is:

\begin{equation} y_t = \left\{ {\begin{array}{*{20}c}{\phi _{11} y_{t - 1} + \ldots + \phi _{1p} y_{t - p} + u_t } & {{\rm{if}}\,z_{t - d} < c} \\{\phi _{21} y_{t - 1} + \ldots + \phi _{2q} y_{t - q} + u_t } & {{\rm{if}}\,z_{t - d} \ge c} \\ \end{array}} \right. \label{eq:breaks_tarmodel} \end{equation}
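As a rough illustration of the general form above, the following Python sketch simulates a two-branch, first-order version of the model in which the threshold variable is a lag of the series itself. The function name `simulate_tar`, the coefficient values and the Gaussian shocks are all assumptions made for this example; none of them come from a RATS program.

```python
import random

def simulate_tar(n, phi_low, phi_high, c=0.0, d=1, sigma=1.0, seed=0):
    """Simulate y_t = phi_low*y_{t-1} + u_t if y_{t-d} < c, else phi_high*y_{t-1} + u_t."""
    rng = random.Random(seed)
    y = [0.0] * d                      # pre-sample values (assumed to be zero)
    for _ in range(n):
        z = y[-d]                      # threshold variable: lag d of the series
        phi = phi_low if z < c else phi_high
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y[d:]                       # drop the pre-sample values

series = simulate_tar(500, phi_low=0.2, phi_high=0.9)
# Share of periods in which the lagged value selects the lower branch
share_low = sum(1 for v in series[:-1] if v < 0.0) / (len(series) - 1)
```

Because the two branches have different persistence, the simulated series will typically spend unequal amounts of time on either side of the threshold, which is the kind of asymmetry a single autoregression cannot generate.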

SETAR Model

SETAR stands for Self-Exciting Threshold AutoRegression. “Self-Exciting” refers to the fact that the threshold variable (\(z\) in \eqref{eq:breaks_tarmodel}) is a lag of the dependent variable. \(c\) (and possibly \(d\)) are unknown. For a given pair of \(d\) and \(c\), \eqref{eq:breaks_tarmodel} can be estimated by least squares on the two branches separately, so the overall sum of squares is fairly quick to evaluate. The problem is that the sum of squares isn’t a continuous function of either: \(d\) for the obvious reason that it’s an integer parameter, \(c\) because \({z_{t - d} }\) takes only a finite number of values in sample, and the sum of squares changes only when \(c\) crosses one of those values. As a result, a SETAR must be estimated using a grid search, over \(c\) given \(d\), then (if \(d\) is unknown as well) over values of \(d\). The grid in \(c\) will be over the observed values of \({z_{t - d} }\). This grid search generally ignores a percentage (10–15%) of the most extreme values on either end in order to ensure that all the branches can be estimated with a sufficient number of data points.
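The grid search over \(c\) can be sketched in a few lines of Python. This is only an illustration of the idea, not the algorithm used by any RATS procedure: it assumes a one-lag autoregression without intercept in each branch, and the names `ssr_ar1` and `setar_grid`, the 15% trimming and the simulated data are all made up for the example.

```python
import random

def ssr_ar1(pairs):
    """Least squares of a value on its lag (no intercept); returns the sum of squared residuals."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * v for x, v in pairs)
    phi = sxy / sxx if sxx > 0 else 0.0
    return sum((v - phi * x) ** 2 for x, v in pairs)

def setar_grid(y, d=1, trim=0.15):
    """Grid search over c for a two-branch AR(1) with threshold variable y_{t-d}."""
    start = max(1, d)
    rows = [(y[t - d], y[t - 1], y[t]) for t in range(start, len(y))]  # (z, lag, value)
    zs = sorted(z for z, _, _ in rows)
    lo, hi = int(trim * len(zs)), int((1 - trim) * len(zs))
    best = None
    for c in zs[lo:hi]:                # candidates: observed z values, extremes trimmed
        low = [(x, v) for z, x, v in rows if z < c]
        high = [(x, v) for z, x, v in rows if z >= c]
        total = ssr_ar1(low) + ssr_ar1(high)
        if best is None or total < best[0]:
            best = (total, c)
    return best                        # (minimized sum of squares, threshold estimate)

# Made-up data from a two-branch AR(1) with a threshold at zero
rng = random.Random(1)
y = [0.0]
for _ in range(300):
    phi = 0.2 if y[-1] < 0.0 else 0.9
    y.append(phi * y[-1] + rng.gauss(0.0, 1.0))

ssr, c_hat = setar_grid(y, d=1)
```

If \(d\) is unknown as well, the same function could be called for each candidate delay and the pair with the smallest sum of squares kept, bearing in mind that the usable sample changes slightly with \(d\).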

The RATS procedure for estimating a SETAR is called @TAR. This does the analysis described in Bruce Hansen (1996): it estimates the optimal break and includes an option for bootstrapping the significance level, since the maximal F-statistic has a non-standard distribution. The following estimates a SETAR on GGROWTH using only lags 1, 2 and 5, with the threshold chosen from those lags as well. It does 1000 bootstrap replications. (This is from the replication program for the empirical example in the Hansen paper.)

@tar(laglist=||1,2,5||,nreps=1000) ggrowth

STAR Models

The sharp cutoff in the SETAR model is, in many cases, unrealistic, and the lack of continuity in the objective function causes other problems. You can’t use the usual asymptotic distribution theory for the estimates, and, without changes, the estimated model isn’t appropriate for forecasting, since it’s not clear how to handle simulated values which fall between the observed data values near \(c\).

An alternative is the STAR model (Smooth Transition AutoRegression). Instead of the sharp cutoff, this uses a smooth function of \({z_{t - d} }\):

\begin{equation} y_t = X_t \beta _{(1)} + X_t \beta _{(2)} G(Z_{t - d} ,\gamma ,c) + u_t \label{eq:breaks_starmodel} \end{equation}

The transition function is bounded between 0 and 1, and depends upon a location parameter \(c\) and a scale parameter \(\gamma\). (There are other, equivalent, ways of writing this. The form we use here lends itself more easily to testing for STAR effects, since it’s just least squares if the \(G\) function is pushed to zero).

The two standard transition functions are the logistic (LSTAR) and the exponential (ESTAR). The formulas for \(G(Z_{t - d} ,\gamma ,c)\) for these are

for LSTAR: \({1 - \left[ {1 + \exp \left( {\gamma (Z_{t - d} - c)} \right)} \right]^{ - 1} }\)

for ESTAR: \({1 - \exp \left( { - \gamma (Z_{t - d} - c)^2 } \right)}\)

LSTAR is more similar to the SETAR model: for values well to the left of \(c\), the \(G\) value is near zero, so the coefficient vector is \(\beta _{(1)} \), while for values well to the right, \(G\) is near one, so the coefficients are \(\beta _{(1)} + \beta _{(2)} \). As \(\gamma \to \infty \), LSTAR converges to a standard threshold model with a break at \(c\). As \(\gamma \to 0\), \(G\) converges to the constant 1/2, so the model collapses to a single linear regression that can be estimated by least squares.

ESTAR treats the tails symmetrically: values near \(c\) have coefficients near \(\beta _{(1)} \), while those farther away (in either direction) have coefficients close to \(\beta _{(1)} + \beta _{(2)} \). ESTAR is often used when it is assumed that there are costs of adjustment in both directions.
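The two transition functions are easy to write down directly. The Python sketch below follows the formulas given above; the function names `g_lstar` and `g_estar` are chosen for this example only. The logistic is evaluated in a form that never exponentiates a large positive number, which is one common way (assumed here, not taken from RATS) to keep the calculation numerically safe.

```python
import math

def g_lstar(z, gamma, c):
    """Logistic transition 1 - 1/(1 + exp(gamma*(z - c))), evaluated without overflow."""
    a = gamma * (z - c)
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def g_estar(z, gamma, c):
    """Exponential transition 1 - exp(-gamma*(z - c)^2)."""
    return 1.0 - math.exp(-gamma * (z - c) ** 2)

# Values of G on either side of c = 0 for a moderate gamma
grid = [-3.0, -1.0, 0.0, 1.0, 3.0]
lstar_values = [g_lstar(z, 2.0, 0.0) for z in grid]   # rises from near 0 to near 1
estar_values = [g_estar(z, 2.0, 0.0) for z in grid]   # 0 at c, near 1 in both tails
```

At \(z = c\) the logistic gives exactly 1/2 and the exponential gives exactly 0, which matches the description of how the two models weight the coefficient vectors.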

STAR models, at least theoretically, can be estimated using non-linear least squares. This, however, requires a bit of finesse: under the default initial values of zero for all parameters used for NLLS, both the parameters in the transition function and the autoregressive coefficients that they control have zero derivatives. As a result, if you do NLLS with the default METHOD=GAUSS, it can never move the estimates away from zero. A better way to handle this is to split the parameter set into the transition parameters and the autoregressive parameters, and first estimate the autoregressions conditional on a pegged set of values for the transition parameters. If you’re not familiar with the use of multiple PARMSETS, see Combining PARMSETS.

\(\gamma\) depends upon the scale of the threshold variable, and it may be hard to think of reasonable guess values. For that reason, it’s helpful to replace \(\gamma\) with \(\gamma /\sigma \) in the formula, where \(\sigma\) is some estimate for the standard deviation for \(Z\). (Standardizing \(Z\) itself gives the identical behavior, but it’s generally better to keep the threshold series in its natural scale.) In that form \(\gamma\) will have a common meaning regardless of the scale of \(Z\). (For an ESTAR, use \(\gamma /\sigma ^2 \)).
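A small Python check makes the point about scale concrete. With \(\gamma /\sigma \) in the LSTAR exponent (and \(\gamma /\sigma ^2 \) for ESTAR), the same \(\gamma\) gives the same transition values whether the threshold series is measured in its original units or multiplied by 100. The data values and function names below are invented for the illustration.

```python
import math
import statistics

def g_lstar_scaled(z, gamma, c, sigma):
    """LSTAR transition with gamma replaced by gamma/sigma."""
    a = (gamma / sigma) * (z - c)
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))

def g_estar_scaled(z, gamma, c, sigma):
    """ESTAR transition with gamma replaced by gamma/sigma^2."""
    return 1.0 - math.exp(-(gamma / sigma ** 2) * (z - c) ** 2)

z = [0.3, -1.2, 0.8, 2.1, -0.4]          # made-up threshold series
z_big = [100.0 * v for v in z]           # the same series in different units

def transition_values(series, g, gamma=2.0):
    c = statistics.mean(series)          # guess value for c: the sample mean
    sigma = statistics.stdev(series)     # scale estimate: the sample standard deviation
    return [g(v, gamma, c, sigma) for v in series]

lstar_small = transition_values(z, g_lstar_scaled)
lstar_big = transition_values(z_big, g_lstar_scaled)
estar_small = transition_values(z, g_estar_scaled)
estar_big = transition_values(z_big, g_estar_scaled)
```

Up to rounding error, `lstar_small` matches `lstar_big` and `estar_small` matches `estar_big`, so a guess value such as \(\gamma = 2\) means the same thing in both cases.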

The following shows these two ideas and is taken from TARMODELS.RPF. Lag 3 of X is used as the transition variable. This uses the reciprocal of the sample standard deviation of X to re-scale the exponent, and uses guess values of the sample mean for \(c\) and 2 for \(\gamma\). The excerpt uses the formulas PHI1F and PHI2F for the two sets of autoregressive terms (with coefficient vectors PHI1 and PHI2); these must already be defined before the FRML STAR line. The first NLLS estimates the autoregressive coefficients only (those in the REGPARMS parmset), then the second estimates all the parameters. Note that the LSTAR transition function uses the %LOGISTIC function. If you code the LSTAR formula for \(G\) directly, the exp in the denominator might overflow for large positive values of \(Z_{t - d} \). %LOGISTIC does the same calculation, but in a “safe” way. The ESTAR function doesn’t have this problem, since very large values will “underflow”, but that just gives the desired zero value for the exp.

stats x

compute scalef=1.0/sqrt(%variance)

nonlin(parmset=starparms) gamma c

frml glstar = %logistic(scalef*gamma*(x{3}-c),1.0)

compute c=%mean,gamma=2.0

frml star x = g=glstar,phi1f+g*phi2f

nonlin(parmset=regparms) phi1 phi2

nonlin(parmset=starparms) gamma c

nlls(parmset=regparms,frml=star) x

nlls(parmset=regparms+starparms,frml=star) x
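The reason the first step works is that, once \(\gamma\) and \(c\) are pegged, the model is linear in the autoregressive coefficients. The Python sketch below illustrates only that first step, under strong simplifying assumptions: a single regressor per branch, no intercept, and noise-free made-up data. The name `fit_given_transition` is hypothetical, and the second step (joint non-linear estimation of all parameters) is not shown.

```python
import math

def g_lstar(z, gamma, c):
    """Logistic transition, evaluated without overflow."""
    a = gamma * (z - c)
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))

def fit_given_transition(y, x, z, gamma, c):
    """Least squares for y = b1*x + b2*x*G(z) + u with gamma and c held fixed."""
    w = [xi * g_lstar(zi, gamma, c) for xi, zi in zip(x, z)]  # second-branch regressor
    s11 = sum(a * a for a in x)
    s12 = sum(a * b for a, b in zip(x, w))
    s22 = sum(b * b for b in w)
    s1y = sum(a * v for a, v in zip(x, y))
    s2y = sum(b * v for b, v in zip(w, y))
    det = s11 * s22 - s12 * s12        # determinant of the 2x2 normal equations
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

# Made-up, noise-free data generated with b1 = 0.3, b2 = 0.5, gamma = 2, c = 0.1
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0.3 * v + 0.5 * v * g_lstar(v, 2.0, 0.1) for v in x]
b1_hat, b2_hat = fit_given_transition(y, x, x, gamma=2.0, c=0.1)
```

With the transition parameters pegged at the values used to generate the data, the regression recovers the two coefficients up to rounding error. In practice the pegged values are only guesses, so these estimates serve as starting values for the joint estimation.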

Testing

There are LM tests for threshold effects which can be applied to a standard regression. The @THRESHTEST and @TSAYTEST procedures test for threshold effects given a specific threshold series. @THRESHTEST does this by doing a standard structural break test, but with the observations added in order of the threshold series. As the maximum F-statistic has a non-standard distribution, the procedure provides for bootstrapped p-values. @TSAYTEST does an “arranged autoregression” test, which uses recursive residuals for the model with sample ordered by the threshold variable. If those are correlated with the regressors, it is evidence of a threshold effect.

@STARTEST tests for general non-linearity which includes either LSTAR or ESTAR. As with the others, it requires a specific threshold series, so, in practice, it is usually applied with several choices for the delay parameter \(d\).

Forecasting/Impulse Responses

TAR models are multi-branch linear models tied together by a non-linear function. Because of the non-linearity, there really is no single “impulse response function.” For instance, the effect of a shock will be quite different when you are near a transition point than when you aren’t. Responses are also very sensitive to scale. In a purely linear model, the effect of doubling the shock is to double the response; that’s not true with a TAR model. What provides much the same information as a standard IRF is the eventual forecast function. This just calculates the point forecasts from some initial situation. How it looks will depend upon what the initial situation is (as will be true with almost any non-linear model), but it should give some idea of what types of cycles the model can generate.
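The sensitivity to scale can be seen with a toy calculation. The Python sketch below iterates an assumed two-branch AR(1) forward with all future shocks set to zero (an eventual forecast function) and compares the response to a shock with the response to a shock twice as large. The coefficients, the threshold at zero and the starting value are invented for the example.

```python
def eventual_forecast(y0, steps, phi_low=0.2, phi_high=0.9, c=0.0):
    """Point forecasts from y0 with every future shock set to zero."""
    path, y = [], y0
    for _ in range(steps):
        phi = phi_low if y < c else phi_high   # branch chosen by the latest value
        y = phi * y
        path.append(y)
    return path

def response(y0, shock, steps):
    """Difference between the forecast paths with and without a shock added to y0."""
    base = eventual_forecast(y0, steps)
    hit = eventual_forecast(y0 + shock, steps)
    return [h - b for h, b in zip(hit, base)]

small = response(-1.0, 0.75, 8)   # shocked value stays below the threshold
large = response(-1.0, 1.50, 8)   # shocked value crosses the threshold
```

Here the first-step response to the smaller shock is 0.15, while the response to the doubled shock is 0.65 rather than 0.30, because the larger shock moves the series into the other branch.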

However, if you want to use the model for actual forecasts, the non-stochastic calculation of the eventual forecast function can be highly misleading. It maps out just one set of many possible sets of transitions. Instead, doing random simulations and averaging is the proper approach. Simulation methods in general are covered in Simulations and Bootstrapping and the simulations specifically for threshold models are described in detail as part of the Structural Breaks and Switching Models e-course.
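The gap between the two approaches can be illustrated with the same kind of toy model. The Python sketch below compares the non-stochastic path with the average of many simulated paths for an assumed two-branch AR(1); the coefficients, the starting value, the number of replications and the Gaussian shocks are all choices made for this example.

```python
import random

def step(y, u, phi_low=0.2, phi_high=0.9, c=0.0):
    """One period of the assumed two-branch AR(1) with shock u."""
    return (phi_low if y < c else phi_high) * y + u

def deterministic_path(y0, steps):
    """Forecast path with all future shocks set to zero."""
    path, y = [], y0
    for _ in range(steps):
        y = step(y, 0.0)
        path.append(y)
    return path

def simulated_mean_path(y0, steps, reps=5000, sigma=1.0, seed=0):
    """Average of simulated paths, each with its own random shocks."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(reps):
        y = y0
        for h in range(steps):
            y = step(y, rng.gauss(0.0, sigma))
            totals[h] += y
    return [t / reps for t in totals]

det_path = deterministic_path(-0.1, 6)
mean_path = simulated_mean_path(-0.1, 6)
```

Starting just below the threshold, the non-stochastic path stays in the low-persistence branch and decays toward zero. The simulated paths cross the threshold often, and because positive values are carried forward more strongly than negative ones, the simulated mean drifts above the non-stochastic path after the first step.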

Also requiring simulation is the calculation of a Non-Linear Impulse Response Function (sometimes known as a GIRF for Generalized Impulse Response Function). This is similar to the calculation of forecasts described above, but layers upon that the simulation of a random shock at period 0 (or sometimes a random positive or random negative shock if the two are likely to have quite different effects). Again, note that there is no single “impulse response function” in these models—the response to shocks changes with the initial conditions, and with the shock size and sign.
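One way such a calculation might be organized is sketched below in Python. It is a simplified illustration rather than a reference implementation: it uses an assumed two-branch AR(1), draws an extra period-0 shock of a chosen sign from a half-normal distribution, and feeds the same future innovations to the shocked and baseline paths so that their difference isolates the effect of the shock. The names `phi` and `girf` and all parameter values are hypothetical.

```python
import random

def phi(z, c=0.0, phi_low=0.2, phi_high=0.9):
    """Autoregressive coefficient of the branch selected by z."""
    return phi_low if z < c else phi_high

def girf(y_init, horizon, sign=1.0, reps=2000, sigma=1.0, seed=0):
    """Average difference between shocked and baseline simulated paths."""
    rng = random.Random(seed)
    totals = [0.0] * (horizon + 1)
    for _ in range(reps):
        shock = sign * abs(rng.gauss(0.0, sigma))   # random period-0 shock of the chosen sign
        base, hit = y_init, y_init
        for h in range(horizon + 1):
            u = rng.gauss(0.0, sigma)               # innovation shared by both paths
            extra = shock if h == 0 else 0.0
            base_next = phi(base) * base + u
            hit_next = phi(hit) * hit + u + extra
            totals[h] += hit_next - base_next
            base, hit = base_next, hit_next
    return [t / reps for t in totals]

positive = girf(-0.5, 8, sign=1.0)
negative = girf(-0.5, 8, sign=-1.0)
```

Comparing `positive` with the mirror image of `negative`, and repeating the calculation for different values of `y_init`, gives a sense of how the response depends on the sign of the shock and on the initial conditions.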

Identification Problems

Both “sharp” and smooth transition models have the property that, if there really aren’t two regimes, an entire set of parameters isn’t identified. For the TAR models, this isn’t a major problem: because the models are estimated by least squares given a trial value for the break, it will always be possible to get the estimates; the difference between the fit with and without the break will simply be small. For the STAR models, this is much more serious. Because of the non-linear interaction between the threshold function and the model, there are several different ways to get rid of the unnecessary second branch: \(\beta _{(2)} \) can be zero, which makes \(\gamma\) and \(c\) unidentified, or \(c\) can be off the end of the data, making \(\beta _{(2)} \) unidentified. Either one is a problem for a non-linear estimation method.
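Both cases show up as a sum of squares that is completely flat in some of the parameters. The Python sketch below evaluates the sum of squares of a one-regressor LSTAR model on made-up linear data; the function names, data values and parameter settings are assumptions for the illustration.

```python
import math

def g_lstar(z, gamma, c):
    """Logistic transition, evaluated without overflow."""
    a = gamma * (z - c)
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))

def ssr(y, x, b1, b2, gamma, c):
    """Sum of squared residuals for y = b1*x + b2*x*G(x) + u."""
    return sum((yi - (b1 * xi + b2 * xi * g_lstar(xi, gamma, c))) ** 2
               for xi, yi in zip(x, y))

x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]           # made-up residuals
y = [0.4 * v + e for v, e in zip(x, noise)]          # data with a single regime

# Case 1: beta_2 = 0, so gamma and c have no effect on the fit
flat_in_transition = {ssr(y, x, 0.4, 0.0, g, c)
                      for g in (0.5, 2.0, 10.0) for c in (-1.0, 0.0, 1.0)}

# Case 2: c far beyond the data, so G is numerically zero and beta_2 has no effect
flat_in_beta2 = {ssr(y, x, 0.4, b2, 50.0, 10.0) for b2 in (-1.0, 0.0, 1.0)}
```

Each set contains a single value, so a derivative-based optimizer has nothing to work with in those directions.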

The STAR model has another identification problem: if there is a threshold effect, but it’s sharp rather than smooth, the optimal value of \(\gamma\) is infinite. When \(\gamma\) is very large, the function is, for practical purposes, no longer differentiable with respect to \(c\), which is why the sharp transition has to be handled with a grid search in the first place. This situation is more common than one would think, and you might find that you need to shift to a TAR model instead.
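A short numerical check shows what happens to the logistic transition as \(\gamma\) grows. In the Python sketch below (with invented data points and a hypothetical function name), a moderate \(\gamma\) gives transition values that respond to a small change in \(c\), while a very large \(\gamma\) gives a step function whose values do not change at all unless \(c\) crosses an observation.

```python
import math

def g_lstar(z, gamma, c):
    """Logistic transition, evaluated without overflow."""
    a = gamma * (z - c)
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))

z = [-1.0, -0.5, 0.5, 1.0]        # made-up values of the threshold variable

# Moderate gamma: moving c from 0.0 to 0.1 changes every transition value
smooth_at_0 = [g_lstar(v, 1.0, 0.0) for v in z]
smooth_at_01 = [g_lstar(v, 1.0, 0.1) for v in z]

# Very large gamma: the same move in c leaves every transition value unchanged
sharp_at_0 = [g_lstar(v, 1e4, 0.0) for v in z]
sharp_at_01 = [g_lstar(v, 1e4, 0.1) for v in z]
```

With the large \(\gamma\), both lists are exactly 0, 0, 1, 1, so the sum of squares is locally flat in \(c\) and its numerical derivative is zero.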

Multivariate Models

Both TAR and STAR models have extensions to multiple observables. For the TAR model, this requires using SWEEP rather than LINREG to do the calculations, and for STAR, NLSYSTEM rather than NLLS. Multivariate TAR models (or threshold VAR as a special case) have a better developed literature, and there are several replication programs for papers that use them. The identification problems for STAR models described above are much more of a problem for multivariate models.