
A General Structure for Analyzing Breaks

This section describes a general structure for searching for one or more breaks in a model at unknown locations. (A break at a known location is generally fairly simple to handle.)

Isolated outliers are typically not considered to be “breaks” in the sense considered here. However, if you’re not careful, some of these techniques can give a “false positive” for a break when all that is present is a simple outlier. With the break at an unknown location, one way to control that is to restrict the possible break points to a central range in the data, typically excluding the first and last 10–15% of the data set. By requiring a certain minimal amount of data to be included in each partition, you make it very hard for an outlier near either end to be misclassified as a break. The symbol \(\pi\) is commonly used to represent the excluded fraction of the data.

Single Breaks

The following is “pseudo-code” for searching for a single break with an excluded range on either end. As you can see, the controlling code is quite simple; most of the work is in setting up and estimating the model with the break. This assumes that we’re looking for the smallest t-statistic, and that the (to-be-added) code inside the loop stores the t-statistic for the current break point in TSTAT. If you’re looking to maximize something, just change the < to > in the IF inside the loop. The following are assumed to have been created already:

PI       (fractional) excluded zone at each end
LOWER    lowest possible entry
UPPER    highest possible entry

FIX(x) rounds a real value down to an integer; here it is applied to the product of PI and the number of observations. The resulting BSTART and BEND are the first and last entries at which a break is permitted.

compute nobs =upper-lower+1

compute bstart =lower+fix(pi*nobs)

compute bend =upper-fix(pi*nobs)
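As a quick check of the arithmetic, here is a minimal Python sketch of the same three computations (Python rather than RATS, purely for illustration). The function name `break_range` is hypothetical, and `math.floor` stands in for FIX on the assumption that the product is positive.

```python
import math

def break_range(lower, upper, pi):
    """Return (bstart, bend): the first and last entries allowed as a break."""
    nobs = upper - lower + 1
    trim = math.floor(pi * nobs)   # stand-in for FIX(pi*nobs), positive values only
    return lower + trim, upper - trim

# 200 entries with 15% excluded at each end: 30 entries are trimmed per side.
bstart, bend = break_range(1, 200, 0.15)
print(bstart, bend)   # 31 170
```

With these settings the search visits 140 candidate break points instead of 200.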

The minimum t value is initialized as an NA, so we know to keep the first value we see. We prefer this (with a check for %VALID in the test inside the loop) to setting a “high” initial value and doing only the size check, since there is then no risk that the starting value isn’t high enough.

compute mint =%na

do time=bstart,bend

Do calculation here with breakpoint at TIME.

if .not.%valid(mint).or.tstat<mint

compute mint=tstat,bestbreak=time

end do time

At this point, the best value will be in MINT and the optimal break is in BESTBREAK.
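The control flow above can be mirrored outside RATS. The following is a minimal Python sketch under stated assumptions, not the RATS implementation: `search_single_break` and `ssr_two_means` are hypothetical names, NaN plays the role of %NA, and the toy statistic (the sum of squared residuals around separate means before and after the break) only stands in for whatever TSTAT your own model produces. The break entry is assumed to be the first entry of the second regime, and at least one entry is assumed to be trimmed from each end.

```python
import math

def ssr_two_means(y, lower, time):
    """Toy statistic: SSR from one mean before `time` and another from `time` on."""
    split = time - lower                  # y[0] holds entry `lower`
    total = 0.0
    for seg in (y[:split], y[split:]):
        mean = sum(seg) / len(seg)
        total += sum((v - mean) ** 2 for v in seg)
    return total

def search_single_break(lower, upper, pi, stat):
    """Return (mint, bestbreak), keeping the smallest statistic in the trimmed range."""
    nobs = upper - lower + 1
    trim = math.floor(pi * nobs)
    bstart, bend = lower + trim, upper - trim
    mint, bestbreak = math.nan, None      # NaN start: the first value is always kept
    for time in range(bstart, bend + 1):
        tstat = stat(time)
        if math.isnan(mint) or tstat < mint:
            mint, bestbreak = tstat, time
    return mint, bestbreak

y = [0.0] * 10 + [5.0] * 10               # entries 1..20, mean shifts at entry 11
mint, bestbreak = search_single_break(1, 20, 0.15, lambda t: ssr_two_means(y, 1, t))
print(mint, bestbreak)                    # 0.0 11
```

To maximize instead, flip the comparison to `tstat > mint`, exactly as with the IF in the RATS loop.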

Multiple Breaks

Needless to say, this is quite a bit more complicated than the single break. The typical programs that you’ll find for multiple breaks are written for the case of two breaks only. However, the following general structure will handle any number of breaks from one on up, with no additional programming.

With multiple breaks, you not only have to exclude breakpoints near the end of the data, but you must also make sure the breaks themselves don’t get too close together. This uses the same control parameter (PI) at the ends and in the middle, though that can easily be changed. In addition to PI, LOWER and UPPER as before, we also need:

BREAKS   number of breaks

With BREAKS set to the appropriate value, you can do:

compute nobs =upper-lower+1

compute pinobs=fix(pi*nobs)

Set up the starting break points (the leftmost legal values) into BPS and the upper bounds (rightmost legal values) into UPPERBOUND.

local vect[int] bps(breaks) upperbound(breaks) bestbreaks(breaks)

do i=1,breaks

compute bps(i) = (lower-1) + pinobs*i

compute upperbound(i) = upper+1-pinobs*(breaks+1-i)

end do i

compute mint=%na

compute done=0

while .not.done {

Do calculation here with breakpoints at BPS(1),...,BPS(BREAKS).

if .not.%valid(mint).or.tstat<mint

compute mint=tstat,bestbreaks=bps

Update the break points. Add to the final slot, until we hit its upper bound. When we do, add one to the second to last slot, and reset the final one to its smallest possible value. When the second to last is exhausted, add one to the one earlier, etc. If DONE is zero at the end, we still have a valid combination that we haven’t checked.

compute done=1

do i=breaks,1,-1

compute bps(i) = bps(i) + 1

if bps(i)<=upperbound(i) {

do j=i+1,breaks

compute bps(j) = bps(j-1) + pinobs

end do j

compute done=0

break

}

end do i

}

At this point, the best value will be in MINT and the optimal set of breaks is in BESTBREAKS.
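The update rule works like an odometer, which may be easier to see in a small runnable form. The following Python sketch is an illustration under stated assumptions rather than the RATS code: `break_combinations` and `search_breaks` are hypothetical names, the upper bounds are treated as inclusive (the rightmost legal values), UPPER is taken as the end of the sample, and the trimming amount is assumed to be at least one entry.

```python
import math

def break_combinations(lower, upper, pi, breaks):
    """Yield every admissible tuple of break points, leftmost combination first."""
    nobs = upper - lower + 1
    pinobs = math.floor(pi * nobs)
    # Leftmost and rightmost legal values for each slot (0-based slot index i).
    bps = [(lower - 1) + pinobs * (i + 1) for i in range(breaks)]
    upperbound = [upper + 1 - pinobs * (breaks - i) for i in range(breaks)]
    if any(b > u for b, u in zip(bps, upperbound)):
        return                            # sample too short for this many breaks
    while True:
        yield tuple(bps)
        for i in reversed(range(breaks)):
            if bps[i] < upperbound[i]:
                bps[i] += 1               # advance this slot ...
                for j in range(i + 1, breaks):
                    bps[j] = bps[j - 1] + pinobs   # ... and reset the later ones
                break
        else:
            return                        # every slot is exhausted

def search_breaks(lower, upper, pi, breaks, stat):
    """Return (mint, bestbreaks), keeping the smallest value of stat(bps)."""
    mint, bestbreaks = math.nan, None
    for bps in break_combinations(lower, upper, pi, breaks):
        tstat = stat(bps)
        if math.isnan(mint) or tstat < mint:
            mint, bestbreaks = tstat, bps
    return mint, bestbreaks

combos = list(break_combinations(1, 20, 0.25, 2))
print(len(combos), combos[0], combos[-1])   # 28 (5, 10) (11, 16)
```

Writing the enumeration as a generator keeps it separate from the statistic, so the same search function handles any number of breaks.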

Note that when the number of breaks is two or more, this can require a very large number of calculations, particularly when the data set is large. The number of combinations with \(k\) breaks grows on the order of \(T^k\); since the calculation itself (given a combination) is also likely \(O(T)\), the total work grows on the order of \(T^{k+1}\). If the base model is a quickly computed linear regression, two breaks with 400 data points won’t take an unreasonable amount of time. Two breaks with 2000 data points will take roughly \(5^3 = 125\) times as long, since the sample is five times larger. If the base model is non-linear and requires iterated calculations, you might even find that two breaks with 400 data points takes too long.
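To put rough numbers on this, the sketch below counts the admissible combinations for the bounds used in the multiple-break setup. The closed form is derived here and is not taken from the RATS documentation: it assumes inclusive upper bounds and a minimum spacing of `pinobs` between breaks. The name `n_combinations` is hypothetical.

```python
import math

def n_combinations(nobs, pi, breaks):
    """Number of admissible break-point tuples for the bounds used above."""
    pinobs = math.floor(pi * nobs)
    slack = nobs + 1 - pinobs * (breaks + 1)   # room left after the mandatory gaps
    return math.comb(slack + breaks, breaks) if slack >= 0 else 0

small = n_combinations(400, 0.15, 2)    # 24753
large = n_combinations(2000, 0.15, 2)   # 607753
print(small, large, round(large / small, 1))   # 24753 607753 24.6
```

The count grows by a factor of about 25 when the sample grows by a factor of 5; with each calculation itself costing about 5 times as much, the total comes to roughly 125 times the work.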

For linear models estimated with least squares, the Bai-Perron algorithm reduces the computational effort with two or more breaks. However, that’s not available for any type of non-linear model.

The search procedures for breaks shown above are included in quite a few RATS procedures, such as @APBreakTest (for a single break in a linear regression) and @LSUnit (for multiple breaks in a unit root test). The ONEBREAK.RPF example shows it being used in a test procedure which can’t be handled by the existing procedures.