BAIPERRON Procedure

@BAIPERRON performs Bai-Perron structural break analysis for a linear regression. It is designed to look for breaks over time (particularly more than one break) in a linear regression model, using the algorithm described in Bai and Perron (2003). A related procedure, @MultipleBreaks, uses the same basic algorithm to handle multiple breaks in a variable other than "time". If you're only looking for a single break, there's no advantage in using the Bai-Perron algorithm; instead, you can use @APBREAKTEST, which has more features.

@BaiPerron( options ) depvar start end

# list of regressors (in regression format)

Parameters

depvar	dependent variable
start, end	range for regression. By default, the maximum range permitted by all variables involved in the regression.

Supplementary Card

The fixed regressors (if any) must be listed first.

Options

MINSPAN=shortest distance between two breaks [number of regressors]

MAXBREAKS=maximum breaks allowed [2]

NFIX=number of regressors which are fixed over time [0]

ITERS=maximum number of iterations [20] (matters only if NFIX>0)

[PRINT]/NOPRINT

Controls printing of the final regression and breakpoints

TESTS/[NOTESTS]

Controls printing of the table of BIC, LWZ and F tests for choosing the number of breaks.

Variables Defined

%BETA	VECTOR of stacked coefficients for MAXBREAKS+1 regimes
%NOBS	number of observations (INTEGER)
%NREG	number of regressors (using MAXBREAKS+1 regimes)
%RSS	sum of squared residuals
%%BREAKPOINTS	VECTOR[INT] with the break point entries (for MAXBREAKS break points)

Bai-Perron is an algorithm for efficiently finding the least squares break points in a linear regression model of the form \({y_t} = {X_t}\beta + {u_t}\). From a computational standpoint, the key result here is that the level of effort of calculation can be reduced substantially when there are two or more breaks. This is (in general) specific to the case of the linear model. In the general framework for analyzing (multiple) breaks, for a fixed value \({T_1}\) for the first breakpoint, there are a large number of partitions for the range \([{T_1} + 1, \ldots ,T]\). For a linear model with complete coefficient breaks, there is no need to re-estimate the regression over \([1, \ldots ,{T_1}]\) for every choice for breaking the later sample; the coefficients and sums of squared residuals aren’t affected by how those later breaks are done. The sums of squared residuals can be computed just once for each legal subset of the range, after which the partitions can be examined efficiently without redoing any of the actual estimations.
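The two-stage calculation described above can be sketched in Python (with NumPy) as follows. This is a minimal illustration of the idea, not the code used by @BaiPerron: the function names `segment_ssr` and `best_partition` are hypothetical, entries are numbered from 0, and each segment is fitted by a separate least squares call for clarity rather than by any more efficient updating scheme.

```python
import numpy as np

def segment_ssr(y, X, minspan):
    """SSR of a least squares fit on every segment [i, j] (inclusive)
    that contains at least minspan observations; inf marks illegal segments."""
    T = len(y)
    ssr = np.full((T, T), np.inf)
    for i in range(T):
        for j in range(i + minspan - 1, T):
            Xs, ys = X[i:j + 1], y[i:j + 1]
            beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
            resid = ys - Xs @ beta
            ssr[i, j] = resid @ resid
    return ssr

def best_partition(ssr, m):
    """Choose m breaks to minimise the total SSR, using only the table of
    segment SSRs. Returns (total SSR, last entry of each of the first m regimes)."""
    T = ssr.shape[0]
    # cost[k, j]: smallest SSR when entries 0..j are split into k+1 segments
    cost = np.full((m + 1, T), np.inf)
    last = np.zeros((m + 1, T), dtype=int)
    cost[0] = ssr[0]
    for k in range(1, m + 1):
        for j in range(T):
            # b = last entry of the preceding segment; final segment is [b+1, j]
            cand = cost[k - 1, :j] + ssr[1:j + 1, j]
            if cand.size and np.isfinite(cand.min()):
                b = int(np.argmin(cand))
                cost[k, j], last[k, j] = cand[b], b
    # Walk back from the end of the sample to recover the break entries
    breaks, j = [], T - 1
    for k in range(m, 0, -1):
        j = int(last[k, j])
        breaks.append(j)
    return float(cost[m, T - 1]), breaks[::-1]

# Simulated mean-shift series: regimes 0..39, 40..79, 80..119
rng = np.random.default_rng(0)
T = 120
y = np.repeat([0.0, 5.0, 0.0], 40) + rng.normal(scale=0.1, size=T)
X = np.ones((T, 1))                      # constant-only regression

ssr = segment_ssr(y, X, minspan=5)       # all estimation happens here, once
total, breaks = best_partition(ssr, m=2) # no regressions are re-run here
print(breaks, round(total, 3))
```

With shifts this large relative to the noise, the search should report the last entries of the first two regimes (39 and 79 in zero-based numbering). The point of the design is that `best_partition` never touches the data: every candidate partition is scored by adding up entries of the precomputed table.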

Note that this is just a way of doing the calculations efficiently. It applies regardless of the behavior of the variables—if you have a linear regression, are estimating by least squares, and want to find the break points which minimize the sum of squared residuals, you can use @BaiPerron to do it. Note, however, that while the estimates are correct even if the data are non-stationary, the tabled critical values for the number of breaks don’t apply for non-stationary data.

The basic idea can also be applied if the threshold variable isn’t “time”. The procedure @MultipleBreaks is similar to @BaiPerron, but uses an input variable to partition the sample. And the same idea (though not the @BaiPerron procedure itself) could be applied to, for instance, a simple non-linear least squares model with a closed-form function. However, ARMA and GARCH models (as examples) have functions which are generated recursively from the start of the data, so they wouldn’t benefit from this.

The main point of these multiple break point tests is to check a specification of a fixed coefficient model. If the change point analysis shows a break in the specification, it is highly unlikely that you would respond to that information by replacing your original model with the same specification estimated in two subsamples. Instead, you would look for a change in specification, maybe finding a different proxy, or transformation of a variable, or addition of some regressor which would somehow model the break.

You have to be a bit careful when using this procedure. In particular, you need to set the MINSPAN and MAXBREAKS options with care. The default for MINSPAN is the number of regressors. If you are just using a single regressor (a CONSTANT for example), then one of the partitions can be a single point. Under those circumstances, it is quite easy for the BIC to keep going down rather steadily as you increase the number of breaks. Increasing MINSPAN will tend to force it to find only those situations that would seem to fit a more conventional notion of a structural break, rather than picking out outliers.
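To see why the BIC can keep falling, it helps to look at how the criteria trade fit against the number of coefficients. The Python sketch below reproduces, to the two decimals shown, the BIC and LWZ columns printed for the AR(1) inflation example in the Output section. The formulas were inferred by matching those printed values and are an assumption about what the procedure computes, not taken from its source. The function names are hypothetical, and the constants 0.299 and 0.1 are the values usually quoted for the LWZ criterion.

```python
import math

def bic(rss, nobs, ncoef):
    # Assumed form: log of the average squared residual plus a
    # log(T)/T penalty for each estimated regression coefficient
    return math.log(rss / nobs) + ncoef * math.log(nobs) / nobs

def lwz(rss, nobs, ncoef, c0=0.299, delta0=0.1):
    # Assumed form: degrees-of-freedom corrected variance with a heavier penalty
    return (math.log(rss / (nobs - ncoef))
            + (ncoef / nobs) * c0 * math.log(nobs) ** (2 + delta0))

T, q = 40, 2   # usable observations and shifting regressors in the AR(1) example
rss_by_breaks = {0: 0.0306781, 1: 0.0267186, 2: 0.0183782}  # RSS column of the table

table = {}
for m, rss in rss_by_breaks.items():
    k = (m + 1) * q            # coefficients estimated with m breaks
    table[m] = (round(bic(rss, T, k), 2), round(lwz(rss, T, k), 2))
    print(m, table[m])
```

Under these assumed formulas, each additional break adds only q·log(T)/T to the BIC penalty (about 0.18 here). With a single regressor and a short MINSPAN, isolating one outlier can easily lower the RSS by enough to pay for that.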

With Fixed (Non-Breaking) Regressors

The dynamic programming algorithm described in Bai and Perron is guaranteed to find the partition of the sample which minimizes the sum of squared residuals when the coefficients are allowed to break completely. The procedure also allows for a certain number of coefficients to take a single value in all partitions. This is controlled by the NFIX option. That is much more complicated: because all the estimates are linked, the sum of squared residuals over one subsample will depend upon which other subsamples are used. Write the model as:

\begin{equation} {y_t} = {X_t}\beta + {Z_t}{\delta _{(i)}} + {u_t}\,\,{\rm{if}}\,\,t \in {P_{(i)}} \label{eq:BaiPerron_fixed} \end{equation}

The algorithm used in the NFIX case fixes the common coefficients (\(\beta\)), applies the full-break algorithm to \(y_t - X_t \beta\) to get new partitions, re-estimates \eqref{eq:BaiPerron_fixed}, and repeats until convergence. While this takes quite a bit less time than applying the general structure for breaks, it doesn’t guarantee that the best partition for the model is located.
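The iteration can be sketched in Python (with NumPy) as below. This is an illustration under stated assumptions rather than the procedure's own code: the starting value for \(\beta\) (the no-break regression) and the stopping rule (an unchanged partition) are guesses at reasonable choices, all function names are hypothetical, and entries are numbered from 0.

```python
import numpy as np

def ols(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, float(resid @ resid)

def full_break_partition(y, Z, m, minspan):
    """Full-break search: last entry of each of the first m regimes."""
    T = len(y)
    ssr = np.full((T, T), np.inf)
    for i in range(T):
        for j in range(i + minspan - 1, T):
            ssr[i, j] = ols(Z[i:j + 1], y[i:j + 1])[1]
    cost = np.full((m + 1, T), np.inf)
    last = np.zeros((m + 1, T), dtype=int)
    cost[0] = ssr[0]
    for k in range(1, m + 1):
        for j in range(1, T):
            cand = cost[k - 1, :j] + ssr[1:j + 1, j]
            b = int(np.argmin(cand))
            cost[k, j], last[k, j] = cand[b], b
    breaks, j = [], T - 1
    for k in range(m, 0, -1):
        j = int(last[k, j])
        breaks.append(j)
    return breaks[::-1]

def regime_design(Z, breaks):
    """One copy of Z per regime, zero outside that regime."""
    T, q = Z.shape
    edges = [0] + [b + 1 for b in breaks] + [T]
    D = np.zeros((T, q * (len(edges) - 1)))
    for r, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        D[lo:hi, r * q:(r + 1) * q] = Z[lo:hi]
    return D

def partial_break_fit(y, X, Z, m, minspan, iters=20):
    p = X.shape[1]
    beta = ols(np.hstack([X, Z]), y)[0][:p]      # assumed start: no-break fit
    breaks = None
    for _ in range(iters):
        # Step 1: hold beta fixed, run the full-break search on y - X beta
        new_breaks = full_break_partition(y - X @ beta, Z, m, minspan)
        # Step 2: re-estimate the whole model given that partition
        coef, rss = ols(np.hstack([X, regime_design(Z, new_breaks)]), y)
        beta = coef[:p]
        if new_breaks == breaks:                 # assumed stopping rule
            break
        breaks = new_breaks
    return breaks, coef, rss

# Simulated example: fixed slope on x, intercept breaking after entries 29 and 59
rng = np.random.default_rng(1)
T = 90
x = rng.uniform(-1.0, 1.0, size=T)
y = 2.0 * x + np.repeat([0.0, 5.0, -2.0], 30) + rng.normal(scale=0.1, size=T)
breaks, coef, rss = partial_break_fit(y, x[:, None], np.ones((T, 1)), m=2, minspan=5)
print(breaks, coef.round(2))
```

The returned coefficient vector lists the fixed coefficients first and then one block per regime, which is the same ordering as the regression output shown for the Phillips curve example. Because each pass only optimizes the partition for the current \(\beta\), the loop can settle on a partition that is not the global minimizer, which is the caveat noted above.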

If you’re interested in “broken trends”, note that Bai-Perron can handle only certain types of those—either a complete break with both intercept and trend rate changing, or (using the NFIX option) a fixed trend rate, but with breaks in the intercept. It can’t handle a spline situation where the trend rate changes but the function value doesn’t.
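Written out, the three cases are as follows. The symbols \(\alpha\), \(\gamma\) and the break dates \(T_i\) are notation introduced here for illustration. A complete break, with CONSTANT and the trend both among the shifting regressors, is

\begin{equation} {y_t} = {\alpha _{(i)}} + {\gamma _{(i)}}t + {u_t}\,\,{\rm{if}}\,\,t \in {P_{(i)}} \end{equation}

A fixed trend rate with breaks in the intercept, with the trend listed first and NFIX=1, is

\begin{equation} {y_t} = {\alpha _{(i)}} + \gamma t + {u_t}\,\,{\rm{if}}\,\,t \in {P_{(i)}} \end{equation}

The spline case, which is continuous at each break date, is

\begin{equation} {y_t} = \alpha + {\gamma _0}t + \sum\limits_{i = 1}^m {{\gamma _i}\max (t - {T_i},0)} + {u_t} \end{equation}

In the last form the regressors themselves depend upon the break dates, so the fit over one stretch of the sample is tied to where the other breaks fall. That rules out computing the sums of squared residuals segment by segment.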

Example

* Replication file for Bai & Perron(2003), "Computation and analysis of
* multiple structural change models," Journal of Applied Econometrics,
* vol. 18, no. 1, pages 1-22.

* UK data (inflation and Phillips curve)
open data uk.dat
calendar(a) 1855
data(format=free,org=columns) 1855:01 1987:01 year lprice lwage ur

set dprice = lprice-lprice{1}
set dwage = lwage-lwage{1}
set dur = ur-ur{1}

* AR(1) for inflation
@BaiPerron(maxbreaks=2,minspan=8,print,tests) dprice 1948:1 1987:1
# constant dprice{1}

* Phillips Curve. The two unemployment series are fixed regressors
@BaiPerron(maxbreaks=2,minspan=4,nfix=2,print,tests) dwage 1948:1 1987:1
# dur ur{1} constant dprice{1}

Output

Linear Regression - Estimation by Least Squares

Dependent Variable DPRICE

Annual Data From 1948:01 To 1987:01

Usable Observations 40

Degrees of Freedom 38

Centered R^2 0.6376424

R-Bar^2 0.6281066

Uncentered R^2 0.8726985

Mean of Dependent Variable 0.0625150000

Std Error of Dependent Variable 0.0465921780

Standard Error of Estimate 0.0284133361

Sum of Squared Residuals 0.0306780714

Regression F(1,38) 66.8688

Significance Level of F 0.0000000

Log Likelihood 86.7042

Durbin-Watson Statistic 1.7770

    Variable                    Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  Constant                 0.0116240264  0.0076755387    1.51442  0.13819241
2.  DPRICE{1}                0.8027600533  0.0981689363    8.17733  0.00000000

Breaks        RSS      BIC      LWZ     F(m)  F(m|m-1)
     0  0.0306781    -6.99    -6.89*
     1  0.0267186    -6.94    -6.74     2.67      2.67
     2  0.0183782    -7.13*   -6.83     5.69      7.71

Linear Regression - Estimation by Bai-Perron Break Analysis

Dependent Variable DPRICE

Annual Data From 1948:01 To 1987:01

Usable Observations 40

Degrees of Freedom 34

Centered R^2 0.7829241

R-Bar^2 0.7510012

Uncentered R^2 0.9237381

Mean of Dependent Variable 0.0625150000

Std Error of Dependent Variable 0.0465921780

Standard Error of Estimate 0.0232493953

Sum of Squared Residuals 0.0183781689

Regression F(5,34) 24.5254

Significance Level of F 0.0000000

Log Likelihood 96.9519

Durbin-Watson Statistic 1.9892

    Variable                    Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  DZ(1,1)                   0.024501073   0.011175703    2.19235  0.03529693
2.  DZ(2,1)                   0.274012470   0.269892429    1.01527  0.31714998
3.  DZ(1,2)                  -0.000775030   0.017853204   -0.04341  0.96562766
4.  DZ(2,2)                   1.343368583   0.224520585    5.98328  0.00000091
5.  DZ(1,3)                   0.017603218   0.015007032    1.17300  0.24894928
6.  DZ(2,3)                   0.683409843   0.130106083    5.25271  0.00000807

Bai-Perron Break Point Analysis

Dependent Variable DPRICE

Shifting Regressors

             <=1967:01  <=1975:01
Constant      0.024501  -0.000775   0.017603
DPRICE{1}     0.274012   1.343369   0.683410

Breakpoint  Lower 95%  Upper 95%
1967:01     1965:01    1968:01
1975:01     1973:01    1979:01

Linear Regression - Estimation by Least Squares

Dependent Variable DWAGE

Annual Data From 1948:01 To 1987:01

Usable Observations 40

Degrees of Freedom 36

Centered R^2 0.5536934

R-Bar^2 0.5165011

Uncentered R^2 0.9100533

Mean of Dependent Variable 0.0869750000

Std Error of Dependent Variable 0.0442527574

Standard Error of Estimate 0.0307707486

Sum of Squared Residuals 0.0340862028

Regression F(3,36) 14.8873

Significance Level of F 0.0000018

Log Likelihood 84.5973

Durbin-Watson Statistic 1.4279

    Variable                    Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  DUR                      -0.845720180   0.846453794   -0.99913  0.32439887
2.  UR{1}                     0.014732745   0.145091000    0.10154  0.91968415
3.  Constant                  0.038694932   0.009616670    4.02373  0.00028129
4.  DPRICE{1}                 0.784822091   0.139594631    5.62215  0.00000222

Breaks        RSS      BIC      LWZ     F(m)  F(m|m-1)
     0   0.034086    -6.70    -6.50
     1   0.020387    -7.03    -6.72    11.42     11.42
     2   0.013071    -7.29*   -6.88*   12.86     25.73

Linear Regression - Estimation by Bai-Perron Break Analysis

Dependent Variable DWAGE

Annual Data From 1948:01 To 1987:01

Usable Observations 40

Degrees of Freedom 32

Centered R^2 0.8288610

R-Bar^2 0.7914243

Uncentered R^2 0.9655094

Mean of Dependent Variable 0.0869750000

Std Error of Dependent Variable 0.0442527574

Standard Error of Estimate 0.0202102727

Sum of Squared Residuals 0.0130705639

Regression F(7,32) 22.1404

Significance Level of F 0.0000000

Log Likelihood 103.7679

Durbin-Watson Statistic 2.1394

    Variable                    Coeff       Std Error      T-Stat      Signif
************************************************************************************
1.  DUR                      -0.144080727   0.582175709   -0.24749  0.80611270
2.  UR{1}                    -0.875155847   0.372739545   -2.34790  0.02522521
3.  DZ(1,1)                   0.065742784   0.011691493    5.62313  0.00000324
4.  DZ(2,1)                   0.093727589   0.240525394    0.38968  0.69935578
5.  DZ(1,2)                   0.062313366   0.018826679    3.30984  0.00231801
6.  DZ(2,2)                   1.231430083   0.204979727    6.00757  0.00000106
7.  DZ(1,3)                   0.180925019   0.053881657    3.35782  0.00204033
8.  DZ(2,3)                   0.016178258   0.256669025    0.06303  0.95013343

Bai-Perron Break Point Analysis

Dependent Variable DWAGE

Fixed Regressors

DUR          -0.144081
UR{1}        -0.875156

Shifting Regressors

             <=1967:01  <=1975:01
Constant      0.065743   0.062313   0.180925
DPRICE{1}     0.093728   1.231430   0.016178

Breakpoint  Lower 95%  Upper 95%
1967:01     1966:01    1969:01
1975:01     1974:01    1976:01