MSREGRESSION Procedures

@MSRegression sets up a Markov switching linear regression, pulling in the required set of procedures for doing analysis. The regression model itself takes the form:

$y_t = Z_t\gamma + X_t\beta(S_t) + u_t;\quad u_t \sim N\left(0,\sigma^2(S_t)\right)$

where $S_t$ is the regime. This allows certain coefficients (those on the $Z$ variables) to be fixed across regimes and others (those on the $X$ variables) to vary, and also allows the variance to switch along with the coefficients. The regime $S_t$ is governed by a Markov switching process.

Note that @MSRegression does not, by itself, do any of the estimation. It merely sets up the variables required.

@MSRegression( options ) depvar

# list of regressors

Parameters

depvar

dependent variable

Options

REGIMES=number of regimes[2]

Sets the number of regimes. (The older STATES option can be used instead, though REGIMES is preferred).

SWITCH=[C]/CH/H

This determines what switches among regimes. With SWITCH=C, the coefficients switch, but the error variance is the same in all regimes. With SWITCH=CH, both the coefficients and the variances switch (together). With SWITCH=H, all the coefficients are fixed and only the variances switch.

EQUATION=EQUATION which describes the regression [not used]

You can use this rather than the supplementary card to input the form of the regression.

NFIX=number of fixed coefficients [0]

If the coefficients switch among regimes, you can use NFIX to allow a certain number of them to be fixed instead. The regressor list needs to be arranged to have the fixed coefficients first, that is, in the expression above, you list the Z's before the X's.

Variables Defined

Everything defined by @MSSetup is also defined by @MSRegression, which includes it. These are the variables defined specifically by @MSRegression, for use in parameter sets for estimation. Not all of them will be in active use in a particular model. In particular, only one of SIGSQ and SIGSQV will be used in a model; the first for a fixed variance, the second for regime-switching variances.

BETAS	VECT[VECT] of the coefficients which switch among regimes. BETAS(s) is the VECTOR of (switching) coefficients in regime s. BETAS(s)(i) is the ith switching coefficient in regime s.
GAMMAS	VECTOR of the fixed coefficients (will be zero dimension if there are none).
SIGSQ	equation variance if variance is homogeneous (REAL)
SIGSQV	VECTOR of regime-specific equation variances if the variance is switching.

Examples (of procedure itself)

@MSRegression(switch=c,nfix=2,regimes=2) loggdp

# trend loggdp{1} constant

Two regime model, with fixed coefficients on TREND and lagged LOGGDP, switching coefficient on CONSTANT and fixed variance.

linreg(define=fulleqn) ggrowth

# ggrowth{1 to 4} constant

@msregression(regimes=2,equation=fulleqn,switch=ch) ggrowth

Two regime model, with switching for all coefficients plus the variance in a four lag autoregression.

Working Procedures and Functions

@MSRegInitial(guessregimes=SERIES[INTEGER]) start end

computes a "standard" set of initial guess values for the parameters. The standard guesses spread the variances (if the variances are allowed to switch) from low (regime 1) to high, and spread the first switching coefficient (also from low to high), using information from a single linear regression over the indicated range. The variances, for instance, are scaled versions of the regression variance; the coefficients are spread around the estimated value based upon the standard error of the coefficient. All the other regression coefficients are guessed to be the same across regimes. That may not match the differences among regimes that you expect to see. If not, you'll either have to adjust some parameters yourself, or you can input a SERIES[INTEGER] with "guesses" for which entries are in which regimes, and the guess values will be generated from regressions over the subsamples. (See the example below.)

If you leave off the start and/or end parameters, the maximum available range given the series and lags involved will be used.

@MSRegParmset(parmset=PARMSET to define)

defines a PARMSET with the free parameters for the switching regression model (all except the transition probability parameters).

%MSRegFVec(time)

function which returns the vector of likelihoods (not in log form) for the regimes at entry time.
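
As an illustration of what a regime-by-regime likelihood vector contains, here is a minimal Python sketch (not RATS code, and not the procedure's actual implementation). It assumes the Normal regression model described at the top of this page; the function name and argument layout are hypothetical.

```python
import math

def regime_likelihoods(y, z, x, gammas, betas, sigsq):
    """Return [f_1, ..., f_M]: the Normal density of y under each regime (not logged).

    gammas   : fixed coefficients on the z regressors
    betas[s] : switching coefficients on the x regressors in regime s
    sigsq[s] : error variance in regime s (repeat one value if it does not switch)
    """
    fixed_part = sum(g * zi for g, zi in zip(gammas, z))
    f = []
    for beta_s, var_s in zip(betas, sigsq):
        mean_s = fixed_part + sum(b * xi for b, xi in zip(beta_s, x))
        resid = y - mean_s
        f.append(math.exp(-0.5 * resid * resid / var_s) / math.sqrt(2.0 * math.pi * var_s))
    return f

# Hypothetical two-regime example: only a constant, which switches from 0 to 1.
f = regime_likelihoods(y=1.0, z=[], x=[1.0], gammas=[], betas=[[0.0], [1.0]], sigsq=[1.0, 1.0])
```

The observation y=1.0 sits exactly on the regime 2 mean, so the second element of f is the larger one.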

%MSRegProb(time)

returns the likelihood (not logged) of the model at entry time. As a side effect, it computes pt_t1(time) and pt_t(time).
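
The update behind this kind of function is the standard Markov switching filter step. The Python sketch below (not RATS code) shows one such step under two assumptions that are not stated on this page: that pt_t1 holds the predicted probabilities and pt_t the filtered probabilities, and that the transition matrix is stored with p[i][j] as the probability of moving to regime i from regime j. The function name is hypothetical.

```python
def filter_step(f, p, pt_prev):
    """One filter update for a Markov switching model.

    f       : regime likelihoods at the current entry (not logged)
    p       : transition matrix, p[i][j] = Prob(regime i now | regime j before)
    pt_prev : filtered regime probabilities from the previous entry
    """
    m = len(f)
    # Predicted probabilities: regime at t given data through t-1.
    pt_t1 = [sum(p[i][j] * pt_prev[j] for j in range(m)) for i in range(m)]
    # Likelihood of the observation: regime likelihoods mixed by predicted probabilities.
    like = sum(f[i] * pt_t1[i] for i in range(m))
    # Filtered probabilities: regime at t given data through t.
    pt_t = [f[i] * pt_t1[i] / like for i in range(m)]
    return like, pt_t1, pt_t

p = [[0.9, 0.2], [0.1, 0.8]]  # made-up transition matrix, columns sum to 1
like, pt_t1, pt_t = filter_step([0.5, 0.1], p, [0.5, 0.5])
```

The log of `like`, summed over entries, gives the sample log likelihood.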

%MSRegInit()

does the calculations needed at the start of each function evaluation

%MSRegInitTransition()

for a model with fixed transitions, expands the transition matrix if required and scans for negative probabilities. Returns 1 if the probabilities are all non-negative, and 0 if not.

@MSRegSmoothed gstart gend psmoothed

calculates the smoothed (marginal) probabilities of the regimes, that is, psmoothed(t) is the vector of smoothed probabilities of being in each of the regimes at time t.
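
Smoothed probabilities of this kind are commonly obtained with the backward recursion usually credited to Kim. The Python sketch below (not RATS code) shows that recursion; whether the procedure uses exactly this algorithm is an assumption, as are the function name and the convention p[i][j] = probability of moving to regime i from regime j.

```python
def kim_smoother(p, pt_t1, pt_t):
    """Backward smoothing pass.

    pt_t1[t] : predicted regime probabilities at entry t (data through t-1)
    pt_t[t]  : filtered regime probabilities at entry t (data through t)
    Returns a list of smoothed probability vectors (data through the end).
    """
    n, m = len(pt_t), len(pt_t[0])
    smoothed = [None] * n
    smoothed[-1] = list(pt_t[-1])  # at the last entry, smoothed = filtered
    for t in range(n - 2, -1, -1):
        # Ratio of smoothed to predicted probability for each regime at t+1.
        ratio = [smoothed[t + 1][i] / pt_t1[t + 1][i] if pt_t1[t + 1][i] > 0.0 else 0.0
                 for i in range(m)]
        smoothed[t] = [pt_t[t][j] * sum(p[i][j] * ratio[i] for i in range(m))
                       for j in range(m)]
    return smoothed

# Made-up two-entry example.
p = [[0.9, 0.2], [0.1, 0.8]]
pt_t1 = [[0.5, 0.5], [0.55, 0.45]]
pt_t = [[0.5, 0.5], [0.859375, 0.140625]]
smoothed = kim_smoother(p, pt_t1, pt_t)
```

In this example the strong evidence for regime 1 at the second entry pulls the smoothed probability of regime 1 at the first entry above its filtered value of 0.5.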

@MSRegResids(regime=SERIES[INT] of regimes or specific=specific regime) resids start end

computes the equation residuals for specific regimes. Note that these are not useful for diagnostics—they're only used as part of larger calculations. Use @MSRegStdResiduals for diagnostics.

@MSRegStdResiduals resids start end

computes a series of standardized one-step residuals, with regimes weighted by the one-step predictive probabilities. Unlike @MSRegResids, these are useful for diagnostics, as they should be serially uncorrelated, mean zero and variance one if the model is correct. (They are not, however, Normally distributed even if the assumption of underlying Normality is correct).
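
One way to build a residual with these properties is to standardize by the mean and variance of the one-step predictive mixture. The Python sketch below (not RATS code) does that; it is an assumption that the procedure weights the regimes in exactly this way, and the function name is hypothetical.

```python
import math

def standardized_residual(y, means, variances, pt_t1):
    """Standardize y by the moments of the one-step predictive mixture.

    means[s], variances[s] : regime-specific predictive mean and variance
    pt_t1[s]               : predicted probability of regime s (data through t-1)
    """
    mix_mean = sum(p * m for p, m in zip(pt_t1, means))
    # E[y^2] under the mixture, then subtract the squared mean.
    second_moment = sum(p * (v + m * m) for p, m, v in zip(pt_t1, means, variances))
    mix_var = second_moment - mix_mean * mix_mean
    return (y - mix_mean) / math.sqrt(mix_var)

# Made-up example: regime means 0 and 2, unit variances, equal predicted probabilities.
u_mid = standardized_residual(1.0, [0.0, 2.0], [1.0, 1.0], [0.5, 0.5])
u_high = standardized_residual(3.0, [0.0, 2.0], [1.0, 1.0], [0.5, 0.5])
```

Because the predictive distribution is a mixture of Normals, a residual standardized this way has mean zero and variance one under a correct model but is not itself Normal.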

@MSRegSetEquation(regime=which regime,define=equation to define)

defines an equation (coefficients and variance) for a particular regime.

For EM estimation

@MSRegEMGeneralSetup

needs to be called before using EM to set up the work arrays

@MSRegEMStep gstart gend

@MSRegEStep(options) gstart gend

@MSRegMStep(options) gstart gend

@MSRegEMStep does the combined E and M steps, @MSRegEStep does only the E, and @MSRegMStep does only the M. @MSRegEMStep and @MSRegMStep both have options to restrict the set of parameters that are updated, so some can be fixed.
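
To show what an M step does with the output of an E step, the Python sketch below (not RATS code) handles the simplest special case: a model in which only a constant and the variance switch. The E step is assumed to have produced smoothed regime probabilities, which are then used as weights. The function name is hypothetical, and the actual procedures handle general regressions.

```python
def m_step_constant_only(y, weights):
    """M step for a switching mean and switching variance.

    y          : observations
    weights[s] : smoothed probabilities of regime s at each entry (from the E step)
    Returns (betas, sigsqv): weighted mean and weighted variance for each regime.
    """
    betas, sigsqv = [], []
    for w in weights:
        total = sum(w)
        mean_s = sum(wi * yi for wi, yi in zip(w, y)) / total
        var_s = sum(wi * (yi - mean_s) ** 2 for wi, yi in zip(w, y)) / total
        betas.append(mean_s)
        sigsqv.append(var_s)
    return betas, sigsqv

# Made-up data and smoothed probabilities for two regimes.
y = [0.0, 0.0, 2.0, 4.0]
weights = [[1.0, 1.0, 0.5, 0.0], [0.0, 0.0, 0.5, 1.0]]
betas, sigsqv = m_step_constant_only(y, weights)
```

Holding a parameter fixed during EM amounts to skipping its update in a function like this one.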

For MCMC analysis

@MSRegRelabel swaps

reorders all switching components based upon the index array swaps.
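
Relabeling matters in MCMC because the likelihood is unchanged when the regime labels are permuted. The Python sketch below (not RATS code) shows what reordering the switching components involves. It assumes that swaps lists, for each new regime, the old regime that takes its place (0-based here), and that the transition matrix uses p[i][j] = probability of moving to regime i from regime j. Both conventions and the function name are assumptions.

```python
def relabel(swaps, betas, sigsqv, p):
    """Permute all regime-specific components consistently.

    swaps[k] : old index of the regime that becomes new regime k
    """
    new_betas = [betas[s] for s in swaps]
    new_sigsqv = [sigsqv[s] for s in swaps]
    # Rows and columns of the transition matrix must be permuted together.
    new_p = [[p[i][j] for j in swaps] for i in swaps]
    return new_betas, new_sigsqv, new_p

# Made-up example: exchange the two regimes so that the low mean comes first.
betas = [[3.0], [1.0]]
sigsqv = [2.0, 0.5]
p = [[0.9, 0.2], [0.1, 0.8]]
new_betas, new_sigsqv, new_p = relabel([1, 0], betas, sigsqv, p)
```

Permuting only the coefficients, and not the variances and transition probabilities with them, would change the model rather than relabel it.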

Example (Maximum Likelihood Estimation)

The commands required for actually doing the estimation are generally quite simple. The log likelihood can be computed using a FRML such as

frml logl = f=%MSRegFVec(t),fpt=%MSProb(t,f),log(fpt)

%MSRegFVec is a function pulled in with @MSRegression. It returns the vector of regime-by-regime likelihoods (not logged) at entry t. %MSProb is a standard Markov switching update calculation; it's part of the @MSSetup procedures (which are pulled in automatically by @MSRegression). It updates the filtered probabilities of the regimes and returns the likelihood (not logged) of observation t, which is why the FRML takes the log as its final step.

The estimation can proceed with something like (this is for the second example above):

nonlin(parmset=msparms) theta

@MSRegParmset(parmset=regparms)

compute p=%MSLogisticP(theta)

@MSFilterInit

maximize(parmset=regparms+msparms,$

start=%(p=%MSLogisticP(theta),pstar=%MSInit()),$

pmethod=simplex,piters=5,method=bfgs,iters=300) logl gstart gend

The parameter vectors theta, and the betas and sigsqv within REGPARMS, are all set up by @MSRegression. theta is a matrix of logistic indices for the transition probabilities. betas is a VECT[VECT] of the regime-specific regression coefficients: betas(1) is the VECTOR of coefficients in regime 1 and betas(2) is the VECTOR in regime 2. Each will have five elements (for the four lagged values of GGROWTH and the CONSTANT). sigsqv is the VECTOR of regime-specific variances, with sigsqv(1) being for regime 1 and sigsqv(2) for regime 2.
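
To make "logistic indices" concrete, the Python sketch below (not RATS code) maps a matrix of indices into transition probabilities. It assumes a multinomial logit in which theta has one row fewer than the number of regimes, each column describes the transitions out of one regime, and the last regime is the base category. That layout and the function name are assumptions about the convention, not something stated on this page.

```python
import math

def logistic_p(theta):
    """Map logistic indices to a full transition matrix.

    theta : (M-1) rows by M columns
    Returns an M x M matrix whose columns each sum to 1.
    """
    m = len(theta[0])
    p = [[0.0] * m for _ in range(m)]
    for j in range(m):
        denom = 1.0 + sum(math.exp(theta[i][j]) for i in range(m - 1))
        for i in range(m - 1):
            p[i][j] = math.exp(theta[i][j]) / denom
        p[m - 1][j] = 1.0 / denom  # base category
    return p

# A 1 x 2 matrix of indices for a two-regime model.
p = logistic_p([[1.0, -2.0]])
```

Under these assumptions, indices of 1.0 and -2.0 give a probability of about 0.73 of staying in regime 1 and about 0.88 of staying in regime 2, so both regimes start out persistent. Working with indices rather than probabilities keeps the optimizer inside the valid range without explicit constraints.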

The biggest problem with estimating a MS regression is coming up with guess values for the parameters that aim the model in the direction you want. There are two principal problems:

1. There may not really be two regimes, or if there are multiple regimes they may not be well-described by the Markov switching process.

2. There may be multiple modes for the likelihood, some of which may not describe what you hope the model will identify.

In the second example above (taken from Fruehwirth-Schnatter's textbook), the hope is that the model will identify "expansion" and "recession" regimes. However, with six parameters changing between regimes (CONSTANT, four lag coefficients and the variances), it's possible that maximum likelihood estimates will produce high variance-low variance (or high persistence-low persistence) rather than high growth-low growth. The more "moving parts" that you have, the more difficult it will be to get well-behaved estimates.

This sets the guess values for the parameters by "guessing" that the entries where GGROWTH is small will be in regime 1 and those where it's relatively large will be in regime 2. Note that this goes before the optimization code.

gset MSRegime = %if(ggrowth<.5,1,2)

@MSRegInitial(guessregimes=MSRegime) gstart gend

compute gstart=%regstart(),gend=%regend()

compute theta=||1.0,-2.0||
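
The idea behind guessing regimes from a threshold can be sketched in Python (not RATS code) as follows. The sketch is deliberately simplified: it uses only a mean and variance for each subsample, whereas the procedure is described above as running regressions over the subsamples. The data values and function names are made up for illustration, and each regime is assumed to receive at least one entry.

```python
def guess_regimes(series, threshold):
    """Label entries below the threshold as regime 1, the rest as regime 2."""
    return [1 if value < threshold else 2 for value in series]

def subsample_guesses(series, regimes, n_regimes=2):
    """Return a (mean, variance) guess for each regime from its own subsample."""
    guesses = []
    for s in range(1, n_regimes + 1):
        values = [v for v, r in zip(series, regimes) if r == s]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        guesses.append((mean, var))
    return guesses

ggrowth = [-0.4, 0.2, 1.1, 0.9, 0.3, 1.5]  # made-up growth rates
regimes = guess_regimes(ggrowth, 0.5)
guesses = subsample_guesses(ggrowth, regimes)
```

Starting from guesses like these steers the optimizer toward the low growth and high growth interpretation of the two regimes, though it cannot guarantee that the final estimates keep it.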

After the model has been estimated (as above), this computes the one-step standardized residuals and runs diagnostics on them:

@MSRegStdResiduals ustd gstart gend

@regcorrs(number=5,qstat) ustd