RATS 10.1

The Gibbs sampler (Gelfand and Smith, 1990) is one of several techniques developed recently to deal with posterior distributions which not long ago were considered to be intractable. Monte Carlo integration for impulse responses is able to work well despite the large number of parameters in the underlying vector autoregression because, with the help of a convenient choice of prior, it is fairly easy to make draws from the posterior distribution. Unfortunately, there are very few multivariate distributions for which this is true. Even in a simple linear regression model, all it takes is a slight deviation from “convenience” in the prior to produce a tangled mess in the posterior distribution, making direct draws with the basic toolkit of random Normals and gammas impossible.

 

The Gibbs sampler can be brought into play when the parameters can be partitioned so that, although an unconditional draw can’t be obtained directly, each partition can be drawn conditional on the parameters outside its partition. The standard result is that if we draw sequentially from the conditional distributions, the resulting draws are, in the limit, from the unconditional distribution. Because this is a limit result, it is common for practitioners to ignore a certain number of early draws (called the “burn-in”) which might not be representative of the unconditional distribution.
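For instance, with the parameters split into two blocks \(\theta_1\) and \(\theta_2\), sweep \(i\) of the sampler draws

\begin{equation} \theta _1^{(i)} \sim p\left( {\theta _1 |\theta _2^{(i - 1)} ,Y} \right),\quad \theta _2^{(i)} \sim p\left( {\theta _2 |\theta _1^{(i)} ,Y} \right) \end{equation}

and, under quite general conditions, the sequence of draws converges in distribution to draws from the joint posterior \(p\left( {\theta _1 ,\theta _2 |Y} \right)\).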

 

If importance sampling is able to work successfully in a given situation, it usually should be chosen over Gibbs sampling, because the draws are independent rather than correlated. However, it often becomes quite hard to find a workable importance function as the dimension of the parameter space gets larger.

A General Framework

The following is a general set-up that we’ve found useful for Gibbs sampling (it can be created using the Help—Pseudo-Code Generator—Monte Carlo/Gibbs Loop wizard):

 

compute nburn=# of burn-in draws
compute ndraws=# of accepted draws

...Set initial values for parameters.
...Do any calculations which don’t depend upon the draws.
...Initialize bookkeeping information

infobox(action=define,progress,lower=-nburn,upper=ndraws) $
   "Gibbs Sampler"
do draw=-nburn,ndraws
   infobox(current=draw)
   ...Do the next draw
   if draw<=0
      next
   ...Update bookkeeping information
end do draw
infobox(action=remove)

 

Note that, as written, this actually does NBURN+1 burn-in draws, since it discards any with DRAW=-NBURN through DRAW=0.

Mean-Variance Blocking

The most common blocking for the Gibbs sampler is between the regression parameters and the variance or precision. For the standard Normal linear model, we can draw \(\beta\) conditional on \(h\) and \(h\) conditional on \(\beta\). For the VAR with a general prior, we can draw \(\boldsymbol{\beta}\) conditional on \(\Sigma\) and \(\Sigma\) conditional on \(\boldsymbol{\beta}\). We’ll show the most efficient way to do Gibbs draws from a linear regression. Outside the loop, do

 

cmom
# x_variables_list   y
linreg(cmom) y
# x_variables_list
compute bdraw=%beta

 

to calculate the cross moment matrix and get an initial value for the coefficients. Also, set up the prior mean (BPRIOR) and precision (HPRIOR) for the coefficients and the degrees of freedom (NUPRIOR) and scale factor (S2PRIOR) for the residual precision. Inside the loop, do

 

compute rssplus = nuprior*s2prior + %rsscmom(%cmom,bdraw)
compute hdraw   = %ranchisqr(nuprior+%nobs)/rssplus
compute bdraw   = %ranmvpostcmom(%cmom,hdraw,hprior,bprior)

 

GIBBS.RPF is a basic example of this.
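To connect the code to the formulas: with the prior \(\beta \sim N\left( {b_{prior} ,H_{prior}^{ - 1} } \right)\) for the coefficients and, in the parameterization that matches the draw above, \(\nu s^2 h \sim \chi ^2 \left( \nu  \right)\) for the residual precision \(h\), the two conditional posteriors being sampled are

\begin{equation} h|\beta ,y \sim \frac{{\chi ^2 \left( {\nu  + T} \right)}}{{\nu s^2  + \left( {y - X\beta } \right)^\prime  \left( {y - X\beta } \right)}} \end{equation}

\begin{equation} \beta |h,y \sim N\left( {\bar H^{ - 1} \left( {H_{prior} b_{prior}  + hX'y} \right),\bar H^{ - 1} } \right),\quad \bar H = H_{prior}  + hX'X \end{equation}

where \(T\) is the number of observations. %RSSCMOM evaluates \(\left( {y - X\beta } \right)^\prime  \left( {y - X\beta } \right)\) from the cross moment matrix, and %RANMVPOSTCMOM combines the prior mean and precision with the data information in the cross moment matrix (scaled by the current draw for \(h\)) to do the multivariate Normal draw.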

 

@MCMCPOSTPROC

There is often more work to do after getting the simulations than there is doing them in the first place. We’ve provided a special procedure for doing a standard set of post-processing calculations for Gibbs sampling. That’s @MCMCPostProc. The input to this is a SERIES[VECTOR] which has the results of the kept draws from the sampler. You can use the procedure to calculate the sample means and standard errors for each component of the vector and also to compute some measures of the performance of the sampler. Remember that the draws here aren’t independent, and if there’s too high a correlation, the results are likely to be much less precise than we would like.

 

@MCMCPostProc(NDRAWS=# of draws,other options) stats

 

stats is a SERIES[VECTOR]. You want to create one (or more) of these to hold the statistics of interest at each draw that you keep. Inside the draw loop, organize a VECTOR with the statistics that you want, and save a copy of it into entry “draw” of the SERIES[VECTOR].
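For example, if BDRAW is the vector of coefficients being drawn (as in the linear regression code above), one pattern for the bookkeeping (GIBBS.RPF does something along these lines, though the details there may differ) is to set up the series before the loop with

dec series[vector] bgibbs
gset bgibbs 1 ndraws = %beta

and then, in the “update bookkeeping” step (that is, only when DRAW is positive), save the current draw with

compute bgibbs(draw) = bdraw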

 

Note that if the number of draws is large (more than 10000), it can take quite a while to do the diagnostics and the percentiles.

 

The MEAN and STDERRS are the sample mean and standard errors for each element of the stats across the draws, so if you do MEAN=BMEANS, BMEANS(1) will be the sample mean across draws for the first statistic. The NSE option saves the corresponding estimates of the standard errors of the MEAN component, which are computed using HAC estimators since the draws from a Markov chain are in general at least somewhat autocorrelated. The NSE should decrease to zero as the number of draws increases.
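The usual calculation behind this (a sketch; the exact HAC window is a detail of the procedure) is

\begin{equation} NSE = \sqrt {S\left( 0 \right)/N} \end{equation}

where \(N\) is the number of kept draws and \(S\left( 0 \right)\) is the HAC estimate of the spectral density at frequency zero (the long-run variance) of the sequence of draws for that statistic. With independent draws, \(S\left( 0 \right)\) is just the variance of the draws, so the NSE reduces to the ordinary standard error of a sample mean; positive autocorrelation in the chain inflates it.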

 

If you want percentiles of the distributions, use the PERCENTILES option to make the request and the QUANTILES option (which is a VECTOR[VECTOR]) for the output. Each outer element of QUANTILES is for a statistic, and the inner elements are the computed percentiles in the same order as the request. So PERCENTILES=||.16,.50,.84||,QUANTILES=Q will mean that Q(1) will be a VECTOR with the .16,.50 and .84 quantiles of the first saved statistic (in that order), similarly Q(2) for the second statistic.
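For example, a call along the lines of

@mcmcpostproc(ndraws=ndraws,percentiles=||.16,.50,.84||,quantiles=q) bgibbs

(using a SERIES[VECTOR] of saved draws called BGIBBS, as in the example below) produces Q, where Q(1) holds the .16, .50 and .84 quantiles of the first statistic.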

 

The CD and BW options are diagnostics, computed element by element (so they return VECTORs). CD is a comparison of early draws against late draws and is mainly a measure of how well the chain has been “burned-in”. Asymptotically, it’s N(0,1) component by component with neither sign expected over the other. BW computes the ratio of between-block to within-block variation in the chain, using the number of blocks given by the BLOCKS option; that is, if you have 25000 draws and the default BLOCKS=10, it will look at 10 blocks of 2500. If the draws are independent, this will have an F distribution. A large value indicates that the draws tend to be highly correlated for long stretches.
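The CD calculation follows the convergence diagnostic described in Koop (2003): if \(\hat g_A\) and \(\hat g_B\) are the means of a statistic over an early portion and a late portion of the kept draws, and \(NSE_A\) and \(NSE_B\) are the numerical standard errors computed from those two pieces, then

\begin{equation} CD = \frac{{\hat g_A  - \hat g_B }}{{\sqrt {NSE_A^2  + NSE_B^2 } }} \end{equation}

which is asymptotically \(N\left( {0,1} \right)\) if the two portions come from the same distribution, that is, if the burn-in was adequate. (The fractions of the sample used for the two pieces are a detail of the procedure.)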

 

For instance, in GIBBS.RPF, we use the following:

 

@mcmcpostproc(ndraws=ndraws,mean=bmean,$
    stderrs=bstderrs,cd=bcd,nse=bnse) bgibbs

 

The saved statistics (in this case, the simulated regression coefficients) are in the SERIES[VECTOR] named BGIBBS. We compute the mean (into BMEAN), standard errors (into BSTDERRS), the CD measure (described in Koop (2003)), which is used to test the adequacy of the number of burn-in draws, and the numerical standard errors, which estimate the precision of the sample means.

Multivariate Linear Regressions

A VAR is a special case of a multivariate linear regression. If we use a Jeffreys prior, we don't need Gibbs sampling, as the procedure described in "Monte Carlo Integration (VAR)" can be applied. However, if there is an informative prior on the regression coefficients, the simple method can't be applied (because the residual covariance matrix no longer has a convenient unconditional distribution to draw from). The Gibbs sampling calculations are discussed in "Standard Posteriors". This requires inverting the precision matrix for the entire system. This could be extremely time-consuming for a large model—with the time required for inversion going up with the cube of the size, a 1200 coefficient model will take (roughly) 64 times as long as a 300 coefficient model.
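To see where the large matrix comes from, a sketch of the standard result: stacking the VAR coefficients equation by equation into \(\beta\), with prior \(\beta \sim N\left( {b_{prior} ,H_{prior}^{ - 1} } \right)\), the conditional posterior is

\begin{equation} \beta |\Sigma ,Y \sim N\left( {\bar H^{ - 1} \left( {H_{prior} b_{prior}  + \left( {\Sigma ^{ - 1}  \otimes X'} \right){\mathop{\rm vec}\nolimits} \left( Y \right)} \right),\bar H^{ - 1} } \right),\quad \bar H = H_{prior}  + \Sigma ^{ - 1}  \otimes X'X \end{equation}

so each sweep requires factoring the full \(nk \times nk\) matrix \(\bar H\) (with \(n\) equations and \(k\) coefficients per equation), which is where the cubic cost comes from.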

 

See GIBBSVAR.RPF for an example (applied to a VAR of relatively modest size). Note that most of the program is actually devoted to creating the prior.

 

A similar, but more complicated, analysis is required for a near-VAR or any other linear system with different explanatory variables among the equations, even with a non-informative prior. (In a technical sense, a near-VAR is a VAR with an infinitely tight prior on the omitted regressors.)

 

Although the procedure for drawing from a (flat prior) near-VAR is equivalent to estimating a SUR model each time you get a new covariance matrix, the SUR instruction, in addition to having to invert the big matrix, also has to compute the cross product matrix of the data. Since the data information is the same from sweep to sweep, we can eliminate that time cost by organizing the data information in advance. That can be done with the @SURGIBBSSETUP procedure group which handles the interactions between the drawing procedure and the data. If you are interested in the technical details, they are covered as part of the Bayesian Econometrics e-course. See MONTESUR.RPF for an example.
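To sketch why the data information can be computed once: writing equation \(i\) as \(y_i  = X_i \beta _i  + u_i\) and letting \(\sigma ^{ij}\) denote the elements of \(\Sigma ^{ - 1}\), the data part of the conditional posterior precision for the stacked coefficients has \(\left( {i,j} \right)\) block \(\sigma ^{ij} X_i ^\prime  X_j\) (to which any prior precision is added), and the data part of the mean uses \(\sum\nolimits_j {\sigma ^{ij} X_i ^\prime  y_j }\). Only the \(\sigma ^{ij}\) weights change from sweep to sweep; the cross products \(X_i ^\prime  X_j\) and \(X_i ^\prime  y_j\) are fixed, and that is the data information the @SURGIBBSSETUP group organizes in advance.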

Unobservable States (Parameter Augmentation)

Gibbs sampling can also be used in situations where there are unobservable variables at each time period. For instance, the Markov switching models (User's Guide, Section 11.7) have the unobservable \(S_t\) in addition to the model’s other parameters. These are added to the parameter set in a process known as augmentation. Note that the number of parameters now exceeds the number of data points—this is possible only because of the use of priors.

 

Getting draws for regression parameters given the regimes is quite simple if the model is otherwise linear. The tricky part is getting the draws for the latent variables. The simplest way to do this is to draw them one at a time; that is, draws are done sequentially (as \(t\) goes from 1 to \(T\)) for

\begin{equation} S_t |S_1 , \ldots ,S_{t - 1} ,S_{t + 1} , \ldots ,S_T ,\theta \end{equation}

where \(\theta\) are the other parameters. Note that this can be very time-consuming. In addition to the cost of doing (at least) \(T\) function evaluations per Gibbs sweep, because of the high correlation between the \(S\) values (in most cases), the chain also has a high degree of correlation from one step to the next, so it requires a long burn-in time and many draws. (As a general rule, Gibbs sampling works best when the blocks of parameters are close to being independent of each other, and worst when correlated parameters are drawn separately.) A more efficient (but more complicated) way to do the draws for the states is known as forward filtering, backward sampling. This has many similarities to Kalman smoothing (User's Guide, Section 10.2), as it takes a forward pass through the data to determine the joint distribution of the states, then uses that to take draws starting at the end and working towards the start of the data set.
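A sketch of that calculation for the basic switching model: the joint conditional distribution of the regimes factors as

\begin{equation} p\left( {S_1 , \ldots ,S_T |Y_T ,\theta } \right) = p\left( {S_T |Y_T ,\theta } \right)\prod\limits_{t = 1}^{T - 1} {p\left( {S_t |S_{t + 1} ,Y_t ,\theta } \right)} \end{equation}

where \(Y_t\) is the data through \(t\). The forward pass (the Hamilton filter here) produces the filtered probabilities \(p\left( {S_t |Y_t ,\theta } \right)\); the backward pass draws \(S_T\) from the last of these, then works back through the data using

\begin{equation} p\left( {S_t |S_{t + 1} ,Y_t ,\theta } \right) \propto p\left( {S_{t + 1} |S_t ,\theta } \right)p\left( {S_t |Y_t ,\theta } \right) \end{equation}

so each \(S_t\) is a simple discrete draw given the regime already drawn for \(t + 1\).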

 


Copyright © 2025 Thomas A. Doan