GIBBS.RPF

GIBBS.RPF demonstrates Gibbs sampling for a multiple regression with an independent prior. It is adapted from Koop(2003), 4.2.7 from pp 73-77. It demonstrates use of several functions for doing draws for linear regressions (particularly %RANMVPOSTCMOM), and the @MCMCPOSTPROC procedure for post-processing of MCMC draws.

The model analyzed is a linear regression, estimating a hedonic regression for housing prices. The explanatory variables are lot size, number of bedrooms, number of bathrooms and a measure of storage. The prior used is fairly loose. It’s usually easier to think about priors based upon a mean and standard deviation, so the standard deviations need to be converted into prior precisions by squaring and then inverting that matrix. The prior here has a diagonal covariance which is fairly typical—we’re likely to have an idea about how large each coefficient might be, but very little about the specifics regarding correlations among them.

This is the regression of interest:

linreg price

# constant lot_siz bed bath stor

This is the prior for the variance. Setting this will generally need at least some familiarity with the scale of the data, though with over 500 observations, a prior with 5 degrees of freedom will be dominated by the data.

compute s2prior = 5000.0^2

compute nuprior = 5.0

This is the mean and variance for the prior on the regression coefficients themselves. The standard errors are what are actually specified, which gets squared to produce the variance, then inverted to give the precision.

compute [vector] bprior =||0.0,10.0,5000.0,10000.0,10000.0||

compute [symm] vprior = $

%diag(||10000.0^2,5.0^2,2500.0^2,5000.0^2,5000.0^2||)

compute [symm] hprior = inv(vprior)

Compute the cross product matrix from the regression. This will include the dependent variable as the final row. The cross product matrix has all the sufficient statistics for the likelihood (except the number of observations, which is %NOBS).

cmom(lastreg)

The two obvious places to start the Gibbs sampler are the OLS estimates and the prior. We’ll use OLS.

compute bdraw = %beta

compute s2draw = %seesq

compute nburn = 1000

compute ndraws = 10000

This sets up and initializes the two locations where we are storing information generated by the draws. BGIBBS saves the coefficient vectors, while HGIBBS saves the draws for the precision (reciprocal of the variance).

dec series[vect] bgibbs

dec series hgibbs

gset bgibbs 1 ndraws = %zeros(%nreg,1)

set hgibbs 1 ndraws = 0.0

This has a relatively simple draw loop. We first draw the residual precision given the beta's, then the beta's given the new draw from the residual precision. When we are past the burn-in, we save the BDRAW VECTOR's and HDRAW values for the value of DRAW.

do draw=-nburn,ndraws

compute rssplus = nuprior*s2prior + %rsscmom(%cmom,bdraw)

compute hdraw = %ranchisqr(nuprior+%nobs)/rssplus

compute bdraw = %ranmvpostcmom(%cmom,hdraw,hprior,bprior)

if draw<=0

compute bgibbs(draw) = bdraw

compute hgibbs(draw) = hdraw

end do draw

Now we do the post-processing, computing the means, standard errors, numerical standard errors (estimate of the precision of the estimate of the mean) and CD measures:

@mcmcpostproc(ndraws=ndraws,mean=bmean,$

stderrs=bstderrs,cd=bcd,nse=bnse) bgibbs

This sets up a REPORT with the output from @MCMCPOSTPROC. @EQNREGLABELS(0)(I) is the regressor label for coefficient I in the last estimated regression (equation 0). The report is formatted to use 3 digits right on the mean, standard error and numerical standard error (all of which depend upon the scales of the coefficients) and 2 digits right on the CD measure (which doesn't; it has an asymptotic standard Normal distribution).

report(action=define)

report(atrow=1,atcol=1,align=center) "Variable" "Coeff" $

"Std Error" "NSE" "CD"

do i=1,%nreg

report(row=new,atcol=1) %eqnreglabels(0)(i) bmean(i) $

bstderrs(i) bnse(i) bcd(i)

end do i

report(action=format,atcol=2,tocol=4,picture="*.###")

report(action=format,atcol=5,picture="*.##")

report(action=show)

Full Program

open data hprice.prn
data(format=prn,org=columns) 1 546 price lot_siz bed bath stor drive $
recroom bment gas aircond gar desireloc
*
* This is the equation we're analyzing
*
linreg price
# constant lot_siz bed bath stor
*
* Prior for variance.
*
compute s2prior=5000.0^2
compute nuprior=5.0
*
* Prior mean and variance. Convert the variance to precision by
* inverting.
*
compute [vector] bprior=||0.0,10.0,5000.0,10000.0,10000.0||
compute [symm]   vprior=$
   %diag(||10000.0^2,5.0^2,2500.0^2,5000.0^2,5000.0^2||)
compute [symm]   hprior =inv(vprior)
*
* Compute the cross product matrix from the regression. This will
* include the dependent variable as the final row. The cross product
* matrix has all the sufficient statistics for the likelihood (except
* the number of observations, which is %nobs).
*
cmom(lastreg)
*
* The two obvious places to start the Gibbs sampler are the OLS
* estimates and the prior. We'll use OLS.
*
compute bdraw =%beta
compute s2draw=%seesq
*
* We'll use the following for all MCMC estimators. This ends up taking
* nburn+1 burn-ins.
*
compute nburn =1000
compute ndraws=10000
*
dec series[vect] bgibbs
dec series       hgibbs
gset bgibbs 1 ndraws = %zeros(%nreg,1)
set hgibbs 1 ndraws = 0.0
*
do draw=-nburn,ndraws
   *
   * Draw residual precision conditional on previous beta
   *
   compute rssplus=nuprior*s2prior+%rsscmom(%cmom,bdraw)
   compute hdraw =%ranchisqr(nuprior+%nobs)/rssplus
   *
   * Draw betas given hdraw
   *
   compute bdraw =%ranmvpostcmom(%cmom,hdraw,hprior,bprior)
   if draw<=0
      next
   *
   * Do the bookkeeping here.
   *
   compute bgibbs(draw)=bdraw
   compute hgibbs(draw)=hdraw
end do draw
*
@mcmcpostproc(ndraws=ndraws,mean=bmean,$
    stderrs=bstderrs,cd=bcd,nse=bnse) bgibbs
report(action=define)
report(atrow=1,atcol=1,align=center) "Variable" "Coeff" $
    "Std Error" "NSE" "CD"
do i=1,%nreg
   report(row=new,atcol=1) %eqnreglabels(0)(i) bmean(i) $
     bstderrs(i) bnse(i) bcd(i)
end do i

report(action=format,atcol=2,tocol=4,picture="*.###")

report(action=format,atcol=5,picture="*.##")

report(action=show)

Output

Linear Regression - Estimation by Least Squares

Dependent Variable PRICE

Usable Observations 546

Degrees of Freedom 541

Centered R^2 0.5355473

R-Bar^2 0.5321133

Uncentered R^2 0.9382388

Mean of Dependent Variable 68121.597070

Std Error of Dependent Variable 26702.670926

Standard Error of Estimate 18265.227116

Sum of Squared Residuals 1.80488e+011

Regression F(4,541) 155.9529

Significance Level of F 0.0000000

Log Likelihood -6129.9928

Durbin-Watson Statistic 1.4829

Variable Coeff Std Error T-Stat Signif

************************************************************************************

1. Constant -4009.54998 3603.10903 -1.11280 0.26628732

2. LOT_SIZ 5.42917 0.36925 14.70325 0.00000000

3. BED 2824.61379 1214.80763 2.32515 0.02043276

4. BATH 17105.17446 1734.43413 9.86211 0.00000000

5. STOR 7634.89700 1007.97449 7.57449 0.00000000

The output in the table depends upon random numbers and so will be similar, but not exactly the same when it is re-run.

Variable Coeff Std Error NSE CD

Constant -4075.370 3236.783 34.341 -1.16

LOT_SIZ 5.449 0.362 0.004 1.52

BED 3215.961 1058.250 10.613 0.86

BATH 16096.096 1612.908 15.978 -0.06

STOR 7703.893 971.488 9.882 -1.31