Examples / GIBBS.RPF |
GIBBS.RPF demonstrates Gibbs sampling for a multiple regression with an independent prior. It is adapted from Koop(2003), 4.2.7 from pp 73-77. It demonstrates use of several functions for doing draws for linear regressions (particularly %RANMVPOSTCMOM), and the @MCMCPOSTPROC procedure for post-processing of MCMC draws.
The model analyzed is a linear regression, estimating a hedonic regression for housing prices. The explanatory variables are lot size, number of bedrooms, number of bathrooms and a measure of storage. The prior used is fairly loose. It’s usually easier to think about priors based upon a mean and standard deviation, so the standard deviations need to be converted into prior precisions by squaring and then inverting that matrix. The prior here has a diagonal covariance which is fairly typical—we’re likely to have an idea about how large each coefficient might be, but very little about the specifics regarding correlations among them.
This is the regression of interest:
linreg price
# constant lot_siz bed bath stor
This is the prior for the variance. Setting this will generally need at least some familiarity with the scale of the data, though with over 500 observations, a prior with 5 degrees of freedom will be dominated by the data.
compute s2prior = 5000.0^2
compute nuprior = 5.0
This is the mean and variance for the prior on the regression coefficients themselves. The standard errors are what are actually specified, which gets squared to produce the variance, then inverted to give the precision.
compute [vector] bprior =||0.0,10.0,5000.0,10000.0,10000.0||
compute [symm] vprior = $
%diag(||10000.0^2,5.0^2,2500.0^2,5000.0^2,5000.0^2||)
compute [symm] hprior = inv(vprior)
Compute the cross product matrix from the regression. This will include the dependent variable as the final row. The cross product matrix has all the sufficient statistics for the likelihood (except the number of observations, which is %NOBS).
cmom(lastreg)
The two obvious places to start the Gibbs sampler are the OLS estimates and the prior. We’ll use OLS.
compute bdraw = %beta
compute s2draw = %seesq
compute nburn = 1000
compute ndraws = 10000
This sets up and initializes the two locations where we are storing information generated by the draws. BGIBBS saves the coefficient vectors, while HGIBBS saves the draws for the precision (reciprocal of the variance).
dec series[vect] bgibbs
dec series hgibbs
gset bgibbs 1 ndraws = %zeros(%nreg,1)
set hgibbs 1 ndraws = 0.0
This has a relatively simple draw loop. We first draw the residual precision given the beta's, then the beta's given the new draw from the residual precision. When we are past the burn-in, we save the BDRAW VECTOR's and HDRAW values for the value of DRAW.
do draw=-nburn,ndraws
compute rssplus = nuprior*s2prior + %rsscmom(%cmom,bdraw)
compute hdraw = %ranchisqr(nuprior+%nobs)/rssplus
compute bdraw = %ranmvpostcmom(%cmom,hdraw,hprior,bprior)
if draw<=0
next
compute bgibbs(draw) = bdraw
compute hgibbs(draw) = hdraw
end do draw
Now we do the post-processing, computing the means, standard errors, numerical standard errors (estimate of the precision of the estimate of the mean) and CD measures:
@mcmcpostproc(ndraws=ndraws,mean=bmean,$
stderrs=bstderrs,cd=bcd,nse=bnse) bgibbs
This sets up a REPORT with the output from @MCMCPOSTPROC. @EQNREGLABELS(0)(I) is the regressor label for coefficient I in the last estimated regression (equation 0). The report is formatted to use 3 digits right on the mean, standard error and numerical standard error (all of which depend upon the scales of the coefficients) and 2 digits right on the CD measure (which doesn't; it has an asymptotic standard Normal distribution).
report(action=define)
report(atrow=1,atcol=1,align=center) "Variable" "Coeff" $
"Std Error" "NSE" "CD"
do i=1,%nreg
report(row=new,atcol=1) %eqnreglabels(0)(i) bmean(i) $
bstderrs(i) bnse(i) bcd(i)
end do i
report(action=format,atcol=2,tocol=4,picture="*.###")
report(action=format,atcol=5,picture="*.##")
report(action=show)
Full Program
open data hprice.prn
data(format=prn,org=columns) 1 546 price lot_siz bed bath stor drive $
recroom bment gas aircond gar desireloc
*
* This is the equation we're analyzing
*
linreg price
# constant lot_siz bed bath stor
*
* Prior for variance.
*
compute s2prior=5000.0^2
compute nuprior=5.0
*
* Prior mean and variance. Convert the variance to precision by
* inverting.
*
compute [vector] bprior=||0.0,10.0,5000.0,10000.0,10000.0||
compute [symm] vprior=$
%diag(||10000.0^2,5.0^2,2500.0^2,5000.0^2,5000.0^2||)
compute [symm] hprior =inv(vprior)
*
* Compute the cross product matrix from the regression. This will
* include the dependent variable as the final row. The cross product
* matrix has all the sufficient statistics for the likelihood (except
* the number of observations, which is %nobs).
*
cmom(lastreg)
*
* The two obvious places to start the Gibbs sampler are the OLS
* estimates and the prior. We'll use OLS.
*
compute bdraw =%beta
compute s2draw=%seesq
*
* We'll use the following for all MCMC estimators. This ends up taking
* nburn+1 burn-ins.
*
compute nburn =1000
compute ndraws=10000
*
dec series[vect] bgibbs
dec series hgibbs
gset bgibbs 1 ndraws = %zeros(%nreg,1)
set hgibbs 1 ndraws = 0.0
*
do draw=-nburn,ndraws
*
* Draw residual precision conditional on previous beta
*
compute rssplus=nuprior*s2prior+%rsscmom(%cmom,bdraw)
compute hdraw =%ranchisqr(nuprior+%nobs)/rssplus
*
* Draw betas given hdraw
*
compute bdraw =%ranmvpostcmom(%cmom,hdraw,hprior,bprior)
if draw<=0
next
*
* Do the bookkeeping here.
*
compute bgibbs(draw)=bdraw
compute hgibbs(draw)=hdraw
end do draw
*
@mcmcpostproc(ndraws=ndraws,mean=bmean,$
stderrs=bstderrs,cd=bcd,nse=bnse) bgibbs
report(action=define)
report(atrow=1,atcol=1,align=center) "Variable" "Coeff" $
"Std Error" "NSE" "CD"
do i=1,%nreg
report(row=new,atcol=1) %eqnreglabels(0)(i) bmean(i) $
bstderrs(i) bnse(i) bcd(i)
end do i
report(action=format,atcol=2,tocol=4,picture="*.###")
report(action=format,atcol=5,picture="*.##")
report(action=show)
Output
Linear Regression - Estimation by Least Squares
Dependent Variable PRICE
Usable Observations 546
Degrees of Freedom 541
Centered R^2 0.5355473
R-Bar^2 0.5321133
Uncentered R^2 0.9382388
Mean of Dependent Variable 68121.597070
Std Error of Dependent Variable 26702.670926
Standard Error of Estimate 18265.227116
Sum of Squared Residuals 1.80488e+011
Regression F(4,541) 155.9529
Significance Level of F 0.0000000
Log Likelihood -6129.9928
Durbin-Watson Statistic 1.4829
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -4009.54998 3603.10903 -1.11280 0.26628732
2. LOT_SIZ 5.42917 0.36925 14.70325 0.00000000
3. BED 2824.61379 1214.80763 2.32515 0.02043276
4. BATH 17105.17446 1734.43413 9.86211 0.00000000
5. STOR 7634.89700 1007.97449 7.57449 0.00000000
The output in the table depends upon random numbers and so will be similar, but not exactly the same when it is re-run.
Variable Coeff Std Error NSE CD
Constant -4075.370 3236.783 34.341 -1.16
LOT_SIZ 5.449 0.362 0.004 1.52
BED 3215.961 1058.250 10.613 0.86
BATH 16096.096 1612.908 15.978 -0.06
STOR 7703.893 971.488 9.882 -1.31
Copyright © 2025 Thomas A. Doan