Statistics and Algorithms / DSGE Models / DSGE: Estimation
There’s a relatively thin literature on formal estimation of the deep parameters in a DSGE using actual data. Conceptually, estimation shouldn’t be that difficult. Given the model, we have a method which generates either an exact state-space model (for linear models) or an approximate one (for non-linear models). For any collection of parameters, we can generate a history of states, which can be compared with actual data.
Unfortunately, it’s not that easy. The main problem can be seen by looking at example DSGEKPR.RPF. This is a perfectly reasonable-looking model, which generates predictions for output, consumption, labor input, and investment. However, it has only one exogenous shock. As a result, the errors in the four predictions have a rank-one covariance matrix: once you know one of them, you should be able to compute the others exactly. In sample, that will never happen. In order to do a formal estimation, we will have to do at least one of the following:
1. Reduce the number of observables to match the number of fundamental shocks.
2. Add measurement errors to increase the rank of the covariance matrix.
3. Add more fundamental shocks.
Reducing the number of observables makes it much more likely that some of the deep parameters really can’t be well estimated. For instance, because the capital stock is effectively unobservable, the depreciation rate \(\delta\) can’t easily be determined from the data—a high value of \(\delta\) with a high level for the capital stock produces almost the same predictions for the observable variables as a low value for both. Adding fundamental shocks requires a more complex model.
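To see the singularity more concretely (this is a schematic, not taken from DSGEKPR.RPF itself): if the linearized solution maps the single shock \(\varepsilon_t\) into the one-step-ahead errors of the four observables through a \(4 \times 1\) loading vector \(\Gamma\), then
\[
y_t - E_{t-1} y_t = \Gamma \varepsilon_t, \qquad \operatorname{Cov}\left(y_t - E_{t-1} y_t\right) = \sigma_\varepsilon^2 \, \Gamma \Gamma',
\]
which has rank one. Adding a measurement error \(v_t\) with non-singular covariance \(\Sigma_v\) to the observation equation changes this to \(\sigma_\varepsilon^2 \Gamma \Gamma' + \Sigma_v\), which has full rank, as does adding enough fundamental shocks that the loading matrix has full row rank.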
For models which admit a complete description of the observables, there are two main estimation techniques: maximum likelihood and Bayesian methods. Of the two, Bayesian methods are more popular than straight maximum likelihood. In some sense, all estimation exercises with these models are at least partially Bayesian: when parameters such as the discount rate and depreciation rate are pegged at commonly accepted values, that’s a Bayesian choice. Those are parameters for which the data have little information, so we impose a point prior.
The heart of both maximum likelihood and Bayesian methods is the evaluation of the likelihood of the derived state-space model. So what accounts for the popularity of Bayesian methods? Variational methods of optimization (like the BFGS algorithm used by DLM) often try to evaluate the likelihood at what are apparently very strange sets of parameters. This is part of the process by which a “black box” routine discovers the shape of the likelihood. For simpler functions, there is usually no harm in this: if a test set of parameters requires the log or square root of a negative number, the function returns an NA and the parameter set is rejected. If a root goes explosive, the residuals get very large and the likelihood will be correspondingly small. However, the function evaluation in a DSGE is quite complicated. If the model is non-linear, we must first do the following:
1. solve for a steady-state expansion point
2. solve for the backwards representation
3. solve for the ergodic mean and covariance of the state-space model
before we can Kalman filter through to evaluate the likelihood. On the first few iterations (before much is known about the shape of the function), it’s quite easy for a test set of parameters to go outside the range at which those steps are all well-behaved. By imposing a prior which excludes values we know to be nonsensical, we can avoid those types of problems.
The other main reason is that the likelihood function can be quite flat with respect to parameters besides \(\beta\) and \(\delta\); those are just the two most common parameters that are poorly identified from the data. Including a prior allows those other parameters to be restricted to a region that makes economic sense.
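Schematically, Bayesian estimation replaces the log likelihood with the log posterior,
\[
\log p(\theta \mid Y) = \log L(Y \mid \theta) + \log p(\theta) + \text{constant},
\]
so in directions where \(\log L\) is nearly flat, the curvature comes from the prior \(p(\theta)\), and parameter values outside the support of the prior (for instance, a \(\delta\) outside \([0,1]\)) can be rejected before the state-space solution steps are even attempted.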
Both maximum likelihood and (especially) Bayesian estimation are quite complicated. We provide an example (CAGAN.RPF) of maximum likelihood for a small model with two observables. Even this model (which doesn’t have expectational terms, so all DSGE is doing is organizing the state space representation) requires some fussing with guess values.
A couple of things to note: first, the DSGE solution is inside a FUNCTION, here called EvalModel:
function EvalModel
dsge(model=cagan,a=adlm,f=fdlm) x mu a1 a2 eps eta
end EvalModel
Because each function evaluation needs a new set of system matrices, we need to use a START option on DLM to call EvalModel to create them. Note that the function doesn’t actually return anything; instead, it sets the matrices ADLM and FDLM which are used in the DLM instruction.
Second, right after the model is set up comes the following:
compute EvalModel()
compute cdlm=%identity(2)~~%zeros(%rows(adlm)-2,2)
We have two observables, which match up with the first two variables in the model. However, the \(\bf{C}\) matrix needs to have the same number of rows as there are states in the model. Since DSGE is a bit of a black box, it might be hard to tell in advance just how many states there are in the expanded model. The example takes care of that by counting the number of rows in the \(\bf{A}\) matrix and adding a block of zeros at the bottom of \(\bf{C}\) to cover all but the first two.
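With ADLM, FDLM and CDLM in place, the remaining pieces are the free parameters, their guess values, and the DLM call itself. The following is only a rough sketch of how those pieces might fit together, not a reproduction of CAGAN.RPF: ALPHA, RHO, SIG_EPS, SIG_ETA, OBS1, OBS2 and the estimation range ESTART to EEND are placeholder names, and the particular option settings should be checked against the DLM documentation and the example file.
*  Sketch only: ALPHA and RHO stand in for whatever deep parameters the
*  CAGAN model actually uses, SIG_EPS and SIG_ETA for the shock standard
*  deviations, and OBS1 and OBS2 for the two observable data series.
nonlin alpha rho sig_eps sig_eta
compute alpha=0.5,rho=0.9,sig_eps=0.1,sig_eta=0.1
*
*  START= calls EvalModel() at each new parameter setting to rebuild ADLM
*  and FDLM. SW= is the covariance matrix of the two fundamental shocks,
*  and PRESAMPLE=ERGODIC initializes the Kalman filter at the ergodic mean
*  and covariance of the states.
dlm(start=EvalModel(),a=adlm,c=cdlm,f=fdlm,$
    sw=||sig_eps^2,0.0|0.0,sig_eta^2||,y=||obs1,obs2||,$
    presample=ergodic,method=bfgs) estart eend
The NONLIN instruction declares the free parameters and COMPUTE sets their guess values, which is where the fussing with guess values mentioned earlier comes in.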