DLM and VARs with missing data
Posted: Tue Jul 11, 2017 4:40 pm
by tclark
I am working in a Bayesian VAR setting with some latent states and missing data, so I plan to use DLM and the Durbin-Koopman simulation smoother to fill in (sample within MCMC) the missing data. I am working with an N x 1 data vector y(t). The missing data are such that, in some time periods, some elements of y(t) are available while others are missing.

Consistent with the way I have seen others describe handling missing data in VAR settings, I have set up the state space form so that the C matrix is an identity matrix in periods with no missing data and has the relevant 1's zeroed out in periods with missing observations (just the diagonal elements for the missing variables, not the whole diagonal), with SV=0 for all time periods.

With DLM, when I set up the VAR in state space form this way and just use type=smooth, the smoothed states I get back seem correct: the states equal the data for the available observations and are filled-in values for the missing ones. When I use type=csimulate to randomize the missing values, as I need to in the MCMC context, then in any period t in which some data are missing, the state vector doesn't contain any of the actual data values; everything has been overwritten, rather than just the truly missing observations.

I must be doing something wrong. Any suggestions would be a great help.
Re: DLM and VARs with missing data
Posted: Tue Jul 11, 2017 5:46 pm
by TomDoan
You should be able to just let RATS handle the NA's. The example below sets some randomly chosen entries to NA, and the simulation procedure gives back the observations that aren't NA. RATS automatically adjusts the measurement equations to exclude the information provided by a missing value in the Y (or C).
Code:
* Simulate 400 observations from a bivariate VAR(1)
dec series x y
frml xfrml x = .7*x{1}+.3*y{1}
frml yfrml y = -.2*x{1}+.9*y{1}
group(cv=||1.0|0.5,1.0||) sim xfrml>>x yfrml>>y
set x 1 400 = %ran(1.0)
set y 1 400 = %ran(1.0)
simulate(model=sim,from=2,to=400)
*
* Set roughly 5% of y and 10% of x to NA at random
set yf 1 400 = %if(%ranflip(.05),%na,y)
set xf 1 400 = %if(%ranflip(.10),%na,x)
*
* Conditional simulation of the states given the data with NA's
dlm(y=||xf,yf||,a=||.7,.3|-.2,.9||,c=%identity(2),f=%identity(2),$
  sw=||1.0|0.5,1.0||,sv=%zeros(2,2),type=csimulate) / xstates
* Pull out the simulated states to compare with the data
set xtilde = xstates(t)(1)
set ytilde = xstates(t)(2)
print / xtilde xf ytilde yf
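Rather than reading down the printed columns, the claim that the observed values come back unchanged could be checked numerically. The lines below are a minimal sketch that assumes the example above has just been run, so that xf, yf, xtilde and ytilde are still in memory. The names xgap, ygap, xmaxgap and ymaxgap are made up for illustration.

```
* Assumes the example above has been run (xf, yf, xtilde, ytilde exist).
* Gap between the simulated state and the data, counted only where the
* data are observed; entries where the data are NA are set to zero.
set xgap 1 400 = %if(%valid(xf),abs(xtilde-xf),0.0)
set ygap 1 400 = %if(%valid(yf),abs(ytilde-yf),0.0)
* Largest gap over the sample for each variable
sstats(max) 1 400 xgap>>xmaxgap ygap>>ymaxgap
disp "Largest gap at observed entries:" xmaxgap ymaxgap
```

If the observed data are being passed through untouched, both values should be zero up to rounding.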
Re: DLM and VARs with missing data
Posted: Tue Jul 11, 2017 5:54 pm
by tclark
Well, that was a lot easier coding-wise than what I was thinking and trying -- the simplest approach was about the only thing I hadn't tried. Thanks very much for the help and the very quick reply.
Re: DLM and VARs with missing data
Posted: Tue Jul 11, 2017 9:38 pm
by tclark
Tom --
Sorry, but after checking it a second time, I now realize that using type=csimulate doesn't seem to be producing random draws of the missing data. It is producing the same result as type=smooth with each use. Is there any easy solution to get the draw? It occurred to me this evening that, a couple of years ago, in a different problem in a related context, you suggested the use of multiple instances of DLM:
https://estima.com/forum/viewtopic.php? ... sing#p9215
Thanks again for your help.
Todd
Re: DLM and VARs with missing data
Posted: Tue Jul 11, 2017 10:16 pm
by TomDoan
Do you have an SW option? The code I gave above gives different answers with each simulation.
Re: DLM and VARs with missing data
Posted: Wed Jul 12, 2017 5:18 am
by tclark
Thanks, yes, I do have the SW option. I have attached a run of the example you provided, in which running DLM twice (one after the other, so the random numbers should be different) gives the same results for the states with missing data, and then running DLM with type=smooth also gives the same states.
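A comparison of this kind could be sketched as follows. This is not the attached run; it is an illustration that assumes xf and yf from the earlier example are still in memory, and the names draw1, draw2, smoothed, d12, d1s, max12 and max1s are hypothetical.

```
* Assumes xf and yf from the earlier example are already defined.
* Two back-to-back conditional simulations, then one smoother pass.
dlm(y=||xf,yf||,a=||.7,.3|-.2,.9||,c=%identity(2),f=%identity(2),$
  sw=||1.0|0.5,1.0||,sv=%zeros(2,2),type=csimulate) / draw1
dlm(y=||xf,yf||,a=||.7,.3|-.2,.9||,c=%identity(2),f=%identity(2),$
  sw=||1.0|0.5,1.0||,sv=%zeros(2,2),type=csimulate) / draw2
dlm(y=||xf,yf||,a=||.7,.3|-.2,.9||,c=%identity(2),f=%identity(2),$
  sw=||1.0|0.5,1.0||,sv=%zeros(2,2),type=smooth) / smoothed
* Absolute differences in the first state (the x variable)
set d12 1 400 = abs(draw1(t)(1)-draw2(t)(1))
set d1s 1 400 = abs(draw1(t)(1)-smoothed(t)(1))
sstats(max) 1 400 d12>>max12 d1s>>max1s
disp "Max |draw1-draw2| =" max12
disp "Max |draw1-smooth| =" max1s
```

If both maxima are zero, the conditional simulation is returning the smoothed values rather than fresh draws, which is the behavior described above. Genuine draws would show nonzero differences in the periods where xf is NA.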
Re: DLM and VARs with missing data
Posted: Fri Jul 14, 2017 1:40 pm
by TomDoan
OK. That was a bit of a puzzler. The conditional simulation is done as described in Durbin and Koopman: smooth the original data, do an unconditional simulation, smooth the simulated data, and add the difference between the last two (the simulated states minus their smoothed values) to the original smoothed data. The problem is that the unconditional simulation patches over the missing values (which is what you would typically want in an unconditional simulation). With a full set of simulated data (and, as here, SV=0 with an identity C), the smoothed states equal the simulated states, so the procedure adds zeros to the smoothed original data. So the unconditional simulation needs to respect the placement of the missing values when it is part of a conditional simulation.
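In symbols, the steps just described can be sketched as below. The notation is generic and chosen here for illustration: X denotes the states, Y the data, a hat a smoothed value, and a plus superscript an unconditional draw. None of these are RATS names.

```latex
\begin{aligned}
\hat{X}_t &= E\left[X_t \mid Y\right]
  && \text{(smooth the original data)} \\
(X^{+},\, Y^{+}) &\sim \text{the model, with no conditioning}
  && \text{(unconditional simulation)} \\
\hat{X}^{+}_t &= E\left[X^{+}_t \mid Y^{+}\right]
  && \text{(smooth the simulated data)} \\
\tilde{X}_t &= \hat{X}_t + \left(X^{+}_t - \hat{X}^{+}_t\right)
  && \text{(conditional draw)}
\end{aligned}
```

In the example in this thread, C is the identity and SV=0, so Y^{+}_t = X^{+}_t. If Y^{+} is fully observed, smoothing it returns \hat{X}^{+}_t = X^{+}_t in every period, the correction term is zero, and \tilde{X}_t = \hat{X}_t, which is just the smoother output. If Y^{+} instead carries NA's in the same positions as Y, the correction is still zero for the observed entries (so the data are preserved) but is generally nonzero for the missing ones, which is what makes the filled-in values random.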