Examples / SV.RPF |
SV.RPF estiimates a stochastic volatility model by (approximate) state-space methods. This is covered in greater detail in the ARCH/GARCH and Volatility Models e-course, which also covers the more technically demanding (but more exact) Bayesian methods for estimating these models.
The unobservable log variance is assumed to evolve according to:
\begin{equation} \log {h_t} = \gamma + \varphi \log {h_{t - 1}} + {w_t} \label{eq:sv_stateeq} \end{equation}
where the observable is assumed to obey
\begin{equation} {y_t} = \sqrt {{h_t}} {v_t} , {v_t} \sim N(0,1) \label{eq:sv_observable} \end{equation}
This assumes that the mean of \(y_t\) is zero—in practice, it will usually be a raw returns series with means removed, or it could be the residuals from a preliminary regression. At any rate, the series that you actually put through the model should be at least theoretically mean zero with no serial correlation.
If you square both sides of \eqref{eq:sv_observable} and take logs, you get:
\begin{equation} \log \,y_t^2 = \log \,h_t + \log \,v _t^2 \label{eq:sv_squaredobs} \end{equation}
where \(\log \,y_t^2\) is observable (since \(y\) is). \(\log \,v _t^2\) is the log of a squared Normal, which is a known distribution with a known mean and known variance. It isn't Normal, but \eqref{eq:sv_squaredobs} can be adjusted to at least have a mean zero residual, which will allow it to function as the measurement equation in an approximate state-space model (with \(\log \,h_t\) as the only state variable). State-space methods can then be used to estimate the free parameters (\(\gamma\), \(\varphi\) and the variance of \(w\)) as a Quasi-Maximum Likelihood Estimator by treating the two error processes (\(w\) and \(v\)) as Gaussian.
In practice, \(\varphi\) is often very near one, and a special case has \(\varphi = 1\) and \(\gamma = 0\). SV.RPF estimates the model both ways—freely estimated, \(\varphi\) is, in fact, quite close to 1. The unrestricted model actually estimates \({\gamma ^*} = \gamma /(1 - \varphi )\) rather than \(\gamma\) directly as \(\gamma\) is highly correlated with \(\varphi\) when \(\varphi\) is near one. \({\gamma ^*}\) is the theoretical mean of the \(\log \,h_t\) which can be at least roughly estimated directly from the data.
Note that the Normal approximation to \eqref{eq:sv_squaredobs} is most inaccurate when \(y_t\) is near zero, which makes \(\log \,y_t^2\) a very large negative number. (These very small residuals have been dubbed "in-liers", as it's a small residual rather than a large one ("outlier") that creates problems for the model). As a result, some authors applying this will truncate the value of \(y_t^2\) to prevent it from getting too small. (We don't do that here). Note that the appropriate truncation value will depend upon the natural scale of \(y\) so a truncation point with one data set may not be a good choice for another. The (much) more complicated Bayesian methods are aimed at modeling the exact distribution of the error term in \eqref{eq:sv_squaredobs}.
The example uses the log rate of change in the US/UK exchange rate. The data set is from the multivariate extension of this analysis done in Harvey, Ruiz and Shepherd(1994).
open data xrates.xls
data(format=xls,org=columns) 1 946 usxuk usxger usxjpn usxsui
set dlogp = 100.0*log(usxuk{0}/usxuk{1})
MEANX2 is the (exact) mean of log chi-square, VARX2 is the variance. (Some papers used rounded values).
compute meanx2 = %digamma(0.5) - log(0.5)
compute varx2 = %trigamma(0.5)
We've written this to remove the mean from the raw series by regressing on a constant (only), to make it simpler to modify for a different mean model.
linreg dlogp
# constant
set demean = %resids
As mentioned above, this uses what we call GAMMAX as \(\gamma /(1 - \varphi )\) to avoid numerical problems with the correlation between \(\gamma\) and \(\varphi\) when \(\varphi\) is close to 1.
nonlin phi sw gammax
This is the dependent variable for the state-space model, removing the theoretical mean from the log squared data:
set ysq = log(demean^2)-meanx2
This gets initial guess values from ARMA(1,1) model. PHI is taken directly as the AR(1) coefficient. The variance SW is backed out by matching first order autocorrelations in the MA term. That can come in negative—if it does, it is replaced by a small positive number. GAMMAX is set to the mean of the YSQ series.
compute phi=%beta(2),sw=-phi*varx2*(1+%beta(3)^2)/%beta(3)-(1+phi^2)*varx2
compute sw=%if(sw<0,.1,sw)
compute gammax=%mean
This estimates the unconstrained model:
dlm(method=bfgs,sw=sw,sv=varx2,y=ysq,type=filter,c=1.0, $
sx0=sw/(1-phi^2),x0=gammax,a=phi,z=gammax*(1-phi)) 2 * states
The initial values are the ergodic values from \eqref{eq:sv_stateeq} which we compute directly, though we could also have used PRESAMPLE=ERGODIC in place of the SX0 and X0 options. SV is the theoretical value for the variance in \eqref{eq:sv_squaredobs}. Some papers choose to let that last variance be freely estimated instead.
This estimates the Random Walk model. The only free parameter now is SW, since \(\varphi\) is 1 and \(\gamma\) is 0. PRESAMPLE=DIFFUSE is the correct choice here since the only state is a random walk.
nonlin sw
dlm(method=bfgs,sw=sw,sv=varx2,y=ysq,type=filter,c=1.0, $
presample=diffuse,a=1.0) rstart * states
Full Program
open data xrates.xls
data(format=xls,org=columns) 1 946 usxuk usxger usxjpn usxsui
*
set dlogp = 100.0*log(usxuk{0}/usxuk{1})
*
* meanx2=mean of log chi-square,varx2=variance of log chi-square
*
compute meanx2=%digamma(0.5)-log(0.5)
compute varx2 =%trigamma(0.5)
*
* The "gamma" and "phi" variables are very highly correlated when
* phi is near 1, which makes estimation by general hill-climbing
* procedures somewhat tricky. Instead, we reparameterize this to
* use gammax=gamma/(1-phi) in place of gamma.
*
* Remove any deterministic components (in this case, just the CONSTANT).
*
linreg dlogp
# constant
set demean = %resids
*
* Save the start of the regression range
*
compute rstart=%regstart()
*
nonlin phi sw gammax
set ysq = log(demean^2)-meanx2
*
* Get initial guess values from ARMA(1,1) model
*
boxjenk(ar=1,ma=1,constant,noprint) ysq
*
* PHI is taken directly as the AR(1) coefficient The variance SW is backed out by
* matching first order autocorrelations in the MA term. gammax is chosen as the
* mean of the ysq series. (BOXJENK sets %MEAN to the mean of the dependent
* variable).
*
compute phi=%beta(2),sw=-phi*varx2*(1+%beta(3)^2)/%beta(3)-(1+phi^2)*varx2
compute sw=%if(sw<0,.1,sw)
compute gammax=%mean
*
* Estimate the unconstrained model
*
dlm(method=bfgs,sw=sw,sv=varx2,y=ysq,type=filter,c=1.0, $
sx0=sw/(1-phi^2),x0=gammax,a=phi,z=gammax*(1-phi)) rstart * states
*
* Estimate the RW model
*
nonlin sw
dlm(method=bfgs,sw=sw,sv=varx2,y=ysq,type=filter,c=1.0, $
presample=diffuse,a=1.0) rstart * states
Output
The results are virtually identical, since the unconstrained model estimates a near-one value for PHI.
DLM - Estimation by BFGS
Convergence in 9 Iterations. Final criterion was 0.0000000 <= 0.0000100
Usable Observations 945
Rank of Observables 945
Log Likelihood -2081.2196
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. PHI 0.991219058 0.008239376 120.30269 0.00000000
2. SW 0.006964340 0.005321236 1.30878 0.19060803
3. GAMMAX -0.793161066 0.326234693 -2.43126 0.01504645
DLM - Estimation by BFGS
Convergence in 3 Iterations. Final criterion was 0.0000038 <= 0.0000100
Usable Observations 945
Rank of Observables 944
Log Likelihood -2081.4957
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. SW 0.0042074648 0.0024879614 1.69113 0.09081208
Copyright © 2025 Thomas A. Doan