Examples / EMEXAMPLE.RPF
EMEXAMPLE.RPF demonstrates the EM optimization algorithm, applied to a "mixture" model.
The data are the returns on the US/Japan exchange rate. The example model is a two-regime mixture model with mean-zero Normals with different variances in the two regimes. The simple mixture model here is unrealistic, as it treats the observations as independent; MSVARIANCES.RPF estimates a similar model (with direct maximum likelihood rather than EM) with a Markov Switching model governing the regime. However, we are using this to demonstrate how EM works.
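To fix notation (matching the program below, where \(x_t\) is the return series XJPN, \(p\) is P, and \(\sigma_1^2,\sigma_2^2\) are SIGSQ1 and SIGSQ2), the mixture density of an observation is

\[
f(x_t) = p\,\phi(x_t;0,\sigma_1^2) + (1-p)\,\phi(x_t;0,\sigma_2^2)
\]

where \(\phi(\cdot\,;0,\sigma^2)\) is the Normal density with mean zero and variance \(\sigma^2\).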
The program does the optimization three ways:
1. Direct maximization.
2. "Classical" EM, looping over the E and M steps.
3. MAXIMIZE with the ESTEP option to handle the E step at each iteration.
There are three parameters to be estimated: the two variances and the probability of being in regime 1. We will aim to make regime 1 the low-variance regime, though the labeling is arbitrary.
Estimation runs from period 2 (we lose one observation to the calculation of returns from the raw exchange rates) through the end of the data set.
set xjpn = 100.0*log(usxjpn/usxjpn{1})
compute rstart=2,rend=%allocend()
We get guess values for the two regimes from the sample variance. We want these to be spread out enough that it is unlikely for the regimes to switch "labels" during the estimation process.
stats xjpn
compute lowv=.5*%variance,highv=2.0*%variance
This does the direct maximization. We save the CPU time at each stage so we can compare the speed of the algorithms at the end. Note that because the probability weights are applied to the likelihoods (not the log likelihoods) of the two regimes, the formula uses exp(%logdensity(...)) inside the parentheses to get each regime's likelihood, then takes the log of the weighted sum to produce the overall log likelihood.
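In this notation, the LOGL formula computes the per-observation log likelihood

\[
\log f(x_t) = \log\bigl(p\,\phi(x_t;0,\sigma_1^2) + (1-p)\,\phi(x_t;0,\sigma_2^2)\bigr)
\]

with exp(%logdensity(...)) supplying each \(\phi\) term.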
nonlin p sigsq1 sigsq2
frml logl = log(p*exp(%logdensity(sigsq1,xjpn))+$
(1-p)*exp(%logdensity(sigsq2,xjpn)))
*
compute t1=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
maximize logl rstart rend
This does the classical EM algorithm. The latent data are the regimes \(z_t\) of the data points. The E step gives us the probability-weighted log likelihood using the previous estimates for the three parameters. Maximizing that to get new parameters gives probability-weighted estimates for the variances, and an average probability across observations as the estimate of \(p\). One drawback is that, if you want to iterate this to convergence, you have to do your own convergence checks. That's done here using the %TESTDIFF function applied to OLDPARMS (which has the previous values) and NEWPARMS (which has the new ones).
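Concretely, writing \(p^*_t\) for the probability that observation \(t\) is in regime 1 given the current parameter values (the PSTAR series), the E step computes

\[
p^*_t = \frac{p\,\phi(x_t;0,\sigma_1^2)}{p\,\phi(x_t;0,\sigma_1^2)+(1-p)\,\phi(x_t;0,\sigma_2^2)}
\]

and the M step has the closed-form solutions

\[
\hat{p} = \frac{1}{T}\sum_t p^*_t,\quad
\hat{\sigma}_1^2 = \frac{\sum_t p^*_t\,x_t^2}{\sum_t p^*_t},\quad
\hat{\sigma}_2^2 = \frac{\sum_t (1-p^*_t)\,x_t^2}{\sum_t (1-p^*_t)}
\]

which is what the SET, SSTATS and COMPUTE instructions below implement.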
compute t2=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
compute oldparms=||p,sigsq1,sigsq2||
do iters=1,500
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
sstats(mean) rstart rend pstar>>p pstar*xjpn^2>>sumsq1 $
(1-pstar)*xjpn^2>>sumsq2
compute sigsq1=sumsq1/p,sigsq2=sumsq2/(1-p)
compute newparms=||p,sigsq1,sigsq2||
if %testdiff(newparms,oldparms)<1.e-6
break
compute oldparms=newparms
end
?"Estimates from classical EM" p sigsq1 sigsq2
?"Iterations" iters
Finally, this uses MAXIMIZE with the ESTEP option. ESTEP is implemented as a FUNCTION which is called at the start of each iteration to (in this case) produce the PSTAR series, which has the probability of regime 1 for each data point given the previous values of the parameters. The EMLOGL formula then uses PSTAR to compute the expected log likelihood for each observation.
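In symbols, the combination of ESTEP and EMLOGL evaluates, for each observation,

\[
Q_t = p^*_t\bigl[\log\phi(x_t;0,\sigma_1^2)+\log p\bigr] + (1-p^*_t)\bigl[\log\phi(x_t;0,\sigma_2^2)+\log(1-p)\bigr]
\]

with \(p^*_t\) (the PSTAR series) held fixed at the values computed from the previous iteration's parameters.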
function EStep
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
end EStep
*
compute t3=%cputime()
frml emlogl = pstar*%logdensity(sigsq1,xjpn)+$
(1-pstar)*%logdensity(sigsq2,xjpn)+$
pstar*log(p)+(1-pstar)*log(1-p)
*
compute p=.5
compute sigsq1=lowv,sigsq2=highv
nonlin p sigsq1 sigsq2
maximize(estep=estep(),method=bhhh,cvcrit=1.e-6,iters=500) emlogl rstart rend
Full Program
open data g10xrate.xls
data(format=xls,org=columns) / usxjpn usxfra usxsui
*
* This estimates a "mixture" model, with a zero mean and two branches
* for the variance.
*
set xjpn = 100.0*log(usxjpn/usxjpn{1})
compute rstart=2,rend=%allocend()
*
* Get guess values based on the sample variance.
*
stats xjpn
compute lowv=.5*%variance,highv=2.0*%variance
******************************************************************
*
* Direct maximization
*
nonlin p sigsq1 sigsq2
frml logl = log(p*exp(%logdensity(sigsq1,xjpn))+$
(1-p)*exp(%logdensity(sigsq2,xjpn)))
*
compute t1=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
maximize logl rstart rend
******************************************************************
*
* Classic "EM" calculation
*
compute t2=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
compute oldparms=||p,sigsq1,sigsq2||
do iters=1,500
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
sstats(mean) rstart rend pstar>>p pstar*xjpn^2>>sumsq1 $
(1-pstar)*xjpn^2>>sumsq2
compute sigsq1=sumsq1/p,sigsq2=sumsq2/(1-p)
compute newparms=||p,sigsq1,sigsq2||
if %testdiff(newparms,oldparms)<1.e-6
break
compute oldparms=newparms
end
?"Estimates from classical EM" p sigsq1 sigsq2
?"Iterations" iters
******************************************************************
*
* EM using the ESTEP option
*
function EStep
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
end EStep
*
compute t3=%cputime()
frml emlogl = pstar*%logdensity(sigsq1,xjpn)+$
(1-pstar)*%logdensity(sigsq2,xjpn)+$
pstar*log(p)+(1-pstar)*log(1-p)
*
compute p=.5
compute sigsq1=lowv,sigsq2=highv
nonlin p sigsq1 sigsq2
maximize(estep=estep(),method=bhhh,cvcrit=1.e-6,iters=500) emlogl rstart rend
*
compute t4=%cputime()
*
?"Timing"
?"ML" @15 t2-t1
?"Classical EM" @15 t3-t2
?"Max(ESTEP)" @15 t4-t3
Output
This does maximum likelihood, plus EM two ways. All three methods come up with the same parameter estimates, but maximum likelihood, which is the easiest to set up, is also the fastest to execute for a model this simple. (Note that the EM "log likelihood" isn't the same as the unconditional log likelihood.)
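One way to see the difference in the two function values: substituting the definition of \(p^*_t\) into \(Q_t\) shows that, when \(p^*_t\) is evaluated at the same parameter values,

\[
\log f(x_t) = Q_t - \bigl[p^*_t\log p^*_t + (1-p^*_t)\log(1-p^*_t)\bigr]
\]

so the unconditional log likelihood exceeds the EM objective by the entropy of the regime probabilities, consistent with the \(-5316.19\) and \(-8203.93\) function values reported below.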
Statistics on Series XJPN
Observations 6236
Sample Mean 0.014351 Variance 0.398893
Standard Error 0.631580 SE of Sample Mean 0.007998
t-Statistic (Mean=0) 1.794334 Signif Level (Mean=0) 0.072808
Skewness 0.784016 Signif Level (Sk=0) 0.000000
Kurtosis (excess) 13.265486 Signif Level (Ku=0) 0.000000
Jarque-Bera 46362.538224 Signif Level (JB=0) 0.000000
MAXIMIZE - Estimation by BFGS
Convergence in 15 Iterations. Final criterion was 0.0000004 <= 0.0000100
Usable Observations 6236
Function Value -5316.1941
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. P 0.6435210010 0.0252344976 25.50164 0.00000000
2. SIGSQ1 0.0990984332 0.0075979537 13.04278 0.00000000
3. SIGSQ2 0.9404851064 0.0517621213 18.16937 0.00000000
Estimates from classical EM 0.64351 0.09909 0.94046
Iterations 217
MAXIMIZE - Estimation by BHHH
Convergence in 169 Iterations. Final criterion was 0.0000009 <= 0.0000010
Usable Observations 6236
Function Value -8203.9312
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. P 0.6435127796 0.0135557635 47.47153 0.00000000
2. SIGSQ1 0.0990960843 0.0040509789 24.46226 0.00000000
3. SIGSQ2 0.9404679957 0.0162516182 57.86919 0.00000000
Timing
ML 0
Classical EM 1
Max(ESTEP) 3
Copyright © 2025 Thomas A. Doan