Examples / EMEXAMPLE.RPF
EMEXAMPLE.RPF demonstrates the EM optimization algorithm, applied to a "mixture" model.
The data are the returns on the US/Japan exchange rate. The example model is a two-regime mixture model with mean-zero Normals with different variances in the two regimes. The simple mixture model here is unrealistic, as it treats the observations as independent; MSVARIANCES.RPF estimates a similar model (with direct maximum likelihood rather than EM) with a Markov Switching model governing the regime. However, we are using this to demonstrate how EM works.
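To fix notation (matching the program below, where \(x_t\) is the return series XJPN, \(p\) is P, and \(\sigma_1^2,\sigma_2^2\) are SIGSQ1 and SIGSQ2), the mixture density of an observation is

\[
f(x_t) = p\,\phi(x_t;0,\sigma_1^2) + (1-p)\,\phi(x_t;0,\sigma_2^2)
\]

where \(\phi(\cdot\,;0,\sigma^2)\) is the Normal density with mean zero and variance \(\sigma^2\).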
The program does the optimization three ways:
1. Direct maximization.
2. "Classical" EM, looping over the E and M steps.
3. MAXIMIZE with the ESTEP option to handle the E step at each iteration.
There are three parameters to be estimated: the two variances and the probability of being in regime 1. We will aim to make regime 1 the low-variance regime, though the labeling is arbitrary.
Estimation runs from period 2 (we lose one observation to the calculation of returns from the raw exchange rates) through the end of the data set.
set xjpn = 100.0*log(usxjpn/usxjpn{1})
compute rstart=2,rend=%allocend()
We get guess values for the two regimes from the sample variance. We want these to be spread out enough that it is unlikely for the regimes to switch "labels" during the estimation process.
stats xjpn
compute lowv=.5*%variance,highv=2.0*%variance
This does the direct maximization. We save the CPU time at each stage so we can compare the speed of the algorithms at the end. Note that because the probability weights are applied to the likelihoods (not the log likelihoods) of the two regimes, the formula uses exp(%logdensity(...)) inside the parentheses to get each regime's likelihood, then takes the log of the weighted sum to produce the overall log likelihood.
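In this notation, the LOGL formula computes the per-observation log likelihood

\[
\log f(x_t) = \log\bigl(p\,\phi(x_t;0,\sigma_1^2) + (1-p)\,\phi(x_t;0,\sigma_2^2)\bigr)
\]

with exp(%logdensity(...)) supplying each \(\phi\) term.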
nonlin p sigsq1 sigsq2
frml logl = log(p*exp(%logdensity(sigsq1,xjpn))+$
(1-p)*exp(%logdensity(sigsq2,xjpn)))
*
compute t1=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
maximize logl rstart rend
This does the classical EM algorithm. The latent data are the regimes \(z_t\) of the data points. The E step gives us the probability-weighted log likelihood using the previous estimates for the three parameters. Maximizing that to get new parameters gives probability-weighted estimates for the variances, and an average probability across observations as the estimate of \(p\). One drawback is that, if you want to iterate this to convergence, you have to do your own convergence checks. That's done here using the %TESTDIFF function applied to OLDPARMS (which has the previous values) and NEWPARMS (which has the new ones).
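Concretely, writing \(p^*_t\) for the probability that observation \(t\) is in regime 1 given the current parameter values (the PSTAR series), the E step computes

\[
p^*_t = \frac{p\,\phi(x_t;0,\sigma_1^2)}{p\,\phi(x_t;0,\sigma_1^2)+(1-p)\,\phi(x_t;0,\sigma_2^2)}
\]

and the M step has the closed-form solutions

\[
\hat{p} = \frac{1}{T}\sum_t p^*_t,\quad
\hat{\sigma}_1^2 = \frac{\sum_t p^*_t\,x_t^2}{\sum_t p^*_t},\quad
\hat{\sigma}_2^2 = \frac{\sum_t (1-p^*_t)\,x_t^2}{\sum_t (1-p^*_t)}
\]

which is what the SET, SSTATS and COMPUTE instructions below implement.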
compute t2=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
compute oldparms=||p,sigsq1,sigsq2||
do iters=1,500
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
sstats(mean) rstart rend pstar>>p pstar*xjpn^2>>sumsq1 $
(1-pstar)*xjpn^2>>sumsq2
compute sigsq1=sumsq1/p,sigsq2=sumsq2/(1-p)
compute newparms=||p,sigsq1,sigsq2||
if %testdiff(newparms,oldparms)<1.e-6
break
compute oldparms=newparms
end
?"Estimates from classical EM" p sigsq1 sigsq2
?"Iterations" iters
Finally, this uses MAXIMIZE with the ESTEP option. ESTEP is implemented as a FUNCTION which is called at the start of each iteration to (in this case) produce the PSTAR series, which has the probability of regime 1 for each data point given the previous values of the parameters. The EMLOGL formula then uses PSTAR to compute the expected log likelihood for each observation.
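In symbols, the combination of ESTEP and EMLOGL evaluates, for each observation,

\[
Q_t = p^*_t\bigl[\log\phi(x_t;0,\sigma_1^2)+\log p\bigr] + (1-p^*_t)\bigl[\log\phi(x_t;0,\sigma_2^2)+\log(1-p)\bigr]
\]

with \(p^*_t\) (the PSTAR series) held fixed at the values computed from the previous iteration's parameters.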
function EStep
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
end EStep
*
compute t3=%cputime()
frml emlogl = pstar*%logdensity(sigsq1,xjpn)+$
(1-pstar)*%logdensity(sigsq2,xjpn)+$
pstar*log(p)+(1-pstar)*log(1-p)
*
compute p=.5
compute sigsq1=lowv,sigsq2=highv
nonlin p sigsq1 sigsq2
maximize(estep=estep(),method=bhhh,cvcrit=1.e-6,iters=500) emlogl rstart rend
Full Program
open data g10xrate.xls
data(format=xls,org=columns) / usxjpn usxfra usxsui
*
* This estimates a "mixture" model, with a zero mean and two branches
* for the variance.
*
set xjpn = 100.0*log(usxjpn/usxjpn{1})
compute rstart=2,rend=%allocend()
*
* Get guess values based on the sample variance.
*
stats xjpn
compute lowv=.5*%variance,highv=2.0*%variance
******************************************************************
*
* Direct maximization
*
nonlin p sigsq1 sigsq2
frml logl = log(p*exp(%logdensity(sigsq1,xjpn))+$
(1-p)*exp(%logdensity(sigsq2,xjpn)))
*
compute t1=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
maximize logl rstart rend
******************************************************************
*
* Classic "EM" calculation
*
compute t2=%cputime()
compute p=.5
compute sigsq1=lowv,sigsq2=highv
compute oldparms=||p,sigsq1,sigsq2||
do iters=1,500
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
sstats(mean) rstart rend pstar>>p pstar*xjpn^2>>sumsq1 $
(1-pstar)*xjpn^2>>sumsq2
compute sigsq1=sumsq1/p,sigsq2=sumsq2/(1-p)
compute newparms=||p,sigsq1,sigsq2||
if %testdiff(newparms,oldparms)<1.e-6
break
compute oldparms=newparms
end
?"Estimates from classical EM" p sigsq1 sigsq2
?"Iterations" iters
******************************************************************
*
* EM using the ESTEP option
*
function EStep
set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$
f2=exp(%logdensity(sigsq2,xjpn)),$
p*f1/(p*f1+(1-p)*f2)
end EStep
*
compute t3=%cputime()
frml emlogl = pstar*%logdensity(sigsq1,xjpn)+$
(1-pstar)*%logdensity(sigsq2,xjpn)+$
pstar*log(p)+(1-pstar)*log(1-p)
*
compute p=.5
compute sigsq1=lowv,sigsq2=highv
nonlin p sigsq1 sigsq2
maximize(estep=estep(),method=bhhh,cvcrit=1.e-6,iters=500) emlogl rstart rend
*
compute t4=%cputime()
*
?"Timing"
?"ML" @15 t2-t1
?"Classical EM" @15 t3-t2
?"Max(ESTEP)" @15 t4-t3
Output
This does maximum likelihood, plus EM two ways. All three methods come up with the same parameter estimates, but maximum likelihood, which is the easiest to set up, is also the fastest to execute for a model this simple. (Note that the EM "log likelihood" isn't the same as the unconditional log likelihood.)
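One way to see the difference in the two function values: substituting the definition of \(p^*_t\) into \(Q_t\) shows that, when \(p^*_t\) is evaluated at the same parameter values,

\[
\log f(x_t) = Q_t - \bigl[p^*_t\log p^*_t + (1-p^*_t)\log(1-p^*_t)\bigr]
\]

so the unconditional log likelihood exceeds the EM objective by the entropy of the regime probabilities, consistent with the \(-5316.19\) and \(-8203.93\) function values reported below.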
Statistics on Series XJPN
Observations 6236
Sample Mean 0.014351 Variance 0.398893
Standard Error 0.631580 SE of Sample Mean 0.007998
t-Statistic (Mean=0) 1.794334 Signif Level (Mean=0) 0.072808
Skewness 0.784016 Signif Level (Sk=0) 0.000000
Kurtosis (excess) 13.265486 Signif Level (Ku=0) 0.000000
Jarque-Bera 46362.538224 Signif Level (JB=0) 0.000000
MAXIMIZE - Estimation by BFGS
Convergence in 15 Iterations. Final criterion was 0.0000004 <= 0.0000100
Usable Observations 6236
Function Value -5316.1941
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. P 0.6435210010 0.0252344976 25.50164 0.00000000
2. SIGSQ1 0.0990984332 0.0075979537 13.04278 0.00000000
3. SIGSQ2 0.9404851064 0.0517621213 18.16937 0.00000000
Estimates from classical EM 0.64351 0.09909 0.94046
Iterations 217
MAXIMIZE - Estimation by BHHH
Convergence in 169 Iterations. Final criterion was 0.0000009 <= 0.0000010
Usable Observations 6236
Function Value -8203.9312
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. P 0.6435127796 0.0135557635 47.47153 0.00000000
2. SIGSQ1 0.0990960843 0.0040509789 24.46226 0.00000000
3. SIGSQ2 0.9404679957 0.0162516182 57.86919 0.00000000
Timing
ML 0
Classical EM 1
Max(ESTEP) 3
Copyright © 2025 Thomas A. Doan