RATS 10.1

EMEXAMPLE.RPF demonstrates the EM optimization algorithm, applied to a "mixture" model.

 

The data are the returns on the US/Japan exchange rate. The example model is a two-regime mixture of mean-zero Normals with different variances in the two regimes. The simple mixture model here is unrealistic, as it treats the observations as independent; MSVARIANCES.RPF estimates a similar model (by direct maximum likelihood rather than EM) with a Markov switching process governing the regime. However, we are using this to demonstrate how EM works.
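Writing \(y_t\) for the return and \(\phi(y;\sigma^2)\) for a mean-zero Normal density with variance \(\sigma^2\) (notation introduced here for clarity), the density of each observation under the mixture is

\[ f(y_t)=p\,\phi(y_t;\sigma_1^2)+(1-p)\,\phi(y_t;\sigma_2^2) \]

and, with the observations treated as independent, the log likelihood is \(\sum_t \log f(y_t)\).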

 

The program does the optimization three ways:

1. Direct maximization

2. "Classical" EM, looping over the E and M steps.

3. MAXIMIZE with the ESTEP option to handle the E step at each iteration.

 

There are three parameters to be estimated: the two variances and the probability of being in regime 1. We will aim to make regime 1 the low-variance regime, though the labeling is arbitrary.

 

Estimation will run from entry 2 (we lose one observation to the calculation of returns from the raw exchange rates) through the end of the data set.

 

set xjpn = 100.0*log(usxjpn/usxjpn{1})

compute rstart=2,rend=%allocend()


We get guess values for the two regimes from the sample variance. We want these to be spread far enough apart that the regimes are unlikely to switch "labels" during the estimation process.


stats xjpn
compute lowv=.5*%variance,highv=2.0*%variance
 

This does the direct maximization. We save the CPU time so we can compare the different algorithms at the end. Note that because the probability weights are applied to the likelihoods (not log likelihoods) of the two regimes, this does exp(%logdensity(....)) inside the parentheses to get the likelihood, then takes the log outside to produce the overall log likelihood.

 

nonlin p sigsq1 sigsq2

frml logl = log(p*exp(%logdensity(sigsq1,xjpn))+$

            (1-p)*exp(%logdensity(sigsq2,xjpn)))

*

compute t1=%cputime()

compute p=.5

compute sigsq1=lowv,sigsq2=highv

maximize logl rstart rend

 

 

This does the classical EM algorithm. The latent data are the regimes \(x_t\) for each data point. The E step gives us the probability-weighted log likelihood using the previous estimates for the three parameters. Maximizing that to get new parameters gives us probability-weighted estimates for the variances, and an average probability across observations as the estimate of \(p\). One drawback is that, if you want to iterate this to convergence, you have to do your own convergence check; here that is done with the %TESTDIFF function applied to OLDPARMS (which has the previous values) and NEWPARMS (the new ones).
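Writing \(y_t\) for the observed return and \(\phi(y;\sigma^2)\) for a mean-zero Normal density with variance \(\sigma^2\), the E step computes, for each observation,

\[ p_t^* = \frac{p\,\phi(y_t;\sigma_1^2)}{p\,\phi(y_t;\sigma_1^2)+(1-p)\,\phi(y_t;\sigma_2^2)} \]

and the M step updates

\[ \hat p = \frac{1}{T}\sum_t p_t^*,\qquad \hat\sigma_1^2 = \frac{\sum_t p_t^*\,y_t^2}{\sum_t p_t^*},\qquad \hat\sigma_2^2 = \frac{\sum_t (1-p_t^*)\,y_t^2}{\sum_t (1-p_t^*)} \]

which is what the SET and SSTATS instructions in the loop below compute.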

 

compute t2=%cputime()

compute p=.5

compute sigsq1=lowv,sigsq2=highv

compute oldparms=||p,sigsq1,sigsq2||

do iters=1,500

   set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$

               f2=exp(%logdensity(sigsq2,xjpn)),$

               p*f1/(p*f1+(1-p)*f2)

   sstats(mean) rstart rend pstar>>p pstar*xjpn^2>>sumsq1 $

      (1-pstar)*xjpn^2>>sumsq2

   compute sigsq1=sumsq1/p,sigsq2=sumsq2/(1-p)

   compute newparms=||p,sigsq1,sigsq2||

   if %testdiff(newparms,oldparms)<1.e-6

      break

   compute oldparms=newparms

end

?"Estimates from classical EM" p sigsq1 sigsq2

?"Iterations" iters
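The same loop can be cross-checked outside RATS. The following is a minimal Python sketch (standard library only) of the identical E/M updates, run on synthetic data with assumed regime parameters (variance 0.1 with probability 0.6, variance 1.0 otherwise) rather than the exchange-rate series:

```python
import math
import random

def norm_density(x, sigsq):
    # Mean-zero Normal density: the analogue of exp(%logdensity(sigsq,x)) in RATS
    return math.exp(-x * x / (2.0 * sigsq)) / math.sqrt(2.0 * math.pi * sigsq)

def em_mixture(data, tol=1e-6, maxiter=500):
    # Guess values spread around the sample variance, as in the RATS program
    v = sum(x * x for x in data) / len(data)
    p, sigsq1, sigsq2 = 0.5, 0.5 * v, 2.0 * v
    old = (p, sigsq1, sigsq2)
    for it in range(1, maxiter + 1):
        # E step: posterior probability of regime 1 for each observation
        pstar = []
        for x in data:
            f1 = p * norm_density(x, sigsq1)
            f2 = (1.0 - p) * norm_density(x, sigsq2)
            pstar.append(f1 / (f1 + f2))
        # M step: average probability and probability-weighted variances
        n = len(data)
        p = sum(pstar) / n
        sigsq1 = sum(w * x * x for w, x in zip(pstar, data)) / (n * p)
        sigsq2 = sum((1.0 - w) * x * x for w, x in zip(pstar, data)) / (n * (1.0 - p))
        new = (p, sigsq1, sigsq2)
        # Crude stand-in for the %TESTDIFF convergence check
        if max(abs(a - b) for a, b in zip(new, old)) < tol:
            break
        old = new
    return p, sigsq1, sigsq2, it

random.seed(7)
# Synthetic "returns": variance 0.1 with probability 0.6, variance 1.0 otherwise
data = [random.gauss(0.0, math.sqrt(0.1)) if random.random() < 0.6
        else random.gauss(0.0, 1.0) for _ in range(6000)]
p, s1, s2, iters = em_mixture(data)
print(p, s1, s2, iters)
```

With well-separated variances the loop recovers values near those used in the simulation. EM's linear convergence near the optimum is why this classical loop typically needs far more iterations than the hill-climbing methods.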

 

 

Finally, this uses MAXIMIZE with the ESTEP option. ESTEP is implemented as a FUNCTION which is called at the start of each iteration to (in this case) produce the PSTAR series of regime probabilities, computed from the previous values of the parameters, which enter the expected log likelihood for each data point.
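Writing \(y_t\) for the observed return and \(\phi(y;\sigma^2)\) for a mean-zero Normal density with variance \(\sigma^2\), the criterion that MAXIMIZE climbs at each iteration is, per observation, the expected complete-data log likelihood

\[ Q_t = p_t^*\left[\log\phi(y_t;\sigma_1^2)+\log p\right] + (1-p_t^*)\left[\log\phi(y_t;\sigma_2^2)+\log(1-p)\right] \]

with \(p_t^*\) (the PSTAR series) held fixed at the values produced by the E step.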

 

function EStep

set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$

            f2=exp(%logdensity(sigsq2,xjpn)),$

            p*f1/(p*f1+(1-p)*f2)

end EStep

*

compute t3=%cputime()

frml emlogl = pstar*%logdensity(sigsq1,xjpn)+$

          (1-pstar)*%logdensity(sigsq2,xjpn)+$

            pstar*log(p)+(1-pstar)*log(1-p)

*

compute p=.5

compute sigsq1=lowv,sigsq2=highv

nonlin p sigsq1 sigsq2

maximize(estep=estep(),method=bhhh,cvcrit=1.e-6,iters=500) emlogl rstart rend

 


Full Program
 

open data g10xrate.xls

data(format=xls,org=columns) / usxjpn usxfra usxsui

*

* This estimates a "mixture" model, with a zero mean and two branches

* for the variance.

*

set xjpn = 100.0*log(usxjpn/usxjpn{1})

compute rstart=2,rend=%allocend()

*

* Get guess values based on the sample variance.

*

stats xjpn

compute lowv=.5*%variance,highv=2.0*%variance

******************************************************************

*

* Direct maximization

*

nonlin p sigsq1 sigsq2

frml logl = log(p*exp(%logdensity(sigsq1,xjpn))+$

            (1-p)*exp(%logdensity(sigsq2,xjpn)))

*

compute t1=%cputime()

compute p=.5

compute sigsq1=lowv,sigsq2=highv

maximize logl rstart rend

******************************************************************

*

* Classic "EM" calculation

*

compute t2=%cputime()

compute p=.5

compute sigsq1=lowv,sigsq2=highv

compute oldparms=||p,sigsq1,sigsq2||

do iters=1,500

   set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$

               f2=exp(%logdensity(sigsq2,xjpn)),$

               p*f1/(p*f1+(1-p)*f2)

   sstats(mean) rstart rend pstar>>p pstar*xjpn^2>>sumsq1 $

      (1-pstar)*xjpn^2>>sumsq2

   compute sigsq1=sumsq1/p,sigsq2=sumsq2/(1-p)

   compute newparms=||p,sigsq1,sigsq2||

   if %testdiff(newparms,oldparms)<1.e-6

      break

   compute oldparms=newparms

end

?"Estimates from classical EM" p sigsq1 sigsq2

?"Iterations" iters

******************************************************************

*

* EM using the ESTEP option

*

function EStep

set pstar = f1=exp(%logdensity(sigsq1,xjpn)),$

            f2=exp(%logdensity(sigsq2,xjpn)),$

            p*f1/(p*f1+(1-p)*f2)

end EStep

*

compute t3=%cputime()

frml emlogl = pstar*%logdensity(sigsq1,xjpn)+$

          (1-pstar)*%logdensity(sigsq2,xjpn)+$

            pstar*log(p)+(1-pstar)*log(1-p)

*

compute p=.5

compute sigsq1=lowv,sigsq2=highv

nonlin p sigsq1 sigsq2

maximize(estep=estep(),method=bhhh,cvcrit=1.e-6,iters=500) emlogl rstart rend

*

compute t4=%cputime()

*

?"Timing"

?"ML" @15 t2-t1

?"Classical EM" @15 t3-t2

?"Max(ESTEP)" @15 t4-t3

 

Output

This does maximum likelihood, plus EM two ways. All three methods come up with (effectively) the same parameter estimates, but maximum likelihood, which is easiest to set up, is also the fastest to execute for a model this simple. (Note that the EM "log likelihood" reported by MAXIMIZE with the ESTEP option is the expected complete-data log likelihood, not the unconditional log likelihood, which is why the two function values differ.)
 

Statistics on Series XJPN

Observations                  6236

Sample Mean               0.014351      Variance                   0.398893

Standard Error            0.631580      SE of Sample Mean          0.007998

t-Statistic (Mean=0)      1.794334      Signif Level (Mean=0)      0.072808

Skewness                  0.784016      Signif Level (Sk=0)        0.000000

Kurtosis (excess)        13.265486      Signif Level (Ku=0)        0.000000

Jarque-Bera           46362.538224      Signif Level (JB=0)        0.000000

 

 

MAXIMIZE - Estimation by BFGS

Convergence in    15 Iterations. Final criterion was  0.0000004 <=  0.0000100

 

Usable Observations                      6236

Function Value                     -5316.1941

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  P                            0.6435210010 0.0252344976     25.50164  0.00000000

2.  SIGSQ1                       0.0990984332 0.0075979537     13.04278  0.00000000

3.  SIGSQ2                       0.9404851064 0.0517621213     18.16937  0.00000000

 

Estimates from classical EM       0.64351       0.09909       0.94046

Iterations 217

 

MAXIMIZE - Estimation by BHHH

Convergence in   169 Iterations. Final criterion was  0.0000009 <=  0.0000010

 

Usable Observations                      6236

Function Value                     -8203.9312

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  P                            0.6435127796 0.0135557635     47.47153  0.00000000

2.  SIGSQ1                       0.0990960843 0.0040509789     24.46226  0.00000000

3.  SIGSQ2                       0.9404679957 0.0162516182     57.86919  0.00000000

 

Timing

ML            0

Classical EM  1

Max(ESTEP)    3

 


Copyright © 2025 Thomas A. Doan