Mixture of Gaussians and the EM Algorithm
Consider a training set \(\{x^{(1)},x^{(2)},...,x^{(m)}\}\). We wish to specify a joint distribution \(p(x^{(i)},z^{(i)}) = p(x^{(i)} \vert z^{(i)})\,p(z^{(i)})\), where \(z^{(i)} \sim \mathrm{Multinomial}(\phi)\), such that \(\phi_j \geqslant 0\) and \(\sum_{j=1}^k \phi_j=1\), and \(x^{(i)} \vert z^{(i)}=j \sim \mathcal{N}(\mu_j,\Sigma_j)\). Note that the \(z^{(i)}\)’s are not known in advance and hence are the latent variables of this model. We may write the log-likelihood function as:
$$ \ell(\phi,\mu,\Sigma) = \sum_{i=1}^m \log p(x^{(i)};\phi,\mu,\Sigma) = \sum_{i=1}^m \log \sum_{z^{(i)}=1}^k p(x^{(i)} \vert z^{(i)};\mu,\Sigma)\,p(z^{(i)};\phi) $$
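As a concrete illustration of this generative model, here is a minimal NumPy sketch that samples data from it; the parameter values (`phi`, `mu`, `Sigma`) and the sample size `m` are made-up for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a k = 2 component mixture in 2 dimensions.
phi = np.array([0.3, 0.7])                      # mixing proportions, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0]])         # component means mu_j
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances Sigma_j

# Draw each latent z^(i) from Multinomial(phi), then x^(i) | z^(i) = j ~ N(mu_j, Sigma_j).
m = 500
z = rng.choice(len(phi), size=m, p=phi)
x = np.array([rng.multivariate_normal(mu[j], Sigma[j]) for j in z])
```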
It can be shown that setting \(\nabla \ell(\phi,\mu,\Sigma)=0\) does not yield a closed-form solution for the parameters of this model. Instead, suppose that the \(z^{(i)}\)’s were known in advance. Then the log-likelihood would be:
$$ \ell(\phi,\mu,\Sigma) = \sum_{i=1}^m \left[ \log p(x^{(i)} \vert z^{(i)};\mu,\Sigma) + \log p(z^{(i)};\phi) \right] $$
Maximizing this with respect to the parameters gives:
$$ \phi_j = \frac{1}{m}\sum_{i=1}^m 1\{z^{(i)}=j\}, \qquad \mu_j = \frac{\sum_{i=1}^m 1\{z^{(i)}=j\}\,x^{(i)}}{\sum_{i=1}^m 1\{z^{(i)}=j\}}, \qquad \Sigma_j = \frac{\sum_{i=1}^m 1\{z^{(i)}=j\}\,(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^m 1\{z^{(i)}=j\}} $$
Note that the derivation of these estimates is similar to that for Gaussian Discriminant Analysis.
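Continuing from the synthetic `x` and `z` in the sketch above (and assuming the labels really are observed), these closed-form estimates are just indicator-weighted counts and averages; a rough sketch:

```python
# Closed-form maximum-likelihood estimates when the z^(i)'s are observed.
k = len(phi)
phi_hat = np.array([np.mean(z == j) for j in range(k)])                   # fraction of points in component j
mu_hat = np.array([x[z == j].mean(axis=0) for j in range(k)])             # per-component sample mean
Sigma_hat = np.array([np.cov(x[z == j].T, bias=True) for j in range(k)])  # per-component MLE covariance
```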
The Expectation-Maximization (EM) algorithm is an iterative algorithm that first “guesses” the values of the \(z^{(i)}\)’s (the E-step) and then maximizes the likelihood function with respect to the parameters using these (soft) guesses for the \(z^{(i)}\)’s (the M-step):
Repeat until convergence {

- (E-step) For each \(i, j\), set:
$$ w_j^{(i)} := p(z^{(i)}=j \vert x^{(i)}; \phi,\mu,\Sigma) = \frac{p(x^{(i)} \vert z^{(i)}=j;\mu,\Sigma)\,p(z^{(i)}=j;\phi)}{\sum_{l=1}^k p(x^{(i)} \vert z^{(i)}=l;\mu,\Sigma)\,p(z^{(i)}=l;\phi)} $$
- (M-step) Update the parameters:
$$ \phi_j := \frac{1}{m}\sum_{i=1}^m w_j^{(i)}, \qquad \mu_j := \frac{\sum_{i=1}^m w_j^{(i)}\,x^{(i)}}{\sum_{i=1}^m w_j^{(i)}}, \qquad \Sigma_j := \frac{\sum_{i=1}^m w_j^{(i)}\,(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^m w_j^{(i)}} $$

}
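Putting the two steps together, below is a minimal NumPy/SciPy sketch of the full EM loop for this model. The function name `gmm_em`, the initialization scheme (uniform weights, random data points as means, the overall data covariance for every component), the small ridge added to the covariances for numerical stability, and the fixed iteration count are all illustrative choices, not part of the algorithm as stated above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iters=100, seed=0):
    """Fit a k-component mixture of Gaussians to X (m x n) with EM."""
    rng = np.random.default_rng(seed)
    m, n = X.shape

    # Illustrative initialization.
    phi = np.full(k, 1.0 / k)
    mu = X[rng.choice(m, size=k, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(n) for _ in range(k)])

    for _ in range(n_iters):
        # E-step: w[i, j] = p(z^(i) = j | x^(i); phi, mu, Sigma).
        w = np.column_stack([
            phi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(k)
        ])
        w /= w.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the soft assignments.
        Nj = w.sum(axis=0)                  # effective number of points per component
        phi = Nj / m
        mu = (w.T @ X) / Nj[:, None]
        for j in range(k):
            d = X - mu[j]
            Sigma[j] = (w[:, j, None] * d).T @ d / Nj[j] + 1e-6 * np.eye(n)

    return phi, mu, Sigma
```

For example, `gmm_em(x, k=2)` on the synthetic data above should recover parameters close to the ones used to generate it; a practical implementation would typically monitor the log-likelihood and stop once it no longer increases, rather than running a fixed number of iterations.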