Mixture of Gaussians and the EM Algorithm
Consider a training set \(\{x^{(1)},x^{(2)},...,x^{(m)}\}\). We wish to specify a joint distribution \(p(x^{(i)},z^{(i)}) = p(x^{(i)} \vert z^{(i)})\,p(z^{(i)})\), where \(z^{(i)} \sim \mathrm{Multinomial}(\phi)\), such that \(\phi_j \geqslant 0\) and \(\sum_{j=1}^k \phi_j=1\), and \(x^{(i)} \vert z^{(i)}=j \sim \mathcal{N}(\mu_j,\Sigma_j)\). Note that the \(z^{(i)}\)’s are not known in advance and hence are the latent variables of this model. We may write the log-likelihood function as:
$$ \ell(\phi,\mu,\Sigma) = \sum_{i=1}^m \log p(x^{(i)};\phi,\mu,\Sigma) = \sum_{i=1}^m \log \sum_{z^{(i)}=1}^k p(x^{(i)} \vert z^{(i)};\mu,\Sigma)\,p(z^{(i)};\phi) $$
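As a concrete illustration of this generative model, here is a minimal NumPy sketch that samples data from it; the parameter values (`phi`, `mu`, `Sigma`) and the sample size `m` are made-up for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a k = 2 component mixture in 2 dimensions.
phi = np.array([0.3, 0.7])                      # mixing proportions, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0]])         # component means mu_j
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances Sigma_j

# Draw each latent z^(i) from Multinomial(phi), then x^(i) | z^(i) = j ~ N(mu_j, Sigma_j).
m = 500
z = rng.choice(len(phi), size=m, p=phi)
x = np.array([rng.multivariate_normal(mu[j], Sigma[j]) for j in z])
```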
It can be shown that setting \(\nabla \ell(\phi,\mu,\Sigma)=0\) does not yield a closed-form solution for the parameters of this model. Instead, suppose that the \(z^{(i)}\)’s were known in advance. Then the log-likelihood would be:
$$ \ell(\phi,\mu,\Sigma) = \sum_{i=1}^m \left[ \log p(x^{(i)} \vert z^{(i)};\mu,\Sigma) + \log p(z^{(i)};\phi) \right] $$
Maximizing this with respect to the parameters gives:
$$ \phi_j = \frac{1}{m}\sum_{i=1}^m 1\{z^{(i)}=j\}, \qquad \mu_j = \frac{\sum_{i=1}^m 1\{z^{(i)}=j\}\,x^{(i)}}{\sum_{i=1}^m 1\{z^{(i)}=j\}}, \qquad \Sigma_j = \frac{\sum_{i=1}^m 1\{z^{(i)}=j\}\,(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^m 1\{z^{(i)}=j\}} $$
Note that the derivation of these estimates is similar to that for Gaussian Discriminant Analysis.
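Continuing from the synthetic `x` and `z` in the sketch above (and assuming the labels really are observed), these closed-form estimates are just indicator-weighted counts and averages; a rough sketch:

```python
# Closed-form maximum-likelihood estimates when the z^(i)'s are observed.
k = len(phi)
phi_hat = np.array([np.mean(z == j) for j in range(k)])                   # fraction of points in component j
mu_hat = np.array([x[z == j].mean(axis=0) for j in range(k)])             # per-component sample mean
Sigma_hat = np.array([np.cov(x[z == j].T, bias=True) for j in range(k)])  # per-component MLE covariance
```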
The Expectation-Maximization (EM) algorithm is an iterative algorithm that first “guesses” the values of the \(z^{(i)}\)’s (the E-step) and then maximizes the likelihood function with respect to the parameters using these (soft) guesses for the \(z^{(i)}\)’s (the M-step):
Repeat until convergence {

- (E-step) For each \(i, j\), set:
$$ w_j^{(i)} := p(z^{(i)}=j \vert x^{(i)}; \phi,\mu,\Sigma) = \frac{p(x^{(i)} \vert z^{(i)}=j;\mu,\Sigma)\,p(z^{(i)}=j;\phi)}{\sum_{l=1}^k p(x^{(i)} \vert z^{(i)}=l;\mu,\Sigma)\,p(z^{(i)}=l;\phi)} $$
- (M-step) Update the parameters:
$$ \phi_j := \frac{1}{m}\sum_{i=1}^m w_j^{(i)}, \qquad \mu_j := \frac{\sum_{i=1}^m w_j^{(i)}\,x^{(i)}}{\sum_{i=1}^m w_j^{(i)}}, \qquad \Sigma_j := \frac{\sum_{i=1}^m w_j^{(i)}\,(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^m w_j^{(i)}} $$

}
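Putting the two steps together, below is a minimal NumPy/SciPy sketch of the full EM loop for this model. The function name `gmm_em`, the initialization scheme (uniform weights, random data points as means, the overall data covariance for every component), the small ridge added to the covariances for numerical stability, and the fixed iteration count are all illustrative choices, not part of the algorithm as stated above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iters=100, seed=0):
    """Fit a k-component mixture of Gaussians to X (m x n) with EM."""
    rng = np.random.default_rng(seed)
    m, n = X.shape

    # Illustrative initialization.
    phi = np.full(k, 1.0 / k)
    mu = X[rng.choice(m, size=k, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(n) for _ in range(k)])

    for _ in range(n_iters):
        # E-step: w[i, j] = p(z^(i) = j | x^(i); phi, mu, Sigma).
        w = np.column_stack([
            phi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(k)
        ])
        w /= w.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the soft assignments.
        Nj = w.sum(axis=0)                  # effective number of points per component
        phi = Nj / m
        mu = (w.T @ X) / Nj[:, None]
        for j in range(k):
            d = X - mu[j]
            Sigma[j] = (w[:, j, None] * d).T @ d / Nj[j] + 1e-6 * np.eye(n)

    return phi, mu, Sigma
```

For example, `gmm_em(x, k=2)` on the synthetic data above should recover parameters close to the ones used to generate it; a practical implementation would typically monitor the log-likelihood and stop once it no longer increases, rather than running a fixed number of iterations.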