K-Means Clustering Algorithm
Given a training set \(S = \{x^{(1)}, x^{(2)}, ... , x^{(m)}\}\), where \(x^{(i)} \in \mathbb{R}^n\), the goal of the k-means clustering algorithm is to group this data into \(k\) cohesive clusters. Note that there are no labels \(y^{(i)}\), and consequently this is said to be an unsupervised learning problem.
The k-means clustering algorithm does the following:
- Initialize \(k\) cluster centroids \(\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n\) randomly.
- Repeat until convergence: {
    - For every \(i\), set:
      $$c^{(i)} = \underset{j}{\operatorname{argmin}} \,\vert\vert x^{(i)}-\mu_{j}\vert\vert^2$$
    - For every \(j\), set:
      $$\mu_j = \frac{\sum_{i=1}^m 1\{c^{(i)}=j\}\,x^{(i)}}{\sum_{i=1}^m 1\{c^{(i)}=j\}}$$

  }
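The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation; the function name, the random-point initialization, the convergence test, and the example data are my own choices rather than part of the notes.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: X has shape (m, n); returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize the k centroids by picking k random training points.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: c^(i) = argmin_j ||x^(i) - mu_j||^2
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        c = d.argmin(axis=1)
        # Update step: mu_j = mean of the points currently assigned to cluster j
        # (an empty cluster keeps its old centroid).
        new_mu = np.array([X[c == j].mean(axis=0) if (c == j).any() else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):
            break  # centroids stopped moving
        mu = new_mu
    return mu, c

# Example on two synthetic blobs in R^2 (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
mu, c = kmeans(X, k=2)
```

At convergence each returned centroid is the mean of the points assigned to it, which matches the update equation above.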
Note that we are essentially minimizing the following distortion function:
$$J(c, \mu) = \sum_{i=1}^{m} \vert\vert x^{(i)} - \mu_{c^{(i)}}\vert\vert^2$$
first with respect to \(c\) while holding \(\mu\) constant, and then with respect to \(\mu\) while holding \(c\) constant. This is just coordinate descent on \(J\). While this implies that \(J\) decreases monotonically and therefore must converge, the distortion function is non-convex, so k-means may settle at a local optimum rather than the global optimum.
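The coordinate-descent view can be checked numerically: the sketch below (again with helper names and data of my own choosing, not from the notes) records \(J\) after each assignment step and after each update step, and the recorded sequence never increases.

```python
import numpy as np

def distortion(X, mu, c):
    # J(c, mu) = sum_i ||x^(i) - mu_{c^(i)}||^2
    return ((X - mu[c]) ** 2).sum()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])
k = 2
mu = X[rng.choice(len(X), size=k, replace=False)]

J_history = []
for _ in range(20):
    # Assignment step: minimizes J over c with mu held fixed.
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    c = d.argmin(axis=1)
    J_history.append(distortion(X, mu, c))
    # Update step: minimizes J over mu with c held fixed
    # (an empty cluster keeps its old centroid).
    mu = np.array([X[c == j].mean(axis=0) if (c == j).any() else mu[j]
                   for j in range(k)])
    J_history.append(distortion(X, mu, c))

# Each half-step minimizes J over one coordinate block, so J is non-increasing.
assert all(a >= b - 1e-9 for a, b in zip(J_history, J_history[1:]))
```

Because the sequence is non-increasing and bounded below by zero, it must converge; nothing in this check rules out convergence to a merely local optimum.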