Gaussian Mixture Model (GMM)
1. What is a GMM?
A GMM assumes that the underlying data is generated from a mixture of several
Gaussian distributions.
Each Gaussian component has its own mean and covariance and represents a cluster
in the data, and the overall data distribution is modelled as a weighted combination
of these components.
The parameters of the GMM (means, covariances, and mixture weights) are
estimated using the EM algorithm, an iterative process that refines the model's fit
to the data.
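As a quick illustration of this fitting process, here is a minimal sketch using scikit-learn's GaussianMixture class, which estimates the parameters with EM; the synthetic two-dimensional data and the choice of three components are assumptions made purely for the example.

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: three clouds of points drawn from different Gaussians (for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(300, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(300, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.7, size=(300, 2)),
])

# Fit a GMM with K = 3 components; EM iteratively refines means, covariances, and mixture weights.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X)

print(gmm.means_)    # estimated component means
print(gmm.weights_)  # estimated mixture weights (non-negative, sum to 1)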
Soft Clustering:
Unlike hard clustering (e.g., K-means), GMM assigns each data point a probability
of belonging to each cluster. This allows for overlapping or ambiguous cluster
boundaries.
Density Estimation:
GMMs can estimate the probability density of data points, which is useful for tasks
like anomaly detection (identifying data points that are unlikely to belong to any of
the clusters); the sketch after this list of properties illustrates both the soft
assignments and these density scores.
Flexibility:
GMMs can model complex data distributions that are not easily captured by
simpler models.
GMMs can handle data points that fall between cluster boundaries, unlike hard
clustering algorithms.
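Continuing the sketch above (reusing X and the fitted gmm, both assumed from that example), the soft-clustering and density-estimation properties can be read directly off the fitted model; the 1% threshold for flagging anomalies is an arbitrary choice for illustration.

# Soft clustering: each row gives the probability of the point belonging to each component and sums to 1.
membership = gmm.predict_proba(X)     # shape (n_samples, n_components)

# Density estimation: per-sample log-likelihood under the fitted mixture.
log_density = gmm.score_samples(X)    # shape (n_samples,)

# Illustrative anomaly rule: flag the 1% of points with the lowest density under the model.
threshold = np.percentile(log_density, 1)
anomalies = X[log_density < threshold]
print(membership[:3])
print("flagged points:", anomalies.shape[0])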
Probabilistic Approach:
A Gaussian mixture is a function composed of several Gaussians, each identified by
k ∈ {1,…, K}, where K is the number of clusters in our data set. Each Gaussian k is
described by a mean μ_k, a covariance Σ_k, and a mixing coefficient π_k. For
example, a data set with three clusters is modelled with three Gaussian functions,
hence K = 3, and each Gaussian explains the data contained in one of the three
clusters. The mixing coefficients are themselves probabilities and must meet this
condition: they are non-negative and sum to 1 over the K components.
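Written out (in LaTeX notation), the constraint and the resulting mixture density are:

\sum_{k=1}^{K} \pi_k = 1, \qquad \pi_k \ge 0

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

where \mathcal{N}(x \mid \mu_k, \Sigma_k) is the Gaussian density defined just below.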
How do we determine the optimal values for these parameters? To achieve this we
must ensure that each Gaussian fits the data points belonging to each cluster. This
is exactly what maximum likelihood does.
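Concretely, the density of a single Gaussian component is the multivariate normal (in LaTeX notation):

\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} \, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)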
Where x represents our data points, D is the number of dimensions of each data
point. μ and Σ are the mean and covariance, respectively. If we have a data set
composed of N = 1000 three-dimensional points (D = 3), then x will be a 1000 × 3
matrix. μ will be a 1 × 3 vector, and Σ will be a 3 × 3 matrix. For later purposes,
we will also find it useful to take the log of this equation, which is given by:
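\ln \mathcal{N}(x \mid \mu, \Sigma) = -\frac{D}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma| - \frac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu)

Taking the log turns products of densities over the data set into sums, which is what makes the differentiation in the next step tractable.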
If we differentiate this equation with respect to the mean and covariance and set
the derivatives to zero, we can solve for the optimal values of these parameters,
and the solutions correspond to the maximum likelihood estimates (MLE) for this
setting.
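For a single Gaussian fitted to N data points x_1, …, x_N, carrying out that differentiation yields the familiar closed-form estimates:

\mu_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \Sigma_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\text{ML}})(x_n - \mu_{\text{ML}})^{\top}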
However, because we are dealing not with a single Gaussian but with many, things
get more complicated when it comes time to find the parameters of the whole
mixture. For this we will need to introduce some additional aspects, which we
discuss in the formulas section.
Applications:
As noted above, GMMs are commonly used for soft clustering, density estimation,
and anomaly detection.
Limitations of GMMs:
Parameter Estimation:
The EM algorithm can sometimes get stuck in local optima, so careful initialization
(or several random restarts, as in the sketch after this list) is required.
Choice of Number of Components:
The number of components K must be specified in advance; in practice it is often
selected by comparing models with criteria such as BIC or AIC (see the sketch after
this list).
Computational Cost:
Estimating full covariance matrices with EM can become expensive for
high-dimensional data, large data sets, or many components.
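One common way to handle the initialization and model-selection issues above is sketched below; it assumes scikit-learn's GaussianMixture and the data array X from the earlier sketch, runs EM from several random starts (n_init), and compares candidate values of K by BIC. This is an illustrative pattern, not the only approach.

import numpy as np
from sklearn.mixture import GaussianMixture

# X is assumed to be an (n_samples, n_features) array, e.g. from the earlier sketch.
candidates = range(1, 7)
bics = []
for k in candidates:
    # n_init runs EM from several random starts and keeps the best solution,
    # which mitigates (but does not eliminate) convergence to poor local optima.
    gmm_k = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gmm_k.bic(X))

best_k = list(candidates)[int(np.argmin(bics))]
print("BIC per K:", dict(zip(candidates, bics)))
print("K with lowest BIC:", best_k)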