
Chapter 2: Generative Models

Gaussian Mixture Models


Machine Learning Team
UP GL-BD
Learning Outcomes
• At the end of this chapter, students will be able to:
• Identify a Gaussian distribution

• Define a mixture of Gaussian distributions

• Enumerate the GMM parameters

• Identify the model parameters

• Use the Expectation-Maximisation algorithm to compute the optimal GMM
parameters

2
Motivation
• Hard Clustering vs Soft Clustering

• Hard Clustering: each point is assigned to one and only one cluster.

• Soft Clustering: each point belongs to several clusters, each with a certain degree of membership.

• A limitation of the hard clustering approach is that there is no probability that tells us
how strongly a data point is associated with a specific cluster.

• Example: K-means is an example of hard clustering.

3
Gaussian Distribution
• A Gaussian distribution, or normal distribution, is a continuous probability
distribution that is symmetrical about its mean. Its probability density has a
bell-shaped curve centred on the mean.

• Most of the observations lie close to the mean, and the further an observation
is from the mean, the lower its probability of occurring.

4
Gaussian distribution
• Mathematical definition:
The probability density function of a continuous random variable x that follows
a normal/Gaussian distribution is given by:

f(x) = (1 / (σ√(2π))) · exp( −(x − µ)² / (2σ²) )

With: µ = mean

σ = standard deviation

5
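As an added illustration (assuming NumPy and SciPy are installed; not part of the original slides), the snippet below implements this density directly and cross-checks it against scipy.stats.norm:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    coef = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coef * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

x = np.linspace(-4, 4, 5)
print(gaussian_pdf(x, mu=0.0, sigma=1.0))   # hand-written formula
print(norm.pdf(x, loc=0.0, scale=1.0))      # SciPy reference, same values
```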
Gaussian Distribution

• Approximately 95.4% of the data lie between µ − 2σ and µ + 2σ

6
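A quick added numerical check of this figure, assuming SciPy is installed:

```python
from scipy.stats import norm

# Probability mass of a normal distribution within 2 standard deviations of its mean
coverage = norm.cdf(2) - norm.cdf(-2)
print(f"P(mu - 2*sigma < X < mu + 2*sigma) = {coverage:.3f}")   # ~0.954, i.e. about 95.4%
```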
Gaussian Distribution

7
Gaussian distribution
• In d dimensions, the Gaussian distribution of a vector x is defined by:

N(x | µ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) · exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) )

Where µ is the mean vector and Σ is the covariance matrix.

8
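For reference, an added snippet (assuming SciPy is available; the values of µ, Σ and x are illustrative) that evaluates this multivariate density with scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                  # mean vector (d = 2)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])             # covariance matrix
x = np.array([0.5, 0.8])

density = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(density)
```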
Gaussian Distribution

9
Gaussian Mixture Models (GMM)
• Gaussian mixture models (GMMs) are a type of machine learning model.

• They are used to classify data into different categories based on probability
distributions.

• Gaussian mixture models can be used in many different areas, including
finance, marketing, computer vision and many more.

10
Gaussian Mixture Model (GMM)
• Definition:
• A Gaussian mixture is a function composed of several Gaussians, each
identified by k ∈ {1, …, K}, where K is the number of clusters in our dataset.
Each Gaussian k in the mixture is described by the following parameters:
• A mean μ that defines its centre.
• A covariance Σ that defines its width.
• A mixing probability that defines how big or small the Gaussian function
will be.
11
Gaussian Mixture Model (GMM)
• The probability density in a mixture model of K Gaussians is:

p(x) = Σ_{j=1..K} wj · N(x | µj, Σj)

where wj is the prior probability (weight) of the jth Gaussian (component),
with wj ≥ 0 and Σ_{j} wj = 1.

12
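As a small added sketch (not from the slides; the component weights, means and covariances below are illustrative, and SciPy is assumed), the mixture density is simply a weighted sum of component densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

# A hypothetical 2-component mixture in 2 dimensions
weights = [0.6, 0.4]                                   # w_j, summing to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # mu_j
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])] # Sigma_j

def gmm_density(x):
    """p(x) = sum_j w_j * N(x | mu_j, Sigma_j)"""
    return sum(w * multivariate_normal(mean=m, cov=c).pdf(x)
               for w, m, c in zip(weights, means, covs))

print(gmm_density(np.array([1.0, 1.0])))
```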
Gaussian Mixture Model (GMM)
• Example of a Gaussian mixture model with two components (2 Gaussian
distributions)

13
Gaussian Mixture Model (GMM)
• Example: (a) & (b) presents each one a 2-dimensions Gaussian Mixture model

14
Parameters Estimation
• Problem:
Given a set of data X = {x1, x2, …, xN}, assumed to follow a GMM distribution,
estimate the parameters θ of the GMM that best fits the data,
with θ = { µ, Σ, w } for each component of the mixture.

• Solution: use the technique of Maximum Likelihood

Maximise the likelihood p(X|θ) of the data with respect to the model
parameters.

15
Maximum Likelihood Estimation (MLE)
• Maximum Likelihood Estimation is a frequentist approach used to estimate
the optimal parameters of a mixture model.

• Given a data sample, the maximum likelihood estimator finds the parameter
values of the mixture model that best explain the data.

• In practice, a likelihood function is built from the probability density
function.

• We then maximise the logarithm of the likelihood function (the log-likelihood),
which is easier to optimise and has the same maximiser.

16
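As an added illustration of the quantity being maximised (assuming NumPy and SciPy; the toy data and parameter values are illustrative), the log-likelihood of a dataset under a GMM is the sum of the log mixture densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, weights, means, covs):
    """log p(X | theta) = sum_n log( sum_j w_j * N(x_n | mu_j, Sigma_j) )"""
    densities = np.zeros(len(X))
    for w, m, c in zip(weights, means, covs):
        densities += w * multivariate_normal(mean=m, cov=c).pdf(X)
    return np.sum(np.log(densities))

X = np.random.randn(100, 2)                      # toy data
weights = [0.5, 0.5]
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]
print(gmm_log_likelihood(X, weights, means, covs))
```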
Expectation Maximisation (EM) Algorithm
• MLE is a frequentist principle which states that, given a dataset, the “best” parameters
to use are the ones that maximise the probability of the data.

• MLE is a way to formally pose the problem.

• Expectation-Maximisation (EM) is a way to solve the problem posed by MLE.

• The Expectation-Maximisation (EM) algorithm is an iterative method used to maximise
the log-likelihood.

• It is widely used for optimisation problems in which the objective function is difficult to
maximise directly, as is the case for the log-likelihood of a model with hidden variables.

17
Expectation Maximisation (EM) Algorithm

• Expectation (E) step: given the current parameters of the model, estimate a
probability distribution over the hidden variables.

• Maximization (M) step: given the current data and the distribution estimated in
the E step, re-estimate the parameters to update the model.

18
EM more in depth
• Expectation (E) step: using the current estimate of the parameters, create a
function for the expectation of the log-likelihood.

• Maximization (M) step: compute the parameters that maximise the expected
log-likelihood found in the E step.

19
EM more in depth
• Problem:

θ = { µ, Σ, w }: model parameters

X = {x1, x2, …, xN}: observed data

zik: hidden (latent) variable associated with xi; zik = 1 if xi belongs to the kth component of the mixture
• EM Steps (a code sketch of these steps follows below):
1. Initialize the parameters θ of the model
2. E Step: compute the posterior probabilities of the latent variables Z given the current parameters θ
3. M Step: re-estimate the parameter values given the current posterior probabilities, i.e. use the
computed values of Z to re-estimate θ
4. Iterate steps 2 and 3 until convergence
20
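Below is a minimal, added NumPy sketch of these four steps (not the authors' code); the function name em_gmm, the fixed number of iterations and the small regularisation term are illustrative choices, and NumPy/SciPy are assumed to be installed.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, n_components, n_iter=100, seed=0):
    """Fit a GMM to data X (N x d) with the EM algorithm. Returns (weights, means, covs)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape

    # 1. Initialize the parameters theta = {w, mu, Sigma}
    weights = np.full(n_components, 1.0 / n_components)
    means = X[rng.choice(N, n_components, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(n_components)])

    for _ in range(n_iter):
        # 2. E step: posterior probability (responsibility) of each component for each point
        #    (a production implementation would use log-sum-exp for numerical stability)
        resp = np.zeros((N, n_components))
        for k in range(n_components):
            resp[:, k] = weights[k] * multivariate_normal(means[k], covs[k]).pdf(X)
        resp /= resp.sum(axis=1, keepdims=True)

        # 3. M step: re-estimate theta from the responsibilities
        Nk = resp.sum(axis=0)                 # effective number of points per component
        weights = Nk / N
        means = (resp.T @ X) / Nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)

        # 4. In practice, stop once the log-likelihood improvement falls below a tolerance.

    return weights, means, covs
```

Calling em_gmm(X, n_components=2) on data drawn from two well-separated Gaussians should recover weights, means and covariances close to those of the generating components.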
EM more in depth
• Example:
Hidden variable: for each point, which Gaussian (component) generated it?

21
EM more in depth
• E Step
For each point, estimate the probability that each Gaussian generated it.

22
EM more in depth
• M Step
Given these probabilities, re-estimate the parameters (mean, covariance and weight) of each Gaussian.

23
K-means vs GMM
K-means
• Objective function
– Minimize sum of squared Euclidean distances
• Can be optimized by an EM algorithm
– E-step: assign points to clusters
– M-step: optimize clusters
– Performs hard assignment during E-step
• Assumes spherical clusters with equal probability of a cluster

GMM
• Objective function
– Maximize log-likelihood
• EM algorithm
– E-step: compute posterior probability of membership
– M-step: optimize parameters
– Performs soft assignment during E-step
• Can be used for non-spherical clusters
• Can generate clusters with different probabilities
24
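To see the hard vs soft assignment contrast in practice, here is an added usage sketch with scikit-learn (assumed installed; the toy data and parameter values are illustrative): KMeans returns a single label per point, while GaussianMixture also exposes per-cluster membership probabilities.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(4, 1, size=(100, 2))])   # two toy clusters

hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft_memberships = gmm.predict_proba(X)            # N x 2 matrix of probabilities

print(hard_labels[:5])                             # e.g. [0 0 0 0 0]
print(soft_memberships[:5].round(3))               # probabilities summing to 1 per row
```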
Gaussian Mixture Models (GMMs)
• Strengths
– Give probabilistic (soft) cluster assignments
– Have a probabilistic interpretation
– Can handle clusters with varying sizes, variances, etc.
• Weaknesses
– Sensitive to initialization
– Appropriate distributions must be chosen
– Prone to overfitting

25
Appendix
Expectation-Maximisation
Expectation-Maximisation
• Let’s suppose we want to know the probability that a data point xn comes from
Gaussian k. We can express this as:

p(zk = 1 | xn)

• That is: “given a data point x, what is the probability it came from Gaussian k?”
• In this case, zk is a latent (hidden, unknown) variable that takes only two possible values. It is
one when x came from Gaussian k, and zero otherwise.
• Knowing the probability of occurrence of z will be useful in helping us determine the
Gaussian mixture parameters.
• Likewise, we can state the following:

p(zk = 1) = πk

• This means that the overall probability of observing a point that comes from Gaussian k is
actually equivalent to the mixing coefficient πk of that Gaussian.
• Now let z be the set of all possible latent variables, hence:

z = {z1, …, zK}

27
Expectation-Maximisation
• Each zk occurs independently of the others, and zk can only take the value one when
k is equal to the cluster the point comes from. Therefore:

p(z) = Π_{k=1..K} πk^zk

• Now, what about the probability of observing our data given that it came from
Gaussian k? It turns out to be the Gaussian function itself. Following the
same logic we used to define p(z), we can state:

p(xn | z) = Π_{k=1..K} N(xn | µk, Σk)^zk

• The aim is to determine the probability of z given our observation x. It turns
out that the equations we have just derived, along with Bayes’ rule, will help us
determine this probability. From the product rule of probabilities, we know that

p(xn, z) = p(xn | z) p(z)

28
Expectation-Maximisation
• We just need to sum over z to get p(xn) rather than p(xn, z):

p(xn) = Σ_{k=1..K} πk N(xn | µk, Σk)

• This is the equation that defines a Gaussian mixture. To determine the optimal values
of the parameters we need to maximise the likelihood of the model. We
can write the likelihood as the joint probability of all observations xn, defined by:

p(X | π, µ, Σ) = Π_{n=1..N} Σ_{k=1..K} πk N(xn | µk, Σk)

• Let’s apply the log to each side of the equation:

ln p(X | π, µ, Σ) = Σ_{n=1..N} ln ( Σ_{k=1..K} πk N(xn | µk, Σk) )

29
Expectation-Maximisation
• Now, remember that our aim is to find the probability of z given x.
• From Bayes’ rule, we know that

p(zk = 1 | xn) = p(xn | zk = 1) p(zk = 1) / p(xn)

• Also, we have:

p(xn | zk = 1) = N(xn | µk, Σk)   and   p(zk = 1) = πk

• So let’s now replace these in the previous equation:

γ(znk) ≡ p(zk = 1 | xn) = πk N(xn | µk, Σk) / Σ_{j=1..K} πj N(xn | µj, Σj)        (Eq1)

30
Expectation-Maximisation
• Let us now define the steps that the general EM algorithm will follow.
• Step 1: Initialise θ accordingly. For instance, we can use the results obtained by a
previous K-Means run as a good starting point for our algorithm.
• Step 2 (Expectation step): Evaluate

Q(θ*, θ) = E_Z [ ln p(X, Z | θ*) ]        (Eq2)

where the expectation is taken over p(Z | X, θ), i.e. with the current parameters θ.
• The expectation step consists in calculating the value of γ: under the posterior given by Eq1, the
expected value of each latent indicator znk is its responsibility, so replacing Eq1 in Eq2 gives:

E[znk] = γ(znk)        (Eq3)

• We have the following complete likelihood:

p(X, Z | µ*, Σ*, π*) = Π_{n=1..N} Π_{k=1..K} πk*^znk · N(xn | µk*, Σk*)^znk

31
Expectation-Maximisation
• The log of this expression is given by

ln p(X, Z | µ*, Σ*, π*) = Σ_{n=1..N} Σ_{k=1..K} znk [ ln πk* + ln N(xn | µk*, Σk*) ]        (Eq4)

• Now, we replace Eq4 in Eq3 (taking the expectation turns each znk into γ(znk)):

Q(θ*, θ) = Σ_{n=1..N} Σ_{k=1..K} γ(znk) [ ln πk* + ln N(xn | µk*, Σk*) ]        (Eq5)

• Step 3 (Maximization step): Find the revised parameters θ* using:

θ* = argmax Q(θ*, θ)

• Note that, once a Lagrange multiplier λ is added to enforce the constraint Σ_{k} πk = 1, Eq5 can be re-written as follows:

Q(θ*, θ) = Σ_{n=1..N} Σ_{k=1..K} γ(znk) [ ln πk* + ln N(xn | µk*, Σk*) ] + λ ( 1 − Σ_{k=1..K} πk* )

32
Expectation-Maximisation
• And now we can easily determine the parameters by using maximum likelihood. Let’s
take the derivative of Q with respect to πk and set it equal to zero:

∂Q/∂πk = Σ_{n=1..N} γ(znk) / πk − λ = 0

• By rearranging the terms and applying a summation over k to both sides of the equation,
we obtain:

Σ_{k=1..K} Σ_{n=1..N} γ(znk) = λ Σ_{k=1..K} πk

• We know that the sum of all mixing coefficients πk equals one. In addition, we know
that summing the probabilities γ(znk) over k gives 1 for each point. Thus we get λ = N. Using this
result, we can solve for πk:

πk = (1 / N) Σ_{n=1..N} γ(znk)

33
Expectation-Maximisation
• Similarly, if we differentiate Q with respect to μ and Σ, equate the derivatives to zero and
solve for the parameters, we obtain:

µk = ( Σ_{n=1..N} γ(znk) xn ) / ( Σ_{n=1..N} γ(znk) )

Σk = ( Σ_{n=1..N} γ(znk) (xn − µk)(xn − µk)ᵀ ) / ( Σ_{n=1..N} γ(znk) )

• Then we use these revised values to determine γ in the next EM iteration, and so on,
until we observe convergence of the likelihood value.

34
