
Model-based clustering

Gaussian mixture models

Erik-Jan van Kesteren & Daniel L. Oberski


Last week
• Hierarchical clustering
• K-means clustering
• Assessing cluster solutions
• Stability
• Internal metrics
• External validation
Today
• Model-based clustering
• Maximum likelihood estimation
• EM algorithm
• Multivariate model-based clustering
• Assumptions & restrictions

• Goal: understand, apply, and assess model-based clustering methods
Reading materials
• Mixture models: latent profile and latent class analysis (Oberski, 2016)
  http://daob.nl/wp-content/papercite-data/pdf/oberski2016mixturemodels.pdf
• MBCC sections 2.1 and 2.2
Model-based clustering
K-means again
1. Assign examples to K clusters
2. a. Calculate K cluster centroids;
   b. Assign examples to the cluster with the closest centroid;
3. If assignments changed, back to step 2a; else stop.
K-means again
• K-means is based on a rule
• Why this rule and not some other rule?
• What kind of data does the rule work well for?
• In what situations would the rule fail?
• What happens if we want to change the rule?

All of these are difficult to answer by staring at the algorithm.

K-means again
• The k-means algorithm makes clusters which are circular in the space of the data.
• Is this reasonable?
• Maybe x and y covary within the clusters, in the same way or even differently?
• Maybe we need ellipses?
Model-based clustering
Steps:
1. Pretend we believe in some statistical model that describes the data as belonging to unobserved ("latent") groups;
2. Estimate ("train") this model using the data.

The rule follows from the model!

• Instead of worrying about the algorithm, we worry about the model.
• The questions mentioned earlier become easier to answer.
Model-based clustering
• Assumptions about the clusters are explicit, not implicit.
• We will look at the most used family of models:

Gaussian mixture models (GMMs)


• Data within each cluster (multivariate) normally distributed.
• Parameters can be either the same or different across groups:
• Volume (size of the clusters in data space);
• Shape (circle or ellipse);
• Orientation (the angle of the ellipse).
Model-based clustering
Another major advantage
• For each observation, get a posterior probability of belonging to each cluster
• Reflects that cluster membership is uncertain
• Cluster assignment can be done based on the highest-probability cluster for each observation
Model-based clustering
Remember silhouette?
• $a_i$ = avg. distance to fellow cluster members (cohesion)
• $b_i$ = min. distance to a member from a different cluster (separation)

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

(figure from Introduction to Data Mining)
Model-based clustering
Specific examples of model-based clustering:
• Gaussian mixture models
• Latent profile analysis
• Latent class analysis (categorical observations)
• Latent Dirichlet allocation
Gaussian mixture modelling
Model-based clustering
• Statistical model + assumptions define a likelihood:

$$p(\mathrm{data} \mid \mathrm{parameters}) = p(y \mid \theta)$$

• Maximum likelihood estimation: find the parameters $\theta$ for which it is most likely to observe this data
• This is how models can be estimated / fitted / trained

• NB: the model and its assumptions are debatable!


Model-based clustering
Likelihood (density) for height data:

$$p(\mathit{height} \mid \theta) = \Pr(\mathit{man})\,\mathrm{Normal}(\mu_{\mathit{man}}, \sigma_{\mathit{man}}) + \Pr(\mathit{woman})\,\mathrm{Normal}(\mu_{\mathit{woman}}, \sigma_{\mathit{woman}})$$

Or, in clearer notation:

$$p(\mathit{height} \mid \theta) = \pi_1^X\, \mathrm{Normal}(\mu_1, \sigma_1) + (1 - \pi_1^X)\, \mathrm{Normal}(\mu_2, \sigma_2)$$
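
As a quick illustration, the following R sketch evaluates this two-component density; all parameter values are made up for the example and are not estimates from the lecture.

# Two-component Gaussian mixture density for height.
# All parameter values here are assumed purely for illustration.
pi1   <- 0.5                 # mixing proportion (proportion of "men")
mu    <- c(1.81, 1.67)       # component means in metres
sigma <- c(0.07, 0.06)       # component standard deviations

mix_density <- function(height) {
  pi1 * dnorm(height, mu[1], sigma[1]) +
    (1 - pi1) * dnorm(height, mu[2], sigma[2])
}

# Plot the resulting (possibly bimodal) density
curve(mix_density(x), from = 1.4, to = 2.1,
      xlab = "height (m)", ylab = "density")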
Model-based clustering
Gaussian mixture parameters:
• $\pi_1^X$ determines the relative cluster sizes
  • Proportion of observations to be expected in each cluster
• $\mu_1$ and $\mu_2$ determine the locations of the clusters
  • Like centroids in k-means clustering
• $\sigma_1$ and $\sigma_2$ determine the volume of the clusters
  • How large / spread out the clusters are in data space

Together, these 5 unknown parameters describe our model of how the data is generated.
Estimation: the EM algorithm
If we know who is a man and who is a woman, it’s easy to find
the maximum likelihood estimates for 𝜇 and 𝜎:
$$\hat{\mu}_1 = \frac{\sum_{i=1}^{N_1} \mathit{height}_i}{N_1}, \qquad \hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N_1} (\mathit{height}_i - \hat{\mu}_1)^2}{N_1 - 1}$$

(and the same for $\hat{\mu}_2$ and $\hat{\sigma}_2$)
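
In the "groups known" case these are just per-group means and standard deviations; a minimal R sketch with made-up heights and labels:

# ML-style estimates when cluster membership is known.
# Heights and group labels below are made up for illustration.
height <- c(1.62, 1.70, 1.58, 1.83, 1.79, 1.88)
group  <- c("woman", "woman", "woman", "man", "man", "man")

tapply(height, group, mean)   # mu-hat per group
tapply(height, group, sd)     # sigma-hat per group (n - 1 denominator)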

But we don’t know this!


-> Assignments need to be estimated too.
Estimation: the EM algorithm
• Solution: Figure out the posterior probability of being a man/woman, given the current estimates of the means and sds
• If we know cluster locations and shapes, how likely is it that a 1.7 m person is a man or a woman?

$$\pi_{\mathit{man}}^X = \frac{2.20}{2.86} \approx 0.77$$
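
The 2.20 / 2.86 above come from the lecture's own (unshown) parameter values; the R sketch below does the same Bayes-rule computation with assumed parameters.

# Posterior probability that a 1.70 m person belongs to the "man" component,
# given current parameter estimates. All parameter values are assumed.
pi_man <- 0.5
mu     <- c(man = 1.81, woman = 1.67)
sigma  <- c(man = 0.07, woman = 0.06)

height <- 1.70
num    <- pi_man * dnorm(height, mu["man"], sigma["man"])
denom  <- num + (1 - pi_man) * dnorm(height, mu["woman"], sigma["woman"])
num / denom   # posterior probability of "man" for this person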
Estimation: the EM algorithm
• Now we have some class assignments (probabilities);
• So we can go back to the parameters and update them using our easy rule (M-step)
• Then, we can compute new posterior probabilities (E-step)

Does this remind you of something…?


Estimation: the EM algorithm
Live coding EM
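
A minimal EM sketch in R for this univariate two-component mixture (the simulated data and starting values are assumed for illustration; for real analyses use mclust):

# EM for a two-component, one-dimensional Gaussian mixture (didactic sketch).
set.seed(45)
y <- c(rnorm(100, mean = 1.67, sd = 0.06),   # simulated "women"
       rnorm(100, mean = 1.81, sd = 0.07))   # simulated "men"

# starting values
pi1 <- 0.5; mu <- c(1.60, 1.90); sigma <- c(0.10, 0.10)

for (iter in 1:100) {
  # E-step: posterior probability of component 1 for each observation
  d1 <- pi1 * dnorm(y, mu[1], sigma[1])
  d2 <- (1 - pi1) * dnorm(y, mu[2], sigma[2])
  post1 <- d1 / (d1 + d2)

  # M-step: update parameters with posterior-weighted versions of the easy rule
  pi1      <- mean(post1)
  mu[1]    <- sum(post1 * y) / sum(post1)
  mu[2]    <- sum((1 - post1) * y) / sum(1 - post1)
  sigma[1] <- sqrt(sum(post1 * (y - mu[1])^2) / sum(post1))
  sigma[2] <- sqrt(sum((1 - post1) * (y - mu[2])^2) / sum(1 - post1))
}

round(c(pi1 = pi1, mu = mu, sigma = sigma), 3)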
Break
Multivariate model-based clustering
Multivariate model-based clustering
• With 2 observed features:
  • the mean becomes a vector of 2 means
  • the standard deviation turns into a 2x2 variance-covariance matrix determining the shape of the cluster
• So we have multiple within-cluster parameters:
  • Two means
  • Two variances, one for each observed variable
  • A single covariance among the features
• Together, the 11 parameters (5 per cluster plus 1 mixing proportion) define the likelihood in bivariate space, which from the top looks like ellipses
Multivariate normal distribution

$$\mathrm{Normal}(x;\ \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$\mathrm{MVN}(\boldsymbol{x};\ \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-p/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right)$$

(where p is the number of features)
Multivariate model-based clustering

$$p(\boldsymbol{y} \mid \theta) = \pi_1^X\, \mathrm{MVN}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) + (1 - \pi_1^X)\, \mathrm{MVN}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)$$
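
A sketch of evaluating this bivariate mixture density in R using the mvtnorm package; all parameter values are assumed for illustration.

# Two-component bivariate Gaussian mixture density (assumed parameter values).
library(mvtnorm)

pi1 <- 0.4
mu1 <- c(0, 0); Sigma1 <- matrix(c(1.0,  0.5,  0.5, 1.0), nrow = 2)
mu2 <- c(3, 3); Sigma2 <- matrix(c(1.0, -0.3, -0.3, 2.0), nrow = 2)

mix_density <- function(y) {     # y: a length-2 observation
  pi1 * dmvnorm(y, mean = mu1, sigma = Sigma1) +
    (1 - pi1) * dmvnorm(y, mean = mu2, sigma = Sigma2)
}

mix_density(c(1, 1))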
Estimation: the EM algorithm
Multivariate model-based clustering
• Cluster shape parameters (the variance-covariance matrix) can be constrained to be equal across clusters
  • Same as k-means
• Can also be different across clusters
  • Not possible in k-means
  • More flexible, complex model
  • Think about the bias-variance tradeoff!
TOP SECRET SLIDE
• K-means clustering is a GMM with the following model:
• All prior class proportions are 1/K
• EII model: equal volume, only circles
• All posterior probabilities are either 0 or 1
TOP SECRET SLIDE 2
• GMM has trouble with clusters that are not ellipses
• Secret weapon: merging

Powerful idea:
• Start with Gaussian mixture solution
• Merge “similar” components to create non-Gaussian clusters

NB: we’re distinguishing “components” from “clusters” now


Merging

library(mclust)
out <- Mclust(x)          # fit a Gaussian mixture to the data x
com <- clustCombi(out)    # hierarchically merge components into clusters
plot(com)
Assessing clustering results
Methods to assess whether the obtained clusters are “good”:
• Stability (previous lecture)
• External validity (previous lecture)
• Model fit
Model fit
How well does the model fit the data?

Log-likelihood

$$\ell(\theta) = \log p(y \mid \theta) = \log \prod_{n=1}^{N} p(y_n \mid \theta) = \sum_{n=1}^{N} \log p(y_n \mid \theta)$$

The higher the log-likelihood, the more likely the data (if we assume this model is correct)
Deviance
$-2 \cdot \ell(\theta)$ (lower deviance is better)
Information criteria
Deviance forms the basis of information criteria, which balance fit and complexity.

Akaike information criterion

$$AIC = -2\ell(\theta) + 2k$$
(where k is the number of parameters)

Bayesian information criterion

$$BIC = -2\ell(\theta) + k \log n$$
(where n is the number of rows in your data)
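
As a sketch, both criteria can be computed by hand from a fitted mclust model; the iris data below is used only as a convenient example dataset.

# Compute AIC and BIC from a fitted Mclust model's log-likelihood.
library(mclust)
out <- Mclust(iris[, 1:4])      # example data; any numeric data frame works

ll <- out$loglik                # maximised log-likelihood
k  <- out$df                    # number of estimated parameters
n  <- out$n                     # number of observations

c(AIC = -2 * ll + 2 * k,
  BIC = -2 * ll + k * log(n))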
Information criteria
Think: bias and variance tradeoff!
• Variance also has to do with stability

Better fit & lower complexity = better cluster solution

(other assessment methods also available for model-based clustering)
High-dim!
How to do GMM in high dimensions?
• Same solution as we are used to by now!
  • Perform clustering on a dimension-reduced version of the original data
  • Integrate regularization / dimension reduction into your GMM optimization method

• Bouveyron et al. (2007). High-dimensional data clustering. Computational Statistics & Data Analysis 52, 502–519
  • Uses the second solution
  • Akin to "mixtures of probabilistic PCA"
Model-based clustering in R
Model-based clustering in R
• Mclust implements multivariate model-based clustering
• Provides an easy interface to fit several parameterizations
• Model comparison with BIC
• Plotting functionality
Model-based clustering in R
• Mclust uses an identifier for each possible parametrization of the cluster shape: E for equal, V for variable, in:
  • Volume (size of the clusters in data space)
  • Shape (circle or ellipse)
  • Orientation (the angle of the ellipse)
• So an EEE model has equal volume, shape, and orientation
• A VVV model has variable volume, shape, and orientation
• A VVE model has variable volume and shape but equal orientation
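
A sketch of requesting specific parameterizations explicitly; iris is used here only as an example dataset.

# Fit specific covariance parameterizations with Mclust.
library(mclust)

fit_eee <- Mclust(iris[, 1:4], G = 3, modelNames = "EEE")  # equal volume, shape, orientation
fit_vvv <- Mclust(iris[, 1:4], G = 3, modelNames = "VVV")  # everything varies per cluster

summary(fit_eee)
summary(fit_vvv)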
Model-based clustering in R
Model-based clustering in R

(figure: VVV, 3 clusters)

• How Mclust optimizes hyperparameters:
  • Fit all the models with up to 9 clusters (or more, your choice!)
  • Compute the BIC of each model
  • Choose the model with the lowest BIC
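
A sketch of this search in R (example data assumed). Note that mclust itself reports BIC on the $2\ell(\theta) - k \log n$ scale, so within its own output the selected model is the one with the highest reported value.

# Let Mclust search over all parameterizations and 1-9 clusters.
library(mclust)
out <- Mclust(iris[, 1:4], G = 1:9)

summary(out)              # selected parameterization and number of clusters
plot(out, what = "BIC")   # criterion for every model / cluster-number combination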
Practical: perform model-based clustering
Take-home exercises: 1-11
Questions?
