
Model-based clustering

Gaussian mixture models

Erik-Jan van Kesteren & Daniel L. Oberski


Last week
• Hierarchical clustering
• K-means clustering
• Assessing cluster solutions
• Stability
• Internal metrics
• External validation
Today
• Model-based clustering
• Maximum likelihood estimation
• EM algorithm
• Multivariate model-based clustering
• Assumptions & restrictions

• Goal: understand, apply, and assess model-based clustering methods
Reading materials
• Mixture models: latent profile and latent class analysis (Oberski, 2016)
  http://daob.nl/wp-content/papercite-data/pdf/oberski2016mixturemodels.pdf
• MBCC sections 2.1 and 2.2
Model-based clustering
K-means again
1. Assign examples to K clusters
2. a. Calculate K cluster centroids;
   b. Assign examples to the cluster with the closest centroid;
3. If assignments changed, back to step 2a; else stop.
K-means again
• K-means is based on a rule
• Why this rule and not some other rule?
• What kind of data does the rule work well for?
• In what situations would the rule fail?
• What happens if we want to change the rule?

All of these are difficult to answer by staring at the algorithm.

K-means again
• The k-means algorithm makes clusters which are circular in the space of the data.
• Is this reasonable?
• Maybe x and y covary within the clusters, in the same way or even differently?
• Maybe we need ellipses?
Model-based clustering
Steps:
1. Pretend we believe in some statistical model that describes the data as belonging to unobserved ("latent") groups;
2. Estimate ("train") this model using the data.

The rule follows from the model!

• Instead of worrying about the algorithm, we worry about the model.
• The questions mentioned earlier become easier to answer.
Model-based clustering
• Assumptions about the clusters are explicit, not implicit.
• We will look at the most used family of models:

Gaussian mixture models (GMMs)


• Data within each cluster (multivariate) normally distributed.
• Parameters can be either the same or different across groups:
• Volume (size of the clusters in data space);
• Shape (circle or ellipse);
• Orientation (the angle of the ellipse).
Model-based clustering
Another major advantage
• For each observation, get a posterior probability of belonging to each cluster
• Reflects that cluster membership is uncertain
• Cluster assignment can be done based on the highest-probability cluster for each observation
Model-based clustering
Remember silhouette?
• $a_i$ = avg. distance to fellow cluster members (cohesion)
• $b_i$ = min. distance to a member from a different cluster (separation)

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

(figure from Introduction to Data Mining)
Model-based clustering
Specific examples of model-based clustering:
• Gaussian mixture models
• Latent profile analysis
• Latent class analysis (categorical observations)
• Latent Dirichlet allocation
Gaussian mixture modelling
Model-based clustering
• Statistical model + assumptions define a likelihood:

$$p(\mathrm{data} \mid \mathrm{parameters}) = p(y \mid \theta)$$

• Maximum likelihood estimation: find the parameters $\theta$ for which it is most likely to observe this data
• This is how models can be estimated / fitted / trained

• NB: the model and its assumptions are debatable!


Model-based clustering
Likelihood (density) for height data:

$$p(\mathit{height} \mid \theta) = \Pr(\mathit{man})\,\mathrm{Normal}(\mu_{\mathit{man}}, \sigma_{\mathit{man}}) + \Pr(\mathit{woman})\,\mathrm{Normal}(\mu_{\mathit{woman}}, \sigma_{\mathit{woman}})$$

Or, in clearer notation:

$$p(\mathit{height} \mid \theta) = \pi_1^X\, \mathrm{Normal}(\mu_1, \sigma_1) + (1 - \pi_1^X)\, \mathrm{Normal}(\mu_2, \sigma_2)$$
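
As a quick illustration, the following R sketch evaluates this two-component density; all parameter values are made up for the example and are not estimates from the lecture.

# Two-component Gaussian mixture density for height.
# All parameter values here are assumed purely for illustration.
pi1   <- 0.5                 # mixing proportion (proportion of "men")
mu    <- c(1.81, 1.67)       # component means in metres
sigma <- c(0.07, 0.06)       # component standard deviations

mix_density <- function(height) {
  pi1 * dnorm(height, mu[1], sigma[1]) +
    (1 - pi1) * dnorm(height, mu[2], sigma[2])
}

# Plot the resulting (possibly bimodal) density
curve(mix_density(x), from = 1.4, to = 2.1,
      xlab = "height (m)", ylab = "density")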
Model-based clustering
Gaussian mixture parameters:
• $\pi_1^X$ determines the relative cluster sizes
  • Proportion of observations to be expected in each cluster
• $\mu_1$ and $\mu_2$ determine the locations of the clusters
  • Like centroids in k-means clustering
• $\sigma_1$ and $\sigma_2$ determine the volume of the clusters
  • How large / spread out the clusters are in data space

Together, these 5 unknown parameters describe our model of how the data is generated.
Estimation: the EM algorithm
If we know who is a man and who is a woman, it’s easy to find
the maximum likelihood estimates for 𝜇 and 𝜎:
$$\hat{\mu}_1 = \frac{\sum_{i=1}^{N_1} \mathit{height}_i}{N_1}, \qquad \hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N_1} (\mathit{height}_i - \hat{\mu}_1)^2}{N_1 - 1}$$

(and the same for $\hat{\mu}_2$ and $\hat{\sigma}_2$)
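
In the "groups known" case these are just per-group means and standard deviations; a minimal R sketch with made-up heights and labels:

# ML-style estimates when cluster membership is known.
# Heights and group labels below are made up for illustration.
height <- c(1.62, 1.70, 1.58, 1.83, 1.79, 1.88)
group  <- c("woman", "woman", "woman", "man", "man", "man")

tapply(height, group, mean)   # mu-hat per group
tapply(height, group, sd)     # sigma-hat per group (n - 1 denominator)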

But we don’t know this!


-> Assignments need to be estimated too.
Estimation: the EM algorithm
• Solution: Figure out the posterior probability of being a man/woman, given the current estimates of the means and sds
• If we know cluster locations and shapes, how likely is it that a 1.7 m person is a man or a woman?

$$\pi_{\mathit{man}}^X = \frac{2.20}{2.86} \approx 0.77$$
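
The 2.20 / 2.86 above come from the lecture's own (unshown) parameter values; the R sketch below does the same Bayes-rule computation with assumed parameters.

# Posterior probability that a 1.70 m person belongs to the "man" component,
# given current parameter estimates. All parameter values are assumed.
pi_man <- 0.5
mu     <- c(man = 1.81, woman = 1.67)
sigma  <- c(man = 0.07, woman = 0.06)

height <- 1.70
num    <- pi_man * dnorm(height, mu["man"], sigma["man"])
denom  <- num + (1 - pi_man) * dnorm(height, mu["woman"], sigma["woman"])
num / denom   # posterior probability of "man" for this person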
Estimation: the EM algorithm
• Now we have some class assignments (probabilities);
• So we can go back to the parameters and update them using our easy rule (M-step)
• Then, we can compute new posterior probabilities (E-step)

Does this remind you of something…?


Estimation: the EM algorithm
Live coding EM
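
A minimal EM sketch in R for this univariate two-component mixture (the simulated data and starting values are assumed for illustration; for real analyses use mclust):

# EM for a two-component, one-dimensional Gaussian mixture (didactic sketch).
set.seed(45)
y <- c(rnorm(100, mean = 1.67, sd = 0.06),   # simulated "women"
       rnorm(100, mean = 1.81, sd = 0.07))   # simulated "men"

# starting values
pi1 <- 0.5; mu <- c(1.60, 1.90); sigma <- c(0.10, 0.10)

for (iter in 1:100) {
  # E-step: posterior probability of component 1 for each observation
  d1 <- pi1 * dnorm(y, mu[1], sigma[1])
  d2 <- (1 - pi1) * dnorm(y, mu[2], sigma[2])
  post1 <- d1 / (d1 + d2)

  # M-step: update parameters with posterior-weighted versions of the easy rule
  pi1      <- mean(post1)
  mu[1]    <- sum(post1 * y) / sum(post1)
  mu[2]    <- sum((1 - post1) * y) / sum(1 - post1)
  sigma[1] <- sqrt(sum(post1 * (y - mu[1])^2) / sum(post1))
  sigma[2] <- sqrt(sum((1 - post1) * (y - mu[2])^2) / sum(1 - post1))
}

round(c(pi1 = pi1, mu = mu, sigma = sigma), 3)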
Break
Multivariate model-based clustering
Multivariate model-based clustering
• With 2 observed features:
  • the mean becomes a vector of 2 means
  • the standard deviation turns into a 2x2 variance-covariance matrix determining the shape of the cluster
• So we have multiple within-cluster parameters:
  • Two means
  • Two variances, one for each observed variable
  • A single covariance among the features
• Together, the 11 parameters (5 per cluster plus 1 mixing proportion) define the likelihood in bivariate space, which from the top looks like ellipses
Multivariate normal distribution

$$\mathrm{Normal}(x;\ \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$\mathrm{MVN}(\boldsymbol{x};\ \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-p/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right)$$

(where p is the number of features)
Multivariate model-based clustering

$$p(\boldsymbol{y} \mid \theta) = \pi_1^X\, \mathrm{MVN}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) + (1 - \pi_1^X)\, \mathrm{MVN}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)$$
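
A sketch of evaluating this bivariate mixture density in R using the mvtnorm package; all parameter values are assumed for illustration.

# Two-component bivariate Gaussian mixture density (assumed parameter values).
library(mvtnorm)

pi1 <- 0.4
mu1 <- c(0, 0); Sigma1 <- matrix(c(1.0,  0.5,  0.5, 1.0), nrow = 2)
mu2 <- c(3, 3); Sigma2 <- matrix(c(1.0, -0.3, -0.3, 2.0), nrow = 2)

mix_density <- function(y) {     # y: a length-2 observation
  pi1 * dmvnorm(y, mean = mu1, sigma = Sigma1) +
    (1 - pi1) * dmvnorm(y, mean = mu2, sigma = Sigma2)
}

mix_density(c(1, 1))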
Estimation: the EM algorithm
Multivariate model-based clustering
• Cluster shape parameters (the variance-covariance matrix) can be constrained to be equal across clusters
  • Same as k-means
• Can also be different across clusters
  • Not possible in k-means
  • More flexible, complex model
  • Think about the bias-variance tradeoff!
TOP SECRET SLIDE
• K-means clustering is a GMM with the following model:
• All prior class proportions are 1/K
• EII model: equal volume, only circles
• All posterior probabilities are either 0 or 1
TOP SECRET SLIDE 2
• GMM has trouble with clusters that are not ellipses
• Secret weapon: merging

Powerful idea:
• Start with Gaussian mixture solution
• Merge “similar” components to create non-Gaussian clusters

NB: we’re distinguishing “components” from “clusters” now


Merging

library(mclust)
out <- Mclust(x)          # fit a Gaussian mixture to the data x
com <- clustCombi(out)    # hierarchically merge components into clusters
plot(com)
Assessing clustering results
Methods to assess whether the obtained clusters are “good”:
• Stability (previous lecture)
• External validity (previous lecture)
• Model fit
Model fit
How well does the model fit the data?

Log-likelihood

$$\ell(\theta) = \log p(y \mid \theta) = \log \prod_{n=1}^{N} p(y_n \mid \theta) = \sum_{n=1}^{N} \log p(y_n \mid \theta)$$

The higher the log-likelihood, the more likely the data (if we assume this model is correct)
Deviance
$-2 \cdot \ell(\theta)$ (lower deviance is better)
Information criteria
Deviance forms the basis of information criteria, which balance fit and complexity.

Akaike information criterion

$$AIC = -2\ell(\theta) + 2k$$
(where k is the number of parameters)

Bayesian information criterion

$$BIC = -2\ell(\theta) + k \log n$$
(where n is the number of rows in your data)
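
As a sketch, both criteria can be computed by hand from a fitted mclust model; the iris data below is used only as a convenient example dataset.

# Compute AIC and BIC from a fitted Mclust model's log-likelihood.
library(mclust)
out <- Mclust(iris[, 1:4])      # example data; any numeric data frame works

ll <- out$loglik                # maximised log-likelihood
k  <- out$df                    # number of estimated parameters
n  <- out$n                     # number of observations

c(AIC = -2 * ll + 2 * k,
  BIC = -2 * ll + k * log(n))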
Information criteria
Think: bias and variance tradeoff!
• Variance also has to do with stability

Better fit & lower complexity = better cluster solution

(other assessment methods also available for model-based clustering)
High-dim!
How to do GMM in high dimensions?
• Same solution as we are used to by now!
  • Perform clustering on a dimension-reduced version of the original data
  • Integrate regularization / dimension reduction into your GMM optimization method

• Bouveyron et al. (2007). High-dimensional data clustering. Computational Statistics & Data Analysis 52, 502–519
  • Uses the second solution
  • Akin to "mixtures of probabilistic PCA"
Model-based clustering in R
Model-based clustering in R
• Mclust implements multivariate model-based clustering
• Provides an easy interface to fit several parameterizations
• Model comparison with BIC
• Plotting functionality
Model-based clustering in R
• Mclust uses an identifier for each possible parametrization of the cluster shape: E for equal, V for variable, in:
  • Volume (size of the clusters in data space)
  • Shape (circle or ellipse)
  • Orientation (the angle of the ellipse)
• So an EEE model has equal volume, shape, and orientation
• A VVV model has variable volume, shape, and orientation
• A VVE model has variable volume and shape but equal orientation
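
A sketch of requesting specific parameterizations explicitly; iris is used here only as an example dataset.

# Fit specific covariance parameterizations with Mclust.
library(mclust)

fit_eee <- Mclust(iris[, 1:4], G = 3, modelNames = "EEE")  # equal volume, shape, orientation
fit_vvv <- Mclust(iris[, 1:4], G = 3, modelNames = "VVV")  # everything varies per cluster

summary(fit_eee)
summary(fit_vvv)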
Model-based clustering in R
Model-based clustering in R

(figure: VVV, 3 clusters)

• How Mclust optimizes hyperparameters:
  • Fit all the models with up to 9 clusters (or more, your choice!)
  • Compute the BIC of each model
  • Choose the model with the lowest BIC
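
A sketch of this search in R (example data assumed). Note that mclust itself reports BIC on the $2\ell(\theta) - k \log n$ scale, so within its own output the selected model is the one with the highest reported value.

# Let Mclust search over all parameterizations and 1-9 clusters.
library(mclust)
out <- Mclust(iris[, 1:4], G = 1:9)

summary(out)              # selected parameterization and number of clusters
plot(out, what = "BIC")   # criterion for every model / cluster-number combination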
Practical: perform model-based clustering
Take-home exercises: 1-11
Questions?
