
Machine Learning: Principal Component Analysis, Hierarchical Clustering

Preview Version

Jan T Kim
Computer Science, SPECS, UH
Semester B 2023/24, T-3


Statistical Characteristics of Data

◮ Mean (average, centre of mass):

    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

  ☞ minimises squared error.

◮ Variance:

    $E((x - \bar{x})^2) = \sigma_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

  Notation: $\sigma_{xx} := \sigma_x^2$

◮ Standard deviation:

    $\sigma_x = \sqrt{\sigma_x^2}$
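A quick numerical check of these definitions in R (the language used elsewhere in these slides); the sample vector x is made up for illustration:

    x <- c(2, 4, 4, 4, 5, 5, 7, 9)
    mean(x)   # sample mean
    var(x)    # sample variance, using the 1/(n-1) convention above
    sd(x)     # standard deviation, identical to sqrt(var(x))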


“Centering” of Vectors

◮ The “centred version” of a vector x is

    $x_c = \begin{pmatrix} x_1 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}$

◮ The mean of a “centred” vector is $\bar{x_c} = 0$.

◮ Centering should be applied to sets (or vectors) containing elements of the same dimension.
  ◮ Typically applied to columns (features) of data matrices.
  ◮ Normally not applicable to rows.


Scaling to Unit Variance / Standard Deviation

◮ Scaling to unit variance, or unit standard deviation, is achieved by

    $x_s = \frac{1}{\sigma_x} x$

◮ Should be applied to elements of the same dimension.
  ◮ Typically applied to columns (features) of data matrices.
  ◮ Normally not applicable to rows.
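In R, centering and scaling of data matrix columns can be done with scale(); a minimal sketch on a simulated matrix:

    X <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 10, ncol = 2)
    Xc <- scale(X, center = TRUE, scale = FALSE)   # centre columns only
    Xs <- scale(X, center = TRUE, scale = TRUE)    # centre and scale to unit sd
    colMeans(Xs)      # approximately 0 for each column
    apply(Xs, 2, sd)  # exactly 1 for each column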


Centering and Scaling: Demo

[Figure sequence: scatter plots of the same data set shown as "original data", "centred", and "centred and scaled", plotted over x and y.]

☞ Choose axis ranges to visualise relevant features.


Visualising Numeric Data

  x1   x2    x3    x4
1  1 0.41  0.00  1.00
2  2 0.86  0.95  0.31
3  3 1.10  0.59 -0.81
4  4 1.55 -0.59 -0.81
5  5 1.90 -0.95  0.31

plot(dataFrame);

[Figure: pairwise scatter-plot matrix of the columns x1, ..., x4.]


Visualising Multivariate Data: Planar Projections

[Figure sequence: a three-dimensional data set projected onto the (x, y) and (x, z) planes, the full pairwise scatter-plot matrix of x, y and z, and 3-D perspective views.]


Covariance and Correlation

◮ Covariance:

    $\mathrm{cov}(x, y) = \sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

  ☞ Invariant to component order:

    $\mathrm{cov}((x_1, x_2), (y_1, y_2)) = \mathrm{cov}((x_2, x_1), (y_2, y_1))$

◮ Pearson correlation coefficient:

    $r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{x_c \cdot y_c}{\|x_c\| \|y_c\|} = \cos \angle(x_c, y_c)$
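In R, cov() and cor() compute these directly; a minimal check on made-up vectors, including the cosine identity for the centred vectors:

    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 8.0, 9.8)
    cov(x, y)   # sample covariance, 1/(n-1) convention
    cor(x, y)   # Pearson correlation coefficient
    # same value as the cosine of the angle between the centred vectors:
    xc <- x - mean(x); yc <- y - mean(y)
    sum(xc * yc) / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))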


Correlation Coefficient: Properties

◮ $|r_{xy}| = 1$ only if $y = ax + b$ with $a \neq 0$ ($r_{xy} = 1$ for $a > 0$, $r_{xy} = -1$ for $a < 0$).

◮ $r_{xy} = 0$ is necessary but not sufficient to establish independence.
  ◮ independent ⇒ uncorrelated, but
  ◮ uncorrelated ⇏ independent.

[Figure: example scatter plots with correlation coefficients 1, 0.8, 0.4, 0, −0.4, −0.8, −1; perfectly linear data with r = ±1; and nonlinearly dependent but uncorrelated data with r = 0.]

https://en.wikipedia.org/wiki/File:Correlation_examples2.svg


Principal Component Analysis

◮ Explore the structure of covariances among features.
◮ Identify the direction(s) of largest variance(s).
◮ Reduce the dimension of a dataset while preserving as much variance as possible.
  ◮ Preserve the “most important dimensions”.
  ◮ Remove “noise”.
◮ Preparation for further analysis.


A Matrix of Points in R² . . .

$P = \begin{pmatrix} -1 & -1 \\ -0.5 & -1 \\ 0 & -1 \\ \vdots & \vdots \\ 0.5 & 0 \\ \vdots & \vdots \\ 0.5 & 1 \\ 1 & 1 \end{pmatrix}$

[Figure: the rows of P plotted as a regular lattice of points in the (x, y) plane.]


. . . Multiplied by a 2 × 2 Matrix S . . .

$S = \begin{pmatrix} 3 & 0 \\ 0 & 0.8 \end{pmatrix}$

Compute and plot

$PS = \begin{pmatrix} \vdots & \vdots \\ 1.5 & 0 \\ \vdots & \vdots \\ 3 & 0.8 \end{pmatrix}$

[Figure: the lattice after multiplication by S.]

◮ Points in P have been scaled
  ◮ by a factor of 3 in the x–direction,
  ◮ by a factor of 0.8 in the y–direction.


Eigenvalues and Eigenvectors

◮ e = (0.5, 0) is mapped to Se = (1.5, 0).
  ◮ There is a λ such that Se = λe: λ = 3.
  ◮ e is an eigenvector with eigenvalue λ = 3.
◮ p = (1, 1) is mapped to Sp = (3, 0.8).
  ◮ There is no λ such that Sp = λp.
  ◮ p is not an eigenvector.

[Figure: e and p with their images Se and Sp in the (x, y) plane; Se lies on the line spanned by e, while Sp does not lie on the line spanned by p.]
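A quick check of these two cases in R, using the matrix S from the previous slides:

    S <- matrix(c(3, 0, 0, 0.8), nrow = 2, byrow = TRUE)
    e <- c(0.5, 0)
    S %*% e    # (1.5, 0) = 3 * e, so e is an eigenvector with eigenvalue 3
    p <- c(1, 1)
    S %*% p    # (3, 0.8), not a scalar multiple of p
    eigen(S)   # eigenvalues 3 and 0.8, eigenvectors (1, 0) and (0, 1)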


Rotation: Another Matrix . . .

$R = \begin{pmatrix} \cos(\frac{\pi}{6}) & \sin(\frac{\pi}{6}) \\ -\sin(\frac{\pi}{6}) & \cos(\frac{\pi}{6}) \end{pmatrix} = \begin{pmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{pmatrix}$

Compute and plot

$PSR = \begin{pmatrix} \vdots & \vdots \\ 1.3 & 0.7 \\ \vdots & \vdots \\ 2.1 & 2.4 \end{pmatrix}$

[Figure: the scaled lattice after rotation by R.]

◮ Points in P have been
  ◮ scaled by matrix S,
  ◮ then rotated by π/6 by matrix R.


PCA Computation: Covariance Matrix

Compute the covariance matrix

$\Sigma = \begin{pmatrix} \mathrm{cov}(x_{*,1}, x_{*,1}) & \mathrm{cov}(x_{*,1}, x_{*,2}) & \cdots & \mathrm{cov}(x_{*,1}, x_{*,d}) \\ \mathrm{cov}(x_{*,2}, x_{*,1}) & \mathrm{cov}(x_{*,2}, x_{*,2}) & \cdots & \mathrm{cov}(x_{*,2}, x_{*,d}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_{*,d}, x_{*,1}) & \mathrm{cov}(x_{*,d}, x_{*,2}) & \cdots & \mathrm{cov}(x_{*,d}, x_{*,d}) \end{pmatrix}$

◮ $x_{*,i}$ denotes the i-th column of the dataset X.
◮ The covariance matrix is symmetric ($\mathrm{cov}(x, y) = \mathrm{cov}(y, x)$).
◮ Its eigenvectors are orthogonal.
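In R, the covariance matrix of a data matrix's columns is cov(X); a minimal sketch on simulated data:

    X <- cbind(x1 = rnorm(100), x2 = rnorm(100))
    X <- cbind(X, x3 = X[, "x1"] + 0.5 * rnorm(100))  # x3 correlated with x1
    Sigma <- cov(X)        # d x d covariance matrix of the columns
    isSymmetric(Sigma)     # TRUE
    eigen(Sigma)$vectors   # orthogonal eigenvectors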


Eigenvectors of Symmetric Matrices are Orthogonal

Symmetric matrix: $A = A^T$
◮ Let $\lambda_i, \lambda_j$ be eigenvalues, $\lambda_i \neq \lambda_j$,
◮ $e_i, e_j$ the corresponding eigenvectors.

$\lambda_i (e_i \cdot e_j) = (A e_i) \cdot e_j = (e_i^T A^T) e_j = (e_i^T A) e_j = e_i^T (A e_j) = e_i^T (\lambda_j e_j) = \lambda_j (e_i \cdot e_j)$

So $(\lambda_i - \lambda_j)(e_i \cdot e_j) = 0 \Rightarrow e_i \cdot e_j = 0$, and $e_i$ and $e_j$ are orthogonal.


PCA Computation: Eigenvectors and Eigenvalues

◮ The d principal components (PCs) of an n × d data matrix are the eigenvectors $u_1, \ldots, u_d$ of its covariance matrix.
  ◮ The components of the eigenvectors are known as loadings.
  ◮ PCs are linear combinations of the original dimensions.
  ◮ PCs have unit length: $\|u_i\| = 1$.
  ◮ PCs are pairwise orthogonal: $u_i \cdot u_j = 0 \; \forall i \neq j$.
◮ The corresponding eigenvalues $\lambda_i$ are equal to the variance along the corresponding PC.
◮ PCs are ordered by decreasing eigenvalues: $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_d$


PCA: Variance Conservation in PCs

◮ Total variance in the original data is $\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_d^2 = \mathrm{trace}(\Sigma)$.
  ◮ The trace of a square matrix A is defined to be the sum of the elements on the main diagonal: $\mathrm{trace}(A) = \sum_i a_{ii}$.
◮ Total variance of the PCs is $\sum_i \lambda_i$.
◮ By a property of the eigenvalues of $\Sigma$: $\sum_i \lambda_i = \mathrm{trace}(\Sigma)$.
◮ Covariances:
  ◮ generally nonzero in the original data: $\mathrm{cov}(x_{*,i}, x_{*,j}) \neq 0$
  ◮ principal components are decorrelated: $\mathrm{cov}(p_{*,i}, p_{*,j}) = 0 \; \forall i \neq j$


Maximising Variance Preservation

◮ The proportion of variance in PCs 1, ..., k is

    $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$

◮ This is the maximum variance that can be preserved in k < d dimensions.
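In R, these proportions can be read off the output of prcomp(); a sketch on a simulated matrix:

    X <- matrix(rnorm(400), ncol = 4)
    X[, 2] <- X[, 1] + 0.3 * X[, 2]   # correlate two columns
    pca <- prcomp(X, center = TRUE)
    lambda <- pca$sdev^2              # eigenvalues: variance along each PC
    cumsum(lambda) / sum(lambda)      # proportion of variance in PCs 1, ..., k
    summary(pca)                      # the "Cumulative Proportion" row shows the same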


Example PCA Workflow: Dimension Reduction

◮ Pre-processing: consider
  ◮ centering columns,
  ◮ scaling columns to unit variance.
  This is built into many PCA implementations; check the documentation.
◮ Calculate the covariance matrix.
◮ Compute the eigenvectors and eigenvalues of the covariance matrix.
◮ Select k principal components.
◮ Reduce dimensions (see the R sketch below):
  ◮ Transform data to PC space.
  ◮ Project to the selected PCs.
  ◮ Transform back to the original space.
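A minimal sketch of this workflow in R using prcomp(), which performs the centering (and optionally the scaling) internally; the data matrix X is simulated:

    X <- matrix(rnorm(200), ncol = 4)
    X[, 2] <- X[, 1] + 0.3 * X[, 2]           # introduce correlation
    pca <- prcomp(X, center = TRUE, scale. = FALSE)
    k <- 2
    P <- pca$x[, 1:k]                         # data transformed to the first k PCs
    Xhat <- P %*% t(pca$rotation[, 1:k])      # project back to original dimensions
    Xhat <- sweep(Xhat, 2, pca$center, "+")   # undo the centering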


PCA of Lattice Example

Data: X = PSR

[Figure: the scaled and rotated lattice X in the (x, y) plane.]


PCA of Lattice Example

PCA results:
◮ Eigenvalues: $\lambda_1 = 4.7$, $\lambda_2 = 0.3$, combined into the scaling matrix

    $L = \begin{pmatrix} \frac{1}{\sqrt{\lambda_1}} & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}} \end{pmatrix} = \begin{pmatrix} 0.46 & 0 \\ 0 & 1.73 \end{pmatrix}$

◮ Eigenvectors: $u_1 = \begin{pmatrix} 0.87 \\ 0.5 \end{pmatrix}$, $u_2 = \begin{pmatrix} -0.5 \\ 0.87 \end{pmatrix}$, combined into the matrix

    $U = \begin{pmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{pmatrix}$


PCA of Lattice Example

[Figure: multiplying X by U rotates the lattice so that the principal components align with the coordinate axes; multiplying the result by L scales it to unit variance along each axis.]


PCA of Lattice Example

◮ Recall that X = PSR.
◮ The eigenvectors “undo” the rotation:

    $U = R^{-1}$

◮ The square roots of the eigenvalues “undo” the scaling:

    $L = \mathrm{const} \cdot S^{-1}$


PCA of Lattice Example: Remove PC2

$S' = \begin{pmatrix} \sqrt{\lambda_1} & 0 \\ 0 & 0 \end{pmatrix}, \qquad R = U^{-1} = \begin{pmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{pmatrix}$

[Figure sequence: XUL (the lattice aligned with the PC axes and scaled to unit variance), XULS′ (PC2 removed: all points collapse onto the PC1 axis), and XULS′U⁻¹ (the reduced data transformed back to the original coordinates, lying on a line).]


PCA: Demo 1

[Figure sequence: pairwise scatter-plot matrix of the raw data (features x1, ..., x5); bar chart of variances per original feature; pairwise scatter-plot matrix and labelled biplot of the principal components computed without scaling to unit variance; bar charts of variances per PC, without and with scaling.]


PCA: Demo 2

[Figure sequence: pairwise scatter-plot matrix of the raw data (x1, ..., x4); pairwise scatter-plot matrix of the principal components; scatter plot of PC1 vs PC2.]


Principal Component Analysis (PCA): Summary

◮ PCs form a basis of the d-dimensional data space.
  ◮ PCs are a set of orthogonal unit vectors.
◮ Linear transformation from the original basis into the PC basis.
  ◮ Invertible operation: data can be transformed back to the original dimensions.
◮ PCs are ordered by decreasing variance.
  ◮ Choosing the top k PCs achieves maximal preservation of variance in k < d dimensions.
◮ Applications include
  ◮ dimension reduction,
  ◮ removal of “noise”,
  ◮ preparation for further analysis.


Distances

◮ A function d(x, y) is called a distance metric if it is
  ◮ non-negative: d(x, y) ≥ 0,
  ◮ 0 only for identical values: d(x, y) = 0 ⇔ x = y
    (a pseudometric relaxes this: d(x, y) = 0 is allowed for some x ≠ y),
  ◮ symmetric: d(x, y) = d(y, x),
  ◮ satisfying the triangle inequality: d(x, y) ≤ d(x, z) + d(z, y).
◮ Distance metrics quantify dissimilarity.
  ◮ Larger values indicate less similarity.
☞ Not all dissimilarities / divergences are distances.


Distances: Examples

For x = (4, 3):
◮ L2 norm (Euclidean norm): $\|x\|_2 = \sqrt{\sum_i x_i^2} = \sqrt{16 + 9} = 5$
◮ L1 norm (Manhattan norm): $\|x\|_1 = \sum_i |x_i| = 4 + 3 = 7$
◮ L∞ norm: $\|x\|_\infty = \max_i |x_i| = \max\{4, 3\} = 4$
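In R, dist() computes the corresponding distances between the rows of a matrix; a quick check for the vector (4, 3) against the origin:

    m <- rbind(c(4, 3), c(0, 0))
    dist(m, method = "euclidean")  # 5
    dist(m, method = "manhattan")  # 7
    dist(m, method = "maximum")    # 4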


Correlation-Based Dissimilarity

Pearson correlation coefficient:

$r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{x_c \cdot y_c}{\|x_c\| \|y_c\|}$

◮ Correlation is a similarity measure.
  ◮ Larger values indicate more similarity.
◮ Dissimilarity can be quantified by
  ◮ $1 - r_{xy}$
  ◮ $1 - r_{xy}^2$
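A minimal sketch of turning column correlations into a dissimilarity matrix in R (suitable as input to hclust, introduced below); X is simulated:

    X <- matrix(rnorm(100), ncol = 5)
    d1 <- as.dist(1 - cor(X))     # dissimilarity 1 - r_xy
    d2 <- as.dist(1 - cor(X)^2)   # dissimilarity 1 - r_xy^2, ignores the sign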


Visualising Numeric Data

  x1   x2    x3    x4
1  1 0.41  0.00  1.00
2  2 0.86  0.95  0.31
3  3 1.10  0.59 -0.81
4  4 1.55 -0.59 -0.81
5  5 1.90 -0.95  0.31

plot(dataFrame);

[Figure: pairwise scatter-plot matrix of the columns x1, ..., x4.]

image(matrix2D);

[Figure: a 2-D matrix rendered as a colour image, with cell colour encoding value.]


Hierarchical Clustering

◮ General idea: Find hierarchical structures in the data.


◮ Algorithmic idea:
◮ Consider a distance matrix.
◮ Iteratively group closest clusters.



Hierarchical Clustering: Algorithm

◮ Input: an n × n distance matrix $(d(x, y))$.
◮ Place each data item in a one-member cluster.
◮ Let the cluster distance between one-item clusters be the distance between the items: $d_c(i, j) = d_{ij}$.
◮ Repeat
  ◮ Find clusters i, j with the smallest distance.
  ◮ Merge clusters i, j into a new cluster (ij).
  ◮ Let $d_c((ij), k) = \min\{d_c(i, k), d_c(j, k)\} \; \forall k$ (single linkage).
  ◮ Remove rows and columns i and j.
◮ until one cluster contains all data.
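This agglomerative procedure is what R's hclust() implements; a sketch using the distance matrix of the worked example that follows:

    D <- matrix(c(0, 1, 3, 3, 5,
                  1, 0, 3, 3, 5,
                  3, 3, 0, 2, 5,
                  3, 3, 2, 0, 4,
                  5, 5, 5, 4, 0),
                nrow = 5, dimnames = list(letters[1:5], letters[1:5]))
    hc <- hclust(as.dist(D), method = "single")
    plot(hc)  # a,b merge at height 1; c,d at 2; (ab),(cd) at 3; e joins at 4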


Worked Example: Single Linkage

Initial distance matrix:

     a  b  c  d  e
  a  0  1  3  3  5
  b  1  0  3  3  5
  c  3  3  0  2  5
  d  3  3  2  0  4
  e  5  5  5  4  0

The smallest distance is d(a, b) = 1, so a and b merge into cluster ab:

     ab  c  d  e
  ab  0  3  3  5
  c   3  0  2  5
  d   3  2  0  4
  e   5  5  4  0

Next, c and d merge at distance 2:

      ab  cd  e
  ab   0   3  5
  cd   3   0  4
  e    5   4  0

Then ab and cd merge at distance 3:

        abcd  e
  abcd     0  4
  e        4  0

Finally, e joins at distance 4, leaving a single cluster:

         abcde
  abcde      0

[Figure: the dendrogram over a, b, c, d, e built up alongside the matrix updates.]


Clustering Criteria

◮ Single linkage: $d_c((ij), k) = \min\{d_c(i, k), d_c(j, k)\}$
◮ Complete linkage: $d_c((ij), k) = \max\{d_c(i, k), d_c(j, k)\}$
◮ Average linkage: $d_c((ij), k) = \mathrm{mean}\{d(x, y) : x \in \mathrm{cluster}_{(ij)}, y \in \mathrm{cluster}_k\}$
◮ Ward's criterion: minimise the sum of squared distances within clusters.
  ◮ Minimises within-cluster variance.
  ◮ Not guaranteed to be optimal.
◮ Several other criteria exist, see e.g.
  https://en.wikipedia.org/wiki/Hierarchical_clustering
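In R, the criterion is selected via hclust()'s method argument; a sketch comparing criteria on the same dissimilarity matrix, using the built-in iris data:

    d <- dist(iris[, 1:4])
    hc_single   <- hclust(d, method = "single")
    hc_complete <- hclust(d, method = "complete")
    hc_average  <- hclust(d, method = "average")
    hc_ward     <- hclust(d, method = "ward.D2")   # Ward's criterion
    par(mfrow = c(1, 4))
    plot(hc_single); plot(hc_complete); plot(hc_average); plot(hc_ward)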


Hierarchical Clustering: Example

[Figure sequence: scatter plot of the example data in the (x, y) plane; cluster dendrogram produced by hclust (*, "complete") on dist(d), with merge heights on the vertical axis and numbered leaf labels; image plots of the distance matrix, unordered and ordered by the clustering.]


Complete, Average and Single Linkage

[Figure sequence: a helix data set of 600 points (items 1, ..., 300 and 301, ..., 600 marked differently), shown in input order and re-ordered by complete, average and single linkage; for each criterion, the dendrogram produced by hclust on dist(dHelix) is shown alongside the re-ordered data.]


Heatmap

◮ R: heatmap function (package stats)
◮ Left dendrogram: clustering of rows (data items)
◮ Top dendrogram: clustering of columns (features)

[Figure: heatmap of an example data matrix with row and column dendrograms; column labels x1, ..., x7.]
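A minimal heatmap sketch in R; the data matrix M is made up, with seven feature columns to match the column labels in the figure:

    M <- matrix(rnorm(70), nrow = 10,
                dimnames = list(paste0("item", 1:10), paste0("x", 1:7)))
    heatmap(M, scale = "column")  # rows and columns reordered by hierarchical clustering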
Hierarchical Clustering: Summary

◮ Heuristic method for data exploration.
  ◮ Visual impression of hierarchical structure.
  ◮ Use on samples of larger data sets.
◮ Operates on dissimilarity values.
  ◮ Similarity scores need to be transformed.
◮ Outputs a tree of nested clusters.
  ◮ The tree provides a partial ordering of items.
  ◮ Does not require interpolation.
◮ Requires a dissimilarity matrix as input.
  ◮ Source data (e.g. a data matrix) is not required.
◮ Numerous combinations possible of
  ◮ dissimilarity measures,
  ◮ clustering criteria.
