
Machine Learning: Principal Component Analysis, Hierarchical Clustering

Preview Version

Jan T Kim
Computer Science, SPECS, UH
Semester B 2023/24, T-3


Statistical Characteristics of Data

◮ Mean (average, centre of mass):

    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

  ☞ minimises squared error.

◮ Variance:

    $E((x - \bar{x})^2) = \sigma_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

  Notation: $\sigma_{xx} := \sigma_x^2$

◮ Standard deviation:

    $\sigma_x = \sqrt{\sigma_x^2}$
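A quick numerical check of these definitions in R (the language used elsewhere in these slides); the sample vector x is made up for illustration:

    x <- c(2, 4, 4, 4, 5, 5, 7, 9)
    mean(x)   # sample mean
    var(x)    # sample variance, using the 1/(n-1) convention above
    sd(x)     # standard deviation, identical to sqrt(var(x))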


“Centering” of Vectors

◮ The “centred version” of a vector x is

    $x_c = \begin{pmatrix} x_1 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}$

◮ The mean of a “centred” vector is $\bar{x_c} = 0$.

◮ Centering should be applied to sets (or vectors) containing elements of the same dimension.
  ◮ Typically applied to columns (features) of data matrices.
  ◮ Normally not applicable to rows.


Scaling to Unit Variance / Standard Deviation

◮ Scaling to unit variance, or unit standard deviation, is achieved by

    $x_s = \frac{1}{\sigma_x} x$

◮ Should be applied to elements of the same dimension.
  ◮ Typically applied to columns (features) of data matrices.
  ◮ Normally not applicable to rows.
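In R, centering and scaling of data matrix columns can be done with scale(); a minimal sketch on a simulated matrix:

    X <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 10, ncol = 2)
    Xc <- scale(X, center = TRUE, scale = FALSE)   # centre columns only
    Xs <- scale(X, center = TRUE, scale = TRUE)    # centre and scale to unit sd
    colMeans(Xs)      # approximately 0 for each column
    apply(Xs, 2, sd)  # exactly 1 for each column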


Centering and Scaling: Demo

[Figure sequence: scatter plots of the same data set shown as "original data", "centred", and "centred and scaled", plotted over x and y.]

☞ Choose axis ranges to visualise relevant features.


Visualising Numeric Data

  x1   x2    x3    x4
1  1 0.41  0.00  1.00
2  2 0.86  0.95  0.31
3  3 1.10  0.59 -0.81
4  4 1.55 -0.59 -0.81
5  5 1.90 -0.95  0.31

plot(dataFrame);

[Figure: pairwise scatter-plot matrix of the columns x1, ..., x4.]


Visualising Multivariate Data: Planar Projections

[Figure sequence: a three-dimensional data set projected onto the (x, y) and (x, z) planes, the full pairwise scatter-plot matrix of x, y and z, and 3-D perspective views.]


Covariance and Correlation

◮ Covariance:

    $\mathrm{cov}(x, y) = \sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

  ☞ Invariant to component order:

    $\mathrm{cov}((x_1, x_2), (y_1, y_2)) = \mathrm{cov}((x_2, x_1), (y_2, y_1))$

◮ Pearson correlation coefficient:

    $r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{x_c \cdot y_c}{\|x_c\| \|y_c\|} = \cos \angle(x_c, y_c)$
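In R, cov() and cor() compute these directly; a minimal check on made-up vectors, including the cosine identity for the centred vectors:

    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 8.0, 9.8)
    cov(x, y)   # sample covariance, 1/(n-1) convention
    cor(x, y)   # Pearson correlation coefficient
    # same value as the cosine of the angle between the centred vectors:
    xc <- x - mean(x); yc <- y - mean(y)
    sum(xc * yc) / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))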


Correlation Coefficient: Properties

◮ $|r_{xy}| = 1$ only if $y = ax + b$ with $a \neq 0$ ($r_{xy} = 1$ for $a > 0$, $r_{xy} = -1$ for $a < 0$).

◮ $r_{xy} = 0$ is necessary but not sufficient to establish independence.
  ◮ independent ⇒ uncorrelated, but
  ◮ uncorrelated ⇏ independent.

[Figure: example scatter plots with correlation coefficients 1, 0.8, 0.4, 0, −0.4, −0.8, −1; perfectly linear data with r = ±1; and nonlinearly dependent but uncorrelated data with r = 0.]

https://en.wikipedia.org/wiki/File:Correlation_examples2.svg


Principal Component Analysis

◮ Explore the structure of covariances among features.
◮ Identify the direction(s) of largest variance(s).
◮ Reduce the dimension of a dataset while preserving as much variance as possible.
  ◮ Preserve the “most important dimensions”.
  ◮ Remove “noise”.
◮ Preparation for further analysis.


A Matrix of Points in R² . . .

$P = \begin{pmatrix} -1 & -1 \\ -0.5 & -1 \\ 0 & -1 \\ \vdots & \vdots \\ 0.5 & 0 \\ \vdots & \vdots \\ 0.5 & 1 \\ 1 & 1 \end{pmatrix}$

[Figure: the rows of P plotted as a regular lattice of points in the (x, y) plane.]


. . . Multiplied by a 2 × 2 Matrix S . . .

$S = \begin{pmatrix} 3 & 0 \\ 0 & 0.8 \end{pmatrix}$

Compute and plot

$PS = \begin{pmatrix} \vdots & \vdots \\ 1.5 & 0 \\ \vdots & \vdots \\ 3 & 0.8 \end{pmatrix}$

[Figure: the lattice after multiplication by S.]

◮ Points in P have been scaled
  ◮ by a factor of 3 in the x–direction,
  ◮ by a factor of 0.8 in the y–direction.


Eigenvalues and Eigenvectors

◮ e = (0.5, 0) is mapped to Se = (1.5, 0).
  ◮ There is a λ such that Se = λe: λ = 3.
  ◮ e is an eigenvector with eigenvalue λ = 3.
◮ p = (1, 1) is mapped to Sp = (3, 0.8).
  ◮ There is no λ such that Sp = λp.
  ◮ p is not an eigenvector.

[Figure: e and p with their images Se and Sp in the (x, y) plane; Se lies on the line spanned by e, while Sp does not lie on the line spanned by p.]
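A quick check of these two cases in R, using the matrix S from the previous slides:

    S <- matrix(c(3, 0, 0, 0.8), nrow = 2, byrow = TRUE)
    e <- c(0.5, 0)
    S %*% e    # (1.5, 0) = 3 * e, so e is an eigenvector with eigenvalue 3
    p <- c(1, 1)
    S %*% p    # (3, 0.8), not a scalar multiple of p
    eigen(S)   # eigenvalues 3 and 0.8, eigenvectors (1, 0) and (0, 1)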


Rotation: Another Matrix . . .

$R = \begin{pmatrix} \cos(\frac{\pi}{6}) & \sin(\frac{\pi}{6}) \\ -\sin(\frac{\pi}{6}) & \cos(\frac{\pi}{6}) \end{pmatrix} = \begin{pmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{pmatrix}$

Compute and plot

$PSR = \begin{pmatrix} \vdots & \vdots \\ 1.3 & 0.7 \\ \vdots & \vdots \\ 2.1 & 2.4 \end{pmatrix}$

[Figure: the scaled lattice after rotation by R.]

◮ Points in P have been
  ◮ scaled by matrix S,
  ◮ then rotated by π/6 by matrix R.


PCA Computation: Covariance Matrix

Compute the covariance matrix

$\Sigma = \begin{pmatrix} \mathrm{cov}(x_{*,1}, x_{*,1}) & \mathrm{cov}(x_{*,1}, x_{*,2}) & \cdots & \mathrm{cov}(x_{*,1}, x_{*,d}) \\ \mathrm{cov}(x_{*,2}, x_{*,1}) & \mathrm{cov}(x_{*,2}, x_{*,2}) & \cdots & \mathrm{cov}(x_{*,2}, x_{*,d}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_{*,d}, x_{*,1}) & \mathrm{cov}(x_{*,d}, x_{*,2}) & \cdots & \mathrm{cov}(x_{*,d}, x_{*,d}) \end{pmatrix}$

◮ $x_{*,i}$ denotes the i-th column of the dataset X.
◮ The covariance matrix is symmetric ($\mathrm{cov}(x, y) = \mathrm{cov}(y, x)$).
◮ Its eigenvectors are orthogonal.
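In R, the covariance matrix of a data matrix's columns is cov(X); a minimal sketch on simulated data:

    X <- cbind(x1 = rnorm(100), x2 = rnorm(100))
    X <- cbind(X, x3 = X[, "x1"] + 0.5 * rnorm(100))  # x3 correlated with x1
    Sigma <- cov(X)        # d x d covariance matrix of the columns
    isSymmetric(Sigma)     # TRUE
    eigen(Sigma)$vectors   # orthogonal eigenvectors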


Eigenvectors of Symmetric Matrices are Orthogonal

Symmetric matrix: $A = A^T$
◮ Let $\lambda_i, \lambda_j$ be eigenvalues, $\lambda_i \neq \lambda_j$,
◮ $e_i, e_j$ the corresponding eigenvectors.

$\lambda_i (e_i \cdot e_j) = (A e_i) \cdot e_j = (e_i^T A^T) e_j = (e_i^T A) e_j = e_i^T (A e_j) = e_i^T (\lambda_j e_j) = \lambda_j (e_i \cdot e_j)$

So $(\lambda_i - \lambda_j)(e_i \cdot e_j) = 0 \Rightarrow e_i \cdot e_j = 0$, and $e_i$ and $e_j$ are orthogonal.


PCA Computation: Eigenvectors and Eigenvalues

◮ The d principal components (PCs) of an n × d data matrix are the eigenvectors $u_1, \ldots, u_d$ of its covariance matrix.
  ◮ The components of the eigenvectors are known as loadings.
  ◮ PCs are linear combinations of the original dimensions.
  ◮ PCs have unit length: $\|u_i\| = 1$.
  ◮ PCs are pairwise orthogonal: $u_i \cdot u_j = 0 \; \forall i \neq j$.
◮ The corresponding eigenvalues $\lambda_i$ are equal to the variance along the corresponding PC.
◮ PCs are ordered by decreasing eigenvalues: $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_d$


PCA: Variance Conservation in PCs

◮ Total variance in the original data is $\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_d^2 = \mathrm{trace}(\Sigma)$.
  ◮ The trace of a square matrix A is defined to be the sum of the elements on the main diagonal: $\mathrm{trace}(A) = \sum_i a_{ii}$.
◮ Total variance of the PCs is $\sum_i \lambda_i$.
◮ By a property of the eigenvalues of $\Sigma$: $\sum_i \lambda_i = \mathrm{trace}(\Sigma)$.
◮ Covariances:
  ◮ generally nonzero in the original data: $\mathrm{cov}(x_{*,i}, x_{*,j}) \neq 0$
  ◮ principal components are decorrelated: $\mathrm{cov}(p_{*,i}, p_{*,j}) = 0 \; \forall i \neq j$


Maximising Variance Preservation

◮ The proportion of variance in PCs 1, ..., k is

    $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$

◮ This is the maximum variance that can be preserved in k < d dimensions.
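In R, these proportions can be read off the output of prcomp(); a sketch on a simulated matrix:

    X <- matrix(rnorm(400), ncol = 4)
    X[, 2] <- X[, 1] + 0.3 * X[, 2]   # correlate two columns
    pca <- prcomp(X, center = TRUE)
    lambda <- pca$sdev^2              # eigenvalues: variance along each PC
    cumsum(lambda) / sum(lambda)      # proportion of variance in PCs 1, ..., k
    summary(pca)                      # the "Cumulative Proportion" row shows the same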


Example PCA Workflow: Dimension Reduction

◮ Pre-processing: consider
  ◮ centering columns,
  ◮ scaling columns to unit variance.
  This is built into many PCA implementations; check the documentation.
◮ Calculate the covariance matrix.
◮ Compute the eigenvectors and eigenvalues of the covariance matrix.
◮ Select k principal components.
◮ Reduce dimensions (see the R sketch below):
  ◮ Transform data to PC space.
  ◮ Project to the selected PCs.
  ◮ Transform back to the original space.
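A minimal sketch of this workflow in R using prcomp(), which performs the centering (and optionally the scaling) internally; the data matrix X is simulated:

    X <- matrix(rnorm(200), ncol = 4)
    X[, 2] <- X[, 1] + 0.3 * X[, 2]           # introduce correlation
    pca <- prcomp(X, center = TRUE, scale. = FALSE)
    k <- 2
    P <- pca$x[, 1:k]                         # data transformed to the first k PCs
    Xhat <- P %*% t(pca$rotation[, 1:k])      # project back to original dimensions
    Xhat <- sweep(Xhat, 2, pca$center, "+")   # undo the centering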


PCA of Lattice Example

Data: X = PSR

[Figure: the scaled and rotated lattice X in the (x, y) plane.]


PCA of Lattice Example

PCA results:
◮ Eigenvalues: $\lambda_1 = 4.7$, $\lambda_2 = 0.3$, combined into the scaling matrix

    $L = \begin{pmatrix} \frac{1}{\sqrt{\lambda_1}} & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}} \end{pmatrix} = \begin{pmatrix} 0.46 & 0 \\ 0 & 1.73 \end{pmatrix}$

◮ Eigenvectors: $u_1 = \begin{pmatrix} 0.87 \\ 0.5 \end{pmatrix}$, $u_2 = \begin{pmatrix} -0.5 \\ 0.87 \end{pmatrix}$, combined into the matrix

    $U = \begin{pmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{pmatrix}$


PCA of Lattice Example

[Figure: multiplying X by U rotates the lattice so that the principal components align with the coordinate axes; multiplying the result by L scales it to unit variance along each axis.]


PCA of Lattice Example

◮ Recall that X = PSR.
◮ The eigenvectors “undo” the rotation:

    $U = R^{-1}$

◮ The square roots of the eigenvalues “undo” the scaling:

    $L = \mathrm{const} \cdot S^{-1}$


PCA of Lattice Example: Remove PC2

$S' = \begin{pmatrix} \sqrt{\lambda_1} & 0 \\ 0 & 0 \end{pmatrix}, \qquad R = U^{-1} = \begin{pmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{pmatrix}$

[Figure sequence: XUL (the lattice aligned with the PC axes and scaled to unit variance), XULS′ (PC2 removed: all points collapse onto the PC1 axis), and XULS′U⁻¹ (the reduced data transformed back to the original coordinates, lying on a line).]


PCA: Demo 1

[Figure sequence: pairwise scatter-plot matrix of the raw data (features x1, ..., x5); bar chart of variances per original feature; pairwise scatter-plot matrix and labelled biplot of the principal components computed without scaling to unit variance; bar charts of variances per PC, without and with scaling.]


PCA: Demo 2

[Figure sequence: pairwise scatter-plot matrix of the raw data (x1, ..., x4); pairwise scatter-plot matrix of the principal components; scatter plot of PC1 vs PC2.]


Principal Component Analysis (PCA): Summary

◮ PCs form a basis of the d-dimensional data space.
  ◮ PCs are a set of orthogonal unit vectors.
◮ Linear transformation from the original basis into the PC basis.
  ◮ Invertible operation: data can be transformed back to the original dimensions.
◮ PCs are ordered by decreasing variance.
  ◮ Choosing the top k PCs achieves maximal preservation of variance in k < d dimensions.
◮ Applications include
  ◮ dimension reduction,
  ◮ removal of “noise”,
  ◮ preparation for further analysis.


Distances

◮ A function d(x, y) is called a distance metric if it is
  ◮ non-negative: d(x, y) ≥ 0,
  ◮ 0 only for identical values: d(x, y) = 0 ⇔ x = y
    (a pseudometric relaxes this: d(x, y) = 0 is allowed for some x ≠ y),
  ◮ symmetric: d(x, y) = d(y, x),
  ◮ satisfying the triangle inequality: d(x, y) ≤ d(x, z) + d(z, y).
◮ Distance metrics quantify dissimilarity.
  ◮ Larger values indicate less similarity.
☞ Not all dissimilarities / divergences are distances.


Distances: Examples

For x = (4, 3):
◮ L2 norm (Euclidean norm): $\|x\|_2 = \sqrt{\sum_i x_i^2} = \sqrt{16 + 9} = 5$
◮ L1 norm (Manhattan norm): $\|x\|_1 = \sum_i |x_i| = 4 + 3 = 7$
◮ L∞ norm: $\|x\|_\infty = \max_i |x_i| = \max\{4, 3\} = 4$
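In R, dist() computes the corresponding distances between the rows of a matrix; a quick check for the vector (4, 3) against the origin:

    m <- rbind(c(4, 3), c(0, 0))
    dist(m, method = "euclidean")  # 5
    dist(m, method = "manhattan")  # 7
    dist(m, method = "maximum")    # 4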


Correlation-Based Dissimilarity

Pearson correlation coefficient:

$r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{x_c \cdot y_c}{\|x_c\| \|y_c\|}$

◮ Correlation is a similarity measure.
  ◮ Larger values indicate more similarity.
◮ Dissimilarity can be quantified by
  ◮ $1 - r_{xy}$
  ◮ $1 - r_{xy}^2$
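A minimal sketch of turning column correlations into a dissimilarity matrix in R (suitable as input to hclust, introduced below); X is simulated:

    X <- matrix(rnorm(100), ncol = 5)
    d1 <- as.dist(1 - cor(X))     # dissimilarity 1 - r_xy
    d2 <- as.dist(1 - cor(X)^2)   # dissimilarity 1 - r_xy^2, ignores the sign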


Visualising Numeric Data

  x1   x2    x3    x4
1  1 0.41  0.00  1.00
2  2 0.86  0.95  0.31
3  3 1.10  0.59 -0.81
4  4 1.55 -0.59 -0.81
5  5 1.90 -0.95  0.31

plot(dataFrame);

[Figure: pairwise scatter-plot matrix of the columns x1, ..., x4.]

image(matrix2D);

[Figure: a 2-D matrix rendered as a colour image, with cell colour encoding value.]


Hierarchical Clustering

◮ General idea: Find hierarchical structures in the data.


◮ Algorithmic idea:
◮ Consider a distance matrix.
◮ Iteratively group closest clusters.



Hierarchical Clustering: Algorithm

◮ Input: an n × n distance matrix $(d(x, y))$.
◮ Place each data item in a one-member cluster.
◮ Let the cluster distance between one-item clusters be the distance between the items: $d_c(i, j) = d_{ij}$.
◮ Repeat
  ◮ Find clusters i, j with the smallest distance.
  ◮ Merge clusters i, j into a new cluster (ij).
  ◮ Let $d_c((ij), k) = \min\{d_c(i, k), d_c(j, k)\} \; \forall k$ (single linkage).
  ◮ Remove rows and columns i and j.
◮ until one cluster contains all data.
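This agglomerative procedure is what R's hclust() implements; a sketch using the distance matrix of the worked example that follows:

    D <- matrix(c(0, 1, 3, 3, 5,
                  1, 0, 3, 3, 5,
                  3, 3, 0, 2, 5,
                  3, 3, 2, 0, 4,
                  5, 5, 5, 4, 0),
                nrow = 5, dimnames = list(letters[1:5], letters[1:5]))
    hc <- hclust(as.dist(D), method = "single")
    plot(hc)  # a,b merge at height 1; c,d at 2; (ab),(cd) at 3; e joins at 4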


Worked Example: Single Linkage

Initial distance matrix:

     a  b  c  d  e
  a  0  1  3  3  5
  b  1  0  3  3  5
  c  3  3  0  2  5
  d  3  3  2  0  4
  e  5  5  5  4  0

The smallest distance is d(a, b) = 1, so a and b merge into cluster ab:

     ab  c  d  e
  ab  0  3  3  5
  c   3  0  2  5
  d   3  2  0  4
  e   5  5  4  0

Next, c and d merge at distance 2:

      ab  cd  e
  ab   0   3  5
  cd   3   0  4
  e    5   4  0

Then ab and cd merge at distance 3:

        abcd  e
  abcd     0  4
  e        4  0

Finally, e joins at distance 4, leaving a single cluster:

         abcde
  abcde      0

[Figure: the dendrogram over a, b, c, d, e built up alongside the matrix updates.]


Clustering Criteria

◮ Single linkage: $d_c((ij), k) = \min\{d_c(i, k), d_c(j, k)\}$
◮ Complete linkage: $d_c((ij), k) = \max\{d_c(i, k), d_c(j, k)\}$
◮ Average linkage: $d_c((ij), k) = \mathrm{mean}\{d(x, y) : x \in \mathrm{cluster}_{(ij)}, y \in \mathrm{cluster}_k\}$
◮ Ward's criterion: minimise the sum of squared distances within clusters.
  ◮ Minimises within-cluster variance.
  ◮ Not guaranteed to be optimal.
◮ Several other criteria exist, see e.g.
  https://en.wikipedia.org/wiki/Hierarchical_clustering
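In R, the criterion is selected via hclust()'s method argument; a sketch comparing criteria on the same dissimilarity matrix, using the built-in iris data:

    d <- dist(iris[, 1:4])
    hc_single   <- hclust(d, method = "single")
    hc_complete <- hclust(d, method = "complete")
    hc_average  <- hclust(d, method = "average")
    hc_ward     <- hclust(d, method = "ward.D2")   # Ward's criterion
    par(mfrow = c(1, 4))
    plot(hc_single); plot(hc_complete); plot(hc_average); plot(hc_ward)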


Hierarchical Clustering: Example

[Figure sequence: scatter plot of the example data in the (x, y) plane; cluster dendrogram produced by hclust (*, "complete") on dist(d), with merge heights on the vertical axis and numbered leaf labels; image plots of the distance matrix, unordered and ordered by the clustering.]


Complete, Average and Single Linkage

[Figure sequence: a helix data set of 600 points (items 1, ..., 300 and 301, ..., 600 marked differently), shown in input order and re-ordered by complete, average and single linkage; for each criterion, the dendrogram produced by hclust on dist(dHelix) is shown alongside the re-ordered data.]


Heatmap

◮ R: heatmap function (package stats)
◮ Left dendrogram: clustering of rows (data items)
◮ Top dendrogram: clustering of columns (features)

[Figure: heatmap of an example data matrix with row and column dendrograms; column labels x1, ..., x7.]
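A minimal heatmap sketch in R; the data matrix M is made up, with seven feature columns to match the column labels in the figure:

    M <- matrix(rnorm(70), nrow = 10,
                dimnames = list(paste0("item", 1:10), paste0("x", 1:7)))
    heatmap(M, scale = "column")  # rows and columns reordered by hierarchical clustering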
Hierarchical Clustering: Summary

◮ Heuristic method for data exploration.
  ◮ Visual impression of hierarchical structure.
  ◮ Use on samples of larger data sets.
◮ Operates on dissimilarity values.
  ◮ Similarity scores need to be transformed.
◮ Outputs a tree of nested clusters.
  ◮ The tree provides a partial ordering of items.
  ◮ Does not require interpolation.
◮ Requires a dissimilarity matrix as input.
  ◮ Source data (e.g. a data matrix) is not required.
◮ Numerous combinations possible of
  ◮ dissimilarity measures,
  ◮ clustering criteria.
