Cluster Analysis: Biological Data Analysis and Chemometrics
Based on:
H.C. Romesburg: Cluster Analysis for Researchers, Lifetime Learning Publications, Belmont, CA, 1984
P.H.A. Sneath and R.R. Sokal: Numerical Taxonomy, Freeman, San Francisco, CA, 1973
Ordination (projection)
Principal component analysis
Correspondence analysis
Multidimensional scaling
SAHN clustering
Sequential agglomerative hierarchic nonoverlapping clustering
Single linkage
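Nearest neighbor, minimum method
Close to minimum spanning tree
Contracting space
Chaining possible
$\alpha_J = 0.5$, $\alpha_K = 0.5$, $\beta = 0$, $\gamma = -0.5$
$U_{J,K} = \min_{j \in J,\, k \in K} U_{jk}$
Update after merging J and K: $U_{(J,K),L} = \alpha_J U_{J,L} + \alpha_K U_{K,L} + \beta U_{J,K} + \gamma \, |U_{J,L} - U_{K,L}|$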
Complete linkage
Furthest neighbor, maximum method
Dilating space
$\alpha_J = 0.5$, $\alpha_K = 0.5$, $\beta = 0$, $\gamma = 0.5$
$U_{J,K} = \max_{j \in J,\, k \in K} U_{jk}$
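The parameter sets for single and complete linkage can be checked numerically. The sketch below assumes the combinatorial update $U_{(J,K),L} = \alpha_J U_{J,L} + \alpha_K U_{K,L} + \beta U_{J,K} + \gamma |U_{J,L} - U_{K,L}|$; the function name `lance_williams` and the input values are illustrative and do not come from the slides.

```python
def lance_williams(u_jl, u_kl, u_jk, alpha_j, alpha_k, beta, gamma):
    """Resemblance between the merged cluster (J,K) and another cluster L."""
    return (alpha_j * u_jl + alpha_k * u_kl
            + beta * u_jk + gamma * abs(u_jl - u_kl))

# Parameter sets as listed for single and complete linkage.
SINGLE = dict(alpha_j=0.5, alpha_k=0.5, beta=0.0, gamma=-0.5)
COMPLETE = dict(alpha_j=0.5, alpha_k=0.5, beta=0.0, gamma=0.5)

# Hypothetical distances: J to L, K to L, and J to K.
u_jl, u_kl, u_jk = 14.0, 11.0, 5.0

single = lance_williams(u_jl, u_kl, u_jk, **SINGLE)
complete = lance_williams(u_jl, u_kl, u_jk, **COMPLETE)

print(single)    # equals min(u_jl, u_kl)
print(complete)  # equals max(u_jl, u_kl)
```

With $\gamma = -0.5$ the update reduces to the minimum of the two distances, and with $\gamma = 0.5$ to the maximum, which is why the two methods are also called the minimum and maximum methods.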
Average linkage
Arithmetic average
Unweighted: UPGMA (group average)
Weighted: WPGMA
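The difference between the unweighted and weighted averages shows up only when the merged clusters differ in size. The following is a minimal sketch using the standard update rules for the two methods (not given on the slides); the function names, cluster sizes and distances are hypothetical.

```python
def upgma_update(u_jl, u_kl, n_j, n_k):
    """Unweighted average: every object in J and K counts equally."""
    return (n_j * u_jl + n_k * u_kl) / (n_j + n_k)

def wpgma_update(u_jl, u_kl):
    """Weighted average: clusters J and K count equally, whatever their sizes."""
    return 0.5 * u_jl + 0.5 * u_kl

# Hypothetical case: J holds 3 objects, K holds 1 object.
u_jl, u_kl = 10.0, 4.0
upgma = upgma_update(u_jl, u_kl, n_j=3, n_k=1)
wpgma = wpgma_update(u_jl, u_kl)

print(upgma)  # 8.5, pulled toward the larger cluster J
print(wpgma)  # 7.0, midpoint of the two cluster distances
```

When both clusters have the same size, the two updates give the same value.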
Centroid
Unweighted centroid (Centroid)
Weighted centroid (Median)
Ordinary clustering
Obtain the data matrix
Transform or standardize the data matrix
Select the best resemblance or distance measure
Compute the resemblance matrix
Execute the clustering method (often UPGMA = average linkage)
Rearrange the data and resemblance matrices
Compute the cophenetic correlation coefficient
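The first four steps can be sketched with the Python standard library. This is an illustration under stated assumptions: objects are rows, standardization is autoscaling with the sample standard deviation (n - 1 divisor), and the resemblance measure is Euclidean distance. The helper names are hypothetical, and the data are the five two-attribute objects used in the worked example of these slides.

```python
import math
from statistics import mean, stdev

# Data matrix: one row per object, one column per attribute.
data = [(10, 5), (20, 20), (30, 10), (30, 15), (5, 10)]

def standardize(rows):
    """Autoscale each attribute to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds))
            for row in rows]

def distance_matrix(rows):
    """Full symmetric matrix of Euclidean distances between objects."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

raw_dist = distance_matrix(data)               # resemblance on raw data
std_dist = distance_matrix(standardize(data))  # resemblance after autoscaling

print(round(raw_dist[2][3], 2))  # distance between objects 3 and 4
```

Whether to standardize depends on the attributes: autoscaling removes the influence of measurement units, at the cost of giving every attribute the same weight.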
j i 1
Correlation coefficients
Yule: $(ad - bc) / (ad + bc)$
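As an illustration, the Yule coefficient $(ad - bc)/(ad + bc)$ can be computed for two objects described by binary attributes. The sketch assumes the usual 2 x 2 table convention (a = both 1, b = first 1 and second 0, c = first 0 and second 1, d = both 0); the function name and the example vectors are hypothetical.

```python
def yule(x, y):
    """Yule coefficient for two equal-length 0/1 attribute vectors."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # 1-1 matches
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # 1-0 mismatches
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # 0-1 mismatches
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # 0-0 matches
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Yule coefficient is undefined when ad + bc = 0")
    return (a * d - b * c) / denominator

obj_i = [1, 1, 0, 0, 1, 0]
obj_j = [1, 0, 0, 1, 1, 0]
print(yule(obj_i, obj_j))  # a=2, b=1, c=1, d=2 gives (4 - 1) / (4 + 1) = 0.6
```

The coefficient ranges from -1 to 1 and is undefined when ad + bc = 0, which the sketch reports as an error instead of returning a value.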
EUCLID = $E_{ij} = \sqrt{\sum_k (x_{ki} - x_{kj})^2}$
DIST = $d_{ij} = \sqrt{\frac{1}{n} \sum_k (x_{ki} - x_{kj})^2}$
( xki + xkj )
( x ki + x kj ) 2 k
Chi-squared distance
Cosine coefficient
$c_{ij} = \dfrac{\sum_k x_{ki} x_{kj}}{\sqrt{\sum_k x_{ki}^2 \, \sum_k x_{kj}^2}}$
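The Euclidean distance, the average distance and the cosine coefficient can be compared on a single pair of objects. This is a sketch assuming the standard definitions: EUCLID as the root of the summed squared differences, DIST as the same quantity with the sum divided by the number of attributes n, and the cosine as the normalized dot product. The function names are hypothetical; the two objects are the first two of the worked example.

```python
import math

def euclid(xi, xj):
    """Euclidean distance: sqrt(sum_k (x_ki - x_kj)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def average_distance(xi, xj):
    """Average distance: sqrt((1/n) * sum_k (x_ki - x_kj)^2)."""
    n = len(xi)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)) / n)

def cosine(xi, xj):
    """Cosine coefficient: sum_k x_ki*x_kj / sqrt(sum_k x_ki^2 * sum_k x_kj^2)."""
    dot = sum(a * b for a, b in zip(xi, xj))
    return dot / math.sqrt(sum(a * a for a in xi) * sum(b * b for b in xj))

obj_1, obj_2 = (10, 5), (20, 20)
print(round(euclid(obj_1, obj_2), 2))            # sqrt(325)
print(round(average_distance(obj_1, obj_2), 2))  # sqrt(325 / 2)
print(round(cosine(obj_1, obj_2), 3))            # 300 / sqrt(125 * 800)
```

The two distances are dissimilarity measures (0 means identical), whereas the cosine is a similarity that depends only on the direction of the attribute vectors, not on their length.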
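Worked example: UPGMA on Euclidean distances

Data matrix (objects 1 to 5, two attributes):
Object   Attribute 1   Attribute 2
1        10            5
2        20            20
3        30            10
4        30            15
5        5             10

Resemblance matrix (EUCLID):
       1      2      3      4
2      18.0
3      20.6   14.1
4      22.4   11.2   5.00
5      7.07   18.0   25.0   25.5

After merging 3 and 4 at 5.00:
       1      2      (34)
2      18.0
(34)   21.5   12.7
5      7.07   18.0   25.3

After merging 1 and 5 at 7.07:
       2      (34)
(34)   12.7
(15)   18.0   23.4

After merging 2 and (34) at 12.7:
       (15)
(234)  21.6

Final merge of (234) and (15) at 21.6.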
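The merge sequence of the worked example can be reproduced with a short script. This is a minimal sketch, not the NTSYS implementation: it assumes Euclidean distance and computes the UPGMA distance between two clusters directly as the mean over all between-cluster object pairs. The names `points`, `upgma_distance` and `merges` are illustrative.

```python
import math
from itertools import combinations

# Objects 1 to 5 of the worked example, two attributes each.
points = {1: (10, 5), 2: (20, 20), 3: (30, 10), 4: (30, 15), 5: (5, 10)}

def upgma_distance(a, b):
    """Mean Euclidean distance over all pairs with one member in each cluster."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

clusters = [(label,) for label in points]  # start with singleton clusters
merges = []                                # (cluster, cluster, merge level)

while len(clusters) > 1:
    # Pick the closest pair of clusters and merge it.
    a, b = min(combinations(clusters, 2),
               key=lambda pair: upgma_distance(*pair))
    level = round(upgma_distance(a, b), 2)
    clusters = [c for c in clusters if c not in (a, b)]
    clusters.append(tuple(sorted(a + b)))
    merges.append((a, b, level))

for a, b, level in merges:
    print(a, b, level)
```

The merge order is (3,4), then (1,5), then 2 with (34), then (234) with (15). The levels computed at full precision differ slightly in the last digit from values that are rounded to three significant figures at each step.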
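Cophenetic correlation coefficient (X = resemblance matrix values, Y = cophenetic matrix values):
$r_{X,Y} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$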
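The cophenetic correlation can be computed by pairing each original distance with the level at which the two objects first join in the dendrogram. The sketch below assumes $r_{X,Y}$ is the ordinary Pearson correlation over the ten object pairs and takes the merge levels 5.00, 7.07, 12.7 and 21.6 from the worked example; the names `join_level` and `pearson` are hypothetical.

```python
import math
from itertools import combinations

points = {1: (10, 5), 2: (20, 20), 3: (30, 10), 4: (30, 15), 5: (5, 10)}

def join_level(i, j):
    """Level at which objects i and j first share a cluster in the example tree."""
    pair = {i, j}
    if pair == {3, 4}:
        return 5.00
    if pair == {1, 5}:
        return 7.07
    if pair <= {2, 3, 4}:
        return 12.7
    return 21.6  # one object in (234), the other in (15)

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

pairs = list(combinations(sorted(points), 2))
x = [math.dist(points[i], points[j]) for i, j in pairs]  # resemblance values
y = [join_level(i, j) for i, j in pairs]                 # cophenetic values

r = pearson(x, y)
print(round(r, 3))
```

A value close to 1 indicates that the dendrogram distorts the original resemblance matrix only slightly.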
NTSYS
Import matrix
Transpose matrix if objects are rows; NTSYS expects objects as columns (transp in transformation / general)
Consider log1 or autoscaling (standardization)
Select similarity or distance measure (similarity)
Produce similarity matrix
NTSYS (continued)
Select clustering procedure, often UPGMA (clustering)
Calculate cophenetic matrix (clustering)
Compare similarity matrix with cophenetic matrix (made from the dendrogram) and write down the cophenetic correlation (graphics, matrix comparison)
Write dendrogram (graphics, treeplot)