Clustering
Pattern Classification by
Distance Functions
The method of pattern classification by
distance functions can be expected to yield
practical and satisfactory results only when
the pattern classes tend to have clustering
properties
Contd..
Unlike Fig. 2, in Fig. 1 the patterns of each
class tend to cluster tightly about a typical or
representative pattern for the class. This
occurs in cases where pattern variability and
other corruptive influences are well behaved.
A typical example is the problem of reading
bank checks by machine.
Contd..
Here the proximity of an unknown pattern to
the patterns of a class serves as the measure
for its classification, so the method is called
minimum-distance pattern classification, and
a system which carries it out is called a
minimum-distance classifier.
Single Prototypes
In this case, the patterns of each class tend to
cluster tightly about a typical or
representative pattern for that class. Under
these conditions, minimum-distance classifiers
can constitute a very effective approach to the
classification problem.
Contd..
Consider M pattern classes and assume that
these classes are representable by prototype
patterns Z_1, Z_2, ..., Z_M. The Euclidean distance
between an arbitrary pattern vector X and the
ith prototype is given by
D_i = ||X − Z_i|| = [(X − Z_i)^T (X − Z_i)]^{1/2}    ...(1)
Contd..
A minimum-distance classifier computes the
distance from a pattern x of unknown
classification to the prototype of each class,
and assigns the pattern to the class to which it
is closest.
Contd..
In other words, X is assigned to class w_i if
D_i < D_j for all j ≠ i. Ties are resolved arbitrarily.
Equation (1) may be expressed as
D_i² = ||X − Z_i||² = (X − Z_i)^T (X − Z_i)
     = X^T X − 2 X^T Z_i + Z_i^T Z_i
     = X^T X − 2 (X^T Z_i − (1/2) Z_i^T Z_i)
Contd..
Choosing the minimum D_i² is equivalent to
choosing the minimum D_i, since all distances
are positive. Because the term X^T X is independent
of i, choosing the minimum D_i² is in turn equivalent
to choosing the maximum of (X^T Z_i − (1/2) Z_i^T Z_i).
Consequently, we may define the decision
functions
Contd..
d_i(X) = X^T Z_i − (1/2) Z_i^T Z_i,   i = 1, 2, ..., M    ...(2)
and X is assigned to class w_i if d_i(X) > d_j(X) for all j ≠ i.
Contd..
Equation (2) is a linear decision function. Writing it as
d_i(X) = W_i^T X, the weights are
w_ij = z_ij,   j = 1, 2, ..., n,    w_{i,n+1} = −(1/2) Z_i^T Z_i
and X = (x_1, x_2, ..., x_n, 1)^T is the augmented pattern vector.
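As an illustration, the sketch below applies the decision functions of Eq. (2) to the single-prototype case. The prototype vectors and the test pattern are assumed values chosen only for demonstration, not data from the slides.

```python
import numpy as np

def min_distance_classify(x, prototypes):
    """Assign x to the class whose prototype maximizes
    d_i(x) = x^T z_i - 0.5 * z_i^T z_i (Eq. (2))."""
    scores = [x @ z - 0.5 * (z @ z) for z in prototypes]
    return int(np.argmax(scores))

# Illustrative prototypes for M = 2 classes and a test pattern
Z = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
x = np.array([1.0, 1.5])
print(min_distance_classify(x, Z))   # x is nearest to the first prototype -> class 0
```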
Contd..
The decision boundary of a two-class
example in which each class is characterized
by a single prototype is shown below.
Contd..
It can be shown that the linear decision
surface separating every pair of prototype
points Z_i and Z_j is the hyperplane which is the
perpendicular bisector of the line segment
joining the two points. Thus the minimum-distance
classifiers are a special case of linear
classifiers.
Multiprototypes
Here each class is characterized by several
prototypes, that is, each pattern of class wi
tends to cluster about one of the prototypes
Z_i^1, Z_i^2, ..., Z_i^{N_i}, where N_i is the number of
prototypes in the ith pattern class.
Contd..
Following the development for the single-prototype case,
the decision functions are
d_i(X) = max_l { X^T Z_i^l − (1/2) (Z_i^l)^T Z_i^l },   l = 1, 2, ..., N_i
where, as before, X is placed in class w_i if
d_i(X) > d_j(X) for all j ≠ i.
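A possible extension of the previous sketch to the multiprototype case is shown below; again the prototypes per class and the test pattern are illustrative assumptions.

```python
import numpy as np

def multiprototype_classify(x, class_prototypes):
    """class_prototypes[i] is the list of prototypes Z_i^1, ..., Z_i^{N_i}
    of class i; d_i(x) = max over l of (x^T Z_i^l - 0.5 * ||Z_i^l||^2)."""
    d = [max(x @ z - 0.5 * (z @ z) for z in protos)
         for protos in class_prototypes]
    return int(np.argmax(d))

# Illustrative two-class problem with two prototypes per class
protos = [[np.array([0.0, 0.0]), np.array([1.0, 5.0])],   # class 0
          [np.array([5.0, 1.0]), np.array([6.0, 6.0])]]   # class 1
print(multiprototype_classify(np.array([5.5, 5.0]), protos))  # nearest prototype is (6,6) -> 1
```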
Contd..
Note that the boundaries between classes w_i
and w_j are piecewise linear. Since we could
have defined this as a single-prototype, four-class
problem, the sections of the boundaries
are the perpendicular bisectors of the lines
joining the prototypes of different classes.
Contd..
Although general iterative algorithms exist
which can be used in the calculation of linear
decision function parameters, unfortunately,
no truly general algorithm is yet known for the
piecewise-linear classifier decision function.
Cluster Seeking
It is evident that the ability to determine
characteristic prototypes or cluster centers
in a given set of data plays a central role in
the design of pattern classifiers based on
the minimum-distance concept. So we need
to study various cluster-seeking methods.
Contd..
Note that the performance of a given
algorithm is not only dependent on the type of
data being analyzed, but is also strongly
influenced by the chosen measure of pattern
similarity.
Measures of Similarity
To define a data cluster, it is necessary to first
define a measure of similarity which will
establish a rule for assigning patterns to the
domain of a particular cluster center. Earlier
we considered the Euclidean distance between
two patterns x and z:
D = ||X − Z||
as a measure of their similarity----the smaller
the distance, the greater the similarity.
Contd..
There are, however, other meaningful
distance measures which are sometimes
useful. For example, the Mahalanobis distance
from x to m.
D = (X − m)^T C⁻¹ (X − m)    ...(1)
Contd..
It is a useful measure of similarity when
statistical properties are being explicitly
considered. Here C is the covariance matrix of
a pattern population, m is the mean vector,
and x represents a variable pattern.
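The sketch below computes the Mahalanobis distance of Eq. (1); the mean vector and covariance matrix used here are assumed values, not statistics of any particular data set.

```python
import numpy as np

def mahalanobis(x, m, C):
    """D = (x - m)^T C^{-1} (x - m), with C the covariance matrix
    and m the mean vector of the pattern population (Eq. (1))."""
    diff = x - m
    return float(diff @ np.linalg.solve(C, diff))   # solve avoids forming C^{-1}

# Illustrative population statistics (assumed values)
m = np.array([2.0, 3.0])
C = np.array([[4.0, 1.0],
              [1.0, 2.0]])
print(mahalanobis(np.array([3.0, 5.0]), m, C))
```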
Contd..
Measures of similarity need not be restricted
to distance measures. For example, the non-metric similarity function
S(X, Z) = X^T Z / (||X|| ||Z||)    ...(2)
Contd..
This measure of similarity is useful when
cluster regions tend to develop along
principal axes, as shown below.
s(X, Z_1) = X^T Z_1 / (||X|| ||Z_1||) = cos θ_1,    s(X, Z_2) = X^T Z_2 / (||X|| ||Z_2||) = cos θ_2
Figure 4. Illustration of a similarity measure
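A minimal sketch of the similarity function of Eq. (2), i.e. the cosine of the angle between the two pattern vectors, using assumed example vectors:

```python
import numpy as np

def cosine_similarity(x, z):
    """S(x, z) = x^T z / (||x|| ||z||), the cosine of the angle
    between the two pattern vectors (Eq. (2))."""
    return float(x @ z / (np.linalg.norm(x) * np.linalg.norm(z)))

x  = np.array([2.0, 2.0])
z1 = np.array([1.0, 1.0])   # same direction as x
z2 = np.array([1.0, 0.0])   # 45 degrees away from x
print(cosine_similarity(x, z1))   # 1.0
print(cosine_similarity(x, z2))   # about 0.707
```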
Contd..
Here the use of this similarity measure is
governed by certain qualifications, such as
sufficient separation of cluster regions with
respect to each other as well as with respect
to the coordinate system origin.
Contd..
When the patterns under consideration are
binary valued with 0, 1 elements, the
similarity function of Eq. (2) may be given an
interesting non-geometrical interpretation.
Contd..
We say that a binary pattern X possesses the
ith attribute if x_i = 1. Then the term X^T Z in Eq. (2) is
simply the number of attributes shared by X
and Z, while ||X|| ||Z|| = [(X^T X)(Z^T Z)]^{1/2} is the geometric
mean of the number of attributes possessed
by X and the number possessed by Z. In this
case the similarity function S(X, Z) is,
therefore, seen to be a measure of common
attributes possessed by the binary vectors X
and Z.
Contd..
A binary variation of Eq. (2), which has been
widely used in information retrieval, nosology
(classification of diseases), and taxonomy
(classification of plants and animals), is the so-called
Tanimoto measure, given by
S(X, Z) = X^T Z / (X^T X + Z^T Z − X^T Z)
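A small sketch of the Tanimoto measure, evaluated on assumed binary patterns:

```python
import numpy as np

def tanimoto(x, z):
    """S(x, z) = x^T z / (x^T x + z^T z - x^T z)."""
    xz = float(x @ z)
    return xz / (float(x @ x) + float(z @ z) - xz)

# For binary patterns this is the ratio of shared attributes
# to the number of distinct attributes possessed by x or z.
x = np.array([1, 0, 1, 1, 0])
z = np.array([1, 1, 1, 0, 0])
print(tanimoto(x, z))   # 2 shared / (3 + 3 - 2) = 0.5
```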
Clustering Criteria
After a measure of pattern similarity has been
adopted, we are still faced with the problem of
specifying a procedure for partitioning the given
data into cluster domains.
Contd..
The clustering criterion used may represent a
heuristic scheme, or it may be based on the
minimization (or maximization) of a certain
performance index. The heuristic approach is
guided by intuition and experience.
Contd..
The Euclidean distance is generally used as
a measure of proximity. A suitable threshold is
also necessary in order to define degrees of
acceptable similarity in the cluster-seeking
process.
Contd..
The performance-index (sometimes called
objective function) approach is guided by the
development of a procedure which will
minimize or maximize the chosen performance
index. One of the most often used indices is
the sum of the squared errors index, given
by..
Contd..
J = Σ_{j=1}^{N_C} Σ_{X ∈ S_j} ||X − m_j||²
where N_C is the number of cluster domains, S_j is the set of
samples belonging to the jth domain, and
m_j = (1/N_j) Σ_{X ∈ S_j} X
is the sample mean vector of S_j, N_j being the number of samples in S_j.
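A possible computation of the sum-of-squared-errors index J for given cluster domains is sketched below; the two sample domains are assumed values used only for illustration.

```python
import numpy as np

def sse_index(clusters):
    """J = sum_j sum_{x in S_j} ||x - m_j||^2, where m_j is the
    sample mean of cluster domain S_j."""
    J = 0.0
    for S in clusters:                 # S is an (N_j, n) array of samples
        m = S.mean(axis=0)             # cluster mean m_j
        J += float(((S - m) ** 2).sum())
    return J

# Two illustrative cluster domains
S1 = np.array([[0.0, 0.0], [1.0, 1.0]])
S2 = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
print(sse_index([S1, S2]))
```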
Contd..
Other common performance indices are :
the average squared distances between
samples in a cluster domain
the average squared distances between
samples in different cluster domains
indices based on the scatter matrix concept,
minimum and maximum-variance indices
a score of other performance measures which
have been used throughout the years.
Contd..
It is not uncommon to find a cluster-seeking
algorithm that represents a combination of the
heuristic and performance index approaches.
The Isodata clustering algorithm is such a
combination.
A Simple Cluster-Seeking
Algorithm
Suppose that we take the first sample X_1 as the first
cluster center, Z_1 = X_1. If the distance D_21 from the
next sample X_2 to Z_1 is greater than a chosen
non-negative threshold T, a new cluster center Z_2 = X_2
is started; otherwise X_2 is assigned to the domain of
cluster center Z_1. Next, the distances D_31 and D_32 from
the sample X_3 to Z_1 and Z_2 are computed.
Contd..
If both D_31 and D_32 are greater than T, a new
cluster center Z_3 = X_3 is created. Otherwise,
we assign X_3 to the domain of the cluster
center to which it is closest. This process is
repeated for all patterns.
Contd..
This procedure may be expected to yield
useful results in situations where the data
exhibit characteristic "pockets" which are
reasonably well separated with respect to the
chosen values of the threshold.
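The sketch below implements the threshold-based procedure described above. The threshold T = 3 and the use of the eight two-dimensional points from the later Isodata example are choices made only for illustration; the result depends on both T and the order in which the samples are presented.

```python
import numpy as np

def simple_cluster_seeking(samples, T):
    """Threshold-based cluster seeking: the first sample becomes the
    first cluster center; each subsequent sample either joins the
    nearest existing center or, if farther than T from all of them,
    starts a new cluster center."""
    centers = [samples[0]]
    labels = [0]
    for x in samples[1:]:
        dists = [np.linalg.norm(x - z) for z in centers]
        j = int(np.argmin(dists))
        if dists[j] > T:
            centers.append(x)              # create a new cluster center
            labels.append(len(centers) - 1)
        else:
            labels.append(j)               # assign to the closest center
    return centers, labels

pts = np.array([[0, 0], [1, 1], [2, 2], [4, 3], [4, 4], [5, 3], [5, 4], [6, 5]], float)
centers, labels = simple_cluster_seeking(pts, T=3.0)
print(len(centers), labels)
```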
Maximin-Distance
Algorithm
The maximin-distance algorithm is another
simple heuristic procedure based on the
Euclidean distance concept. Consider the ten
two-dimensional samples shown in the figure
below.
Contd..
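Since the slides illustrate the maximin-distance procedure only graphically, the sketch below follows a commonly described version of the idea, with details assumed: start from one sample, take the farthest sample as the second center, then keep promoting the sample whose minimum distance to the existing centers is largest, as long as that maximin distance exceeds a chosen fraction (here one half) of the average inter-center distance.

```python
import numpy as np

def maximin_centers(samples, fraction=0.5):
    """Maximin-distance sketch (assumed details): grow the set of cluster
    centers while the largest 'minimum distance to a center' is still an
    appreciable fraction of the average distance between centers."""
    X = np.asarray(samples, float)
    centers = [X[0]]                                  # arbitrary first center
    d = np.linalg.norm(X - centers[0], axis=1)
    centers.append(X[int(np.argmax(d))])              # farthest sample becomes center 2
    while True:
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        k = int(np.argmax(dists))                     # candidate new center
        pairwise = [np.linalg.norm(a - b) for i, a in enumerate(centers)
                    for b in centers[i + 1:]]
        if dists[k] > fraction * np.mean(pairwise):
            centers.append(X[k])
        else:
            return np.array(centers)

pts = np.array([[0, 0], [1, 1], [2, 2], [4, 3], [4, 4], [5, 3], [5, 4], [6, 5]], float)
print(maximin_centers(pts))
```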
K-means Algorithm
Clustering is an unsupervised technique
used in discovering the inherent structure
present in a set of patterns. In fact,
clustering techniques aim to extract the
groups present in a given data set, and
each such group is termed a cluster.
Let the set of patterns be
S = {x_1, x_2, ..., x_n} ⊂ R^m, where x_i is the ith
pattern vector, n is the total number of
patterns, and m is the dimensionality of
the feature space.
Contd..
Let the number of clusters be K. If the
clusters are represented by C_1, C_2, ..., C_K,
then we assume that
C_i ≠ ∅ for i = 1, ..., K,   C_i ∩ C_j = ∅ for i ≠ j,   and   C_1 ∪ C_2 ∪ ... ∪ C_K = S.
Contd..
Clustering techniques may broadly be divided
into two categories: hierarchical and non-hierarchical.
The non-hierarchical or partitional
clustering problem deals with obtaining an
optimal partition of S into K subsets such that
some clustering criterion is satisfied.
Contd..
Among the partitional clustering techniques,
the K-means algorithm has been one of the
most widely used. Here the value
of K needs to be known a priori. The principle
used for clustering by the K-means algorithm is to
minimize the sum of intraclass distances to
obtain the optimal clusters. Mathematically, this
principle is stated below.
Contd..
1. Let C_1, C_2, ..., C_K be a set of K clusters of S.
2. Let
z_j = (1/#C_j) Σ_{x ∈ C_j} x
be the center of cluster C_j, where #C_j denotes the number of patterns in C_j.
Contd..
3. Let
f(C_1, C_2, ..., C_K) = Σ_{j=1}^{K} Σ_{x ∈ C_j} ||x − z_j||²
The clustering problem is to find the partition C_1, C_2, ..., C_K that minimizes f(C_1, C_2, ..., C_K).
Contd..
All possible clusterings of S are to be
considered to get the optimal C1 , C 2 ,......, C k .
So obtaining the exact solution of the problem
is theoretically possible, yet not feasible in
practice due to limitations of computer storage
and time. One requires the evaluation of
S (n, k ) partitions if exhaustive enumeration is
used to solve the problem, where
S(n, k) = (1/k!) Σ_{j=1}^{k} (−1)^{k−j} C(k, j) j^n
where C(k, j) is the binomial coefficient.
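A small sketch evaluating S(n, k) with the formula above, which shows how quickly the number of possible partitions grows:

```python
from math import comb, factorial

def num_partitions(n, k):
    """S(n, k) = (1/k!) * sum_{j=1}^{k} (-1)^(k-j) C(k, j) j^n,
    the number of ways to partition n patterns into k non-empty clusters."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(1, k + 1))
    return total // factorial(k)        # the sum is exactly divisible by k!

print(num_partitions(10, 3))     # 9330 partitions for only 10 patterns and 3 clusters
print(num_partitions(100, 5))    # astronomically many for 100 patterns
```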
Contd..
Thus, approximate heuristic techniques
seeking a compromise or looking for an
acceptable solution have usually been
adopted. One such method is Forgy's K-means algorithm.
Algorithm K-means
Step 1 : Select an initial cluster configuration.
Repeat
Step 2 : Calculate the cluster centers z_j, j = 1, 2, ..., K,
of the existing groups.
Step 3 : Redistribute the patterns among the clusters
utilizing the minimum squared Euclidean distance
classifier concept:
x_i ∈ C_j if ||x_i − z_j||² ≤ ||x_i − z_l||² for all l ∈ {1, 2, ..., K}, l ≠ j
Until (there is no change in cluster centers)
End
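A minimal sketch of the K-means iteration described above, assuming a Forgy-style random choice of K initial centers and squared Euclidean distances; the data and K = 2 in the usage lines are illustrative.

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    """Alternate between reassigning each pattern to its nearest center
    (Step 3) and recomputing the group centers (Step 2) until the
    centers stop changing."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]        # Step 1
    for _ in range(max_iter):
        # Step 3: minimum squared Euclidean distance assignment
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2: recompute the center of each non-empty group
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.array([[0, 0], [1, 1], [2, 2], [4, 3], [4, 4], [5, 3], [5, 4], [6, 5]], float)
centers, labels = k_means(X, K=2)
print(centers)
print(labels)
```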
Contd..
Although no general proof of convergence
exists for this algorithm, it can be expected to
yield acceptable results when the data exhibit
characteristic pockets which are relatively far
from each other.
Isodata Algorithm
The Isodata (Iterative Self-Organizing Data
Analysis) is similar in principle to the K-means
procedure in the sense that cluster centers are
iteratively determined sample means. Unlike
the latter algorithm, however, Isodata
represents a fairly comprehensive set of
additional heuristic procedures which have
been incorporated into an interactive scheme.
Before executing the algorithm it is necessary
to specify a set of N_C initial cluster centers
Z_1, Z_2, ..., Z_{N_C}. For a set of N samples {X_1, X_2, ..., X_N},
Isodata consists of the following steps.
Contd..
Step 1. Specify the following process
parameters:
K = number of cluster centers desired;
θ_N = a parameter against which the number of
samples in a cluster domain is compared;
θ_S = standard deviation parameter;
θ_C = lumping parameter;
L = maximum number of pairs of cluster
centers which can be lumped;
I = number of iterations allowed.
Contd..
Step 2. Distribute the N samples among the
present cluster centers, using the relation
X ∈ S_j if ||X − Z_j|| < ||X − Z_i||,   i = 1, 2, ..., N_C; i ≠ j
where S_j denotes the subset of samples assigned to cluster center Z_j.
Contd..
Step 3. Discard sample subsets with fewer than θ_N members;
that is, if for any j we have N_j < θ_N, discard S_j and reduce N_C by 1.
Step 4. Update each cluster center Z_j, j = 1, 2, ..., N_C,
by setting it equal to the sample
mean of its corresponding set S_j; that is,
Z_j = (1/N_j) Σ_{X ∈ S_j} X,   j = 1, 2, ..., N_C
where N_j is the number of samples in S_j.
Step 5. Compute the average distance D_j of the samples in
cluster domain S_j from their corresponding cluster center,
using the relation
D_j = (1/N_j) Σ_{X ∈ S_j} ||X − Z_j||,   j = 1, 2, ..., N_C
Contd..
Step 6. Compute the overall average
distance of the samples from their
respective cluster centers, using the
relation
D = (1/N) Σ_{j=1}^{N_C} N_j D_j
Contd..
Step 7.
(a) If this is the last iteration, set θ_C = 0
and go to Step 11.
(b) If N_C ≤ K/2, go to Step 8.
(c) If this is an even-numbered iteration,
or if N_C ≥ 2K, go to Step 11; otherwise,
continue.
Contd..
Step 8. Find the standard deviation vector
σ_j = (σ_1j, σ_2j, ..., σ_nj)^T for each sample subset, using
the relation
σ_ij = [ (1/N_j) Σ_{X ∈ S_j} (x_i − z_ij)² ]^{1/2},   i = 1, 2, ..., n;  j = 1, 2, ..., N_C
where x_i is the ith component of sample X and z_ij is the ith component of Z_j.
Contd..
Step 9. Find the maximum component of each σ_j, j = 1, 2, ..., N_C,
and denote it by σ_jmax.
Contd..
Step 10. If for any σ_jmax we have σ_jmax > θ_S and either
(a) D_j > D and N_j > 2(θ_N + 1), or (b) N_C ≤ K/2,
then split Z_j into two new cluster centers Z_j+ and Z_j−,
delete Z_j, and increase N_C by 1.
Cluster center Z_j+ is formed by adding a given
quantity γ_j to the component of Z_j which
corresponds to the maximum component of σ_j;
Z_j− is formed by subtracting γ_j from the same component.
Contd..
The basic requirement in choosing γ_j is that it
be sufficient to provide a detectable difference
in the distance from an arbitrary sample to the
two new cluster centers, but not so large as to
change the overall cluster domain
arrangement appreciably.
If splitting took place in this step, go to Step
2; otherwise continue.
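The splitting operation of Steps 8–10 can be sketched as below. The choice γ_j = 0.5 σ_jmax is the one used later in the worked example, and the numbers in the usage line are taken from that example; both are illustrative rather than required.

```python
import numpy as np

def split_cluster(z, sigma, gamma_frac=0.5):
    """Isodata-style split: perturb the component of Z_j that has the
    largest standard deviation by +/- gamma_j, where gamma_j is taken
    here as gamma_frac * sigma_jmax (an assumed choice). The two
    returned centers replace the old Z_j."""
    i_max = int(np.argmax(sigma))          # coordinate with the largest spread
    gamma = gamma_frac * sigma[i_max]
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[i_max] += gamma                 # Z_j+
    z_minus[i_max] -= gamma                # Z_j-
    return z_plus, z_minus

# Values from the worked example: Z_1 = (3.38, 2.75), sigma_1 = (1.99, 1.56)
zp, zm = split_cluster(np.array([3.38, 2.75]), np.array([1.99, 1.56]))
print(zp, zm)   # approximately (4.38, 2.75) and (2.38, 2.75)
```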
Contd..
Step 11. Compute the pairwise distances D ij
between all cluster centers :
D_ij = ||Z_i − Z_j||,   i = 1, 2, ..., N_C − 1;  j = i + 1, ..., N_C
Contd..
Step 12. Compare the distances D_ij against
the parameter θ_C. Arrange the L smallest
distances which are less than θ_C in
ascending order:
[D_{i1 j1}, D_{i2 j2}, ..., D_{iL jL}]
where D_{i1 j1} ≤ D_{i2 j2} ≤ ... ≤ D_{iL jL} and L is the maximum
number of pairs of cluster centers which can
be lumped together. The lumping process is
discussed in the next step.
Contd..
Step 13. With each distance D_{il jl} there is
associated a pair of cluster centers Z_{il} and Z_{jl}.
Starting with the smallest of these distances,
perform a pairwise lumping operation according
to the following rule:
For l = 1, 2, ..., L, if neither Z_{il} nor Z_{jl} has been used
in lumping in this iteration, merge these two
cluster centers using the relation
Z_l* = [N_{il} Z_{il} + N_{jl} Z_{jl}] / (N_{il} + N_{jl}),
delete Z_{il} and Z_{jl}, and reduce N_C by 1.
Contd..
It is noted that only pairwise lumping is
allowed and that a lumped cluster center is
obtained by weighting each old cluster center
by the number of samples in its domain.
Experimental evidence indicates that more
complex lumping can produce unsatisfactory
results. The above procedure makes the
lumped cluster centers representative of the
true average point of the combined subsets. It
is also important to note that, since a cluster
center can be lumped only once, this step will
not always result in L lumped centers.
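A one-line sketch of the lumping relation of Step 13, with assumed sample counts and center coordinates:

```python
import numpy as np

def lump_centers(z_i, n_i, z_j, n_j):
    """Isodata lumping (Step 13): the merged center is the mean of the
    two old centers weighted by the number of samples in their domains."""
    return (n_i * z_i + n_j * z_j) / (n_i + n_j)

# Illustrative values: two nearby centers with 5 and 3 samples
print(lump_centers(np.array([2.0, 2.0]), 5, np.array([2.4, 1.8]), 3))
```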
Contd..
Step 14. If this is the last iteration, the
algorithm terminates. Otherwise go to Step
1 if any of the process parameters requires
changing at the user's discretion, or go to
Step 2 if the parameters are to remain the
same for the next iteration. An iteration is
counted every time the procedure returns to
Step 1 or 2.
Example
Example: Let the patterns be {(0,0), (1,1),
(2,2), (4,3), (4,4), (5,3), (5,4), (6,5)}
In this case N = 8 and n = 2. Suppose that
we initially let N_C = 1 and Z_1 = (0, 0)^T, and specify the
following parameters:
Step 1. K = 2, θ_N = 1, θ_S = 1, θ_C = 4, L = 0, I = 4.
Contd..
If no a priori information on the data being
analyzed is available, these parameters are
arbitrarily chosen and then adjusted during
successive iterations through the algorithm.
Step 2. Since there is only one cluster
center,
S_1 = {X_1, X_2, ..., X_8} and N_1 = 8.
Contd..
Step 4. Update the cluster centers:
Z_1 = (1/N_1) Σ_{X ∈ S_1} X = (3.38, 2.75)^T
Step 5. Compute D_j:
D_1 = (1/N_1) Σ_{X ∈ S_1} ||X − Z_1|| = 2.26
Contd..
Step 6. Compute D; in this case, D = D_1 = 2.26.
Step 7. Since N_C ≤ K/2, go to Step 8.
Step 8. Compute the standard deviation vector of S_1:
σ_1 = (1.99, 1.56)^T
Contd..
Step 9. The maximum component of σ_1 is
1.99; hence, σ_1max = 1.99.
Step 10. Since σ_1max > θ_S and N_C ≤ K/2, we
split Z_1 into two new cluster centers. Following the
procedure described in Step 10, suppose
that we let γ_1 = 0.5 σ_1max ≈ 1.0. Then
Z_1+ = (4.38, 2.75)^T,    Z_1− = (2.38, 2.75)^T
Contd..
For convenience these two cluster centers
are renamed Z_1 and Z_2, respectively. Also, N_C
is increased by 1. Go to Step 2.
Step 2. The sample sets are now
S_1 = {X_4, X_5, X_6, X_7, X_8},    S_2 = {X_1, X_2, X_3}
and N_1 = 5, N_2 = 3.
Step 3. Since both N_1 and N_2 are greater
than θ_N, no subsets are discarded.
Contd..
Step 4. Update the cluster centers:
Z_1 = (1/N_1) Σ_{X ∈ S_1} X = (4.80, 3.80)^T,
Z_2 = (1/N_2) Σ_{X ∈ S_2} X = (1.00, 1.00)^T
Step 5. Compute D_j:
D_1 = (1/N_1) Σ_{X ∈ S_1} ||X − Z_1|| = 0.80,
D_2 = (1/N_2) Σ_{X ∈ S_2} ||X − Z_2|| = 0.94
Contd..
Step 6. Compute D:
D = (1/N) Σ_{j=1}^{N_C} N_j D_j = (1/8)[5(0.80) + 3(0.94)] = 0.85
Step 7. Since this is an even-numbered iteration, condition (c) of
Step 7 is satisfied; go to Step 11.
Contd..
Step 11. Compute the pairwise distances:
D_12 = ||Z_1 − Z_2|| = 4.72
Step 12. Compare D_12 to θ_C. In this case, D_12 > θ_C.
Step 13. From the results of Step 12, we see
that no lumping of cluster centers can take
place.
Step 14. Since this is not the last iteration,
we are faced with the decision of whether or
not to alter the process parameters. Since, in this
simple example, nothing observed so far suggests that
the parameters should be changed, we let them remain
the same and return to Step 2.
Contd..
Steps 2–6 yield the same results as in the
previous iteration.
Step 7. None of the conditions in this step is
satisfied. Therefore, we proceed to Step 8.
Step 8. Compute the standard deviation vectors of the sets
S_1 = {X_4, X_5, X_6, X_7, X_8} and S_2 = {X_1, X_2, X_3}:
σ_1 = (0.75, 0.75)^T,    σ_2 = (0.82, 0.82)^T
Contd..
Step 9. In this case σ_1max = 0.75 and σ_2max = 0.82.
Step 10. The conditions for splitting are not
satisfied. Therefore, we proceed to Step 11.
Step 11. We obtain the same result as in the
previous iteration:
D_12 = ||Z_1 − Z_2|| = 4.72
Contd..
Step 13. We obtain the same result as in
the previous iteration.
Step 14. Nothing new has been added in
this iteration, except the computation of the
standard deviation vectors. Therefore, we
return to Step 2.
Steps 2–6 yield the same results as in
the previous iteration.
Step 7. Since this is the last iteration, we
set θ_C = 0 and go to Step 11.
Contd..
Step 11. D_12 = ||Z_1 − Z_2|| = 4.72, as before.
Step 12. We obtain the same result as in
the previous iteration.
Step 13. From the results of Step 12, we
see that no lumping can take place.
Step 14. Since this is the last iteration, the
algorithm is terminated.
Contd..
It should be clear, even from the above
simple example, that the application of
Isodata to a set of moderately complex
data requires, in general, extensive
experimentation before one can arrive at
meaningful conclusions. However, by
properly organizing the information
obtained in each iteration, it is possible to
gain considerable insight into the structure
of the data.
Evaluation of Clustering
Results
The principal difficulty in evaluating the
results of clustering algorithms is the inability to
visualize the geometrical properties of a
high-dimensional space. Several
interpretation techniques exist which allow
at least partial insight into the geometrical
properties of the resulting cluster domains.
Contd..
A very useful interpretation tool is the distance
between cluster centers. This information is
best presented in the form of a table.
Contd..
Table of pairwise distances D_ij between the cluster centers Z_1, Z_2, ..., Z_5 (diagonal entries 0.0; the off-diagonal entries include 2.1, 4.8, 14.7, 49.3, and 50.6).
Contd..
Cluster center Z 5 is far removed from the
other cluster centers. If it is known that this
cluster center is associated with numerous
samples, we would accept it as a valid
description of the data. Otherwise we might
dismiss this cluster center as representative
of noise samples.
Contd..
If two cluster centers are relatively close,
and one of the centers is associated with a
much larger number of samples, it is often
possible to merge the two cluster domains.
The variances of a cluster domain about its
mean can be used to infer the relative
distribution of the samples in the domain.
Contd..
Cluster Domain    Variances
S1                1.2    0.9    0.7    1.0
S2                2.0    1.3    1.5    0.9
S3                3.7    4.8    7.3    10.4
S4                0.3    0.8    0.7    1.1
S5                4.2    5.4    18.3   3.3
Contd..
It is assumed that each variance
component is along the direction of one of
the coordinate axes.
Note :
Since domain S1 has very similar variances,
it can be expected to be roughly spherical in
nature.
Cluster domain S5 on the other hand, is
significantly elongated about the third
coordinate axis. A similar analysis can be
carried out for the other domains.
Contd..
This information, coupled with the distance
table and sample numbers, can be of
significant value in interpreting clustering
results.
Graph-Theoretic Approach
So far, the clusters have been determined in such a way
that the intraset distances within each cluster are kept
to a minimum, and the interset distances between
clusters are made as large as possible.
Contd..
An alternative approach to cluster seeking is to make
use of some basic notions of graph theory. In this
approach, a pattern graph is first constructed from
the given sample patterns, which form the nodes of
the graph. A node j is connected to a node k by an
edge if the patterns corresponding to these two
nodes are similar or related.
Contd..
Pattern X j and pattern X k are said to be similar if the
similarity measure S ( X j, X k) is greater than a prespecified threshold T. The similarity measure may be
used to generate a similarity matrix S, whose
elements are 0 or 1. The similarity matrix provides a
systematic way to construct the pattern graph. Since
cliques of a pattern graph form the clusters of the
patterns, cluster seeking can be accomplished by
detecting the cliques of the pattern graph.
Contd..
Several clique-detection algorithms and
programs have been introduced in the
literature. Related graph-based methods
utilize the minimum spanning tree (MST) of
the data points to obtain the clustering.
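A possible sketch of the graph-theoretic approach described above is given below: it thresholds a pairwise similarity measure to obtain the 0/1 similarity matrix, builds the pattern graph, and reports its maximal cliques as candidate clusters. The networkx library is assumed to be available for clique detection, and the distance-based similarity function and the threshold value are illustrative choices only.

```python
import numpy as np
import networkx as nx   # assumed available for maximal-clique detection

def pattern_graph_clusters(X, threshold, similarity):
    """Build the 0/1 similarity matrix S by thresholding a pairwise
    similarity measure, construct the pattern graph, and return its
    maximal cliques as candidate clusters."""
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for j in range(n):
        for k in range(j + 1, n):
            if similarity(X[j], X[k]) > threshold:
                S[j, k] = S[k, j] = 1            # nodes j and k are similar
    G = nx.from_numpy_array(S)                   # pattern graph
    return list(nx.find_cliques(G))              # maximal cliques

# Example with an assumed distance-based similarity measure
pts = np.array([[0, 0], [1, 1], [2, 2], [4, 3], [4, 4], [5, 3], [5, 4], [6, 5]], float)
sim = lambda a, b: 1.0 / (1.0 + np.linalg.norm(a - b))
print(pattern_graph_clusters(pts, threshold=0.4, similarity=sim))
```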
Thank You