ppt7
• For optimal performance, clustering algorithms, just like algorithms for classification, require the data to be
normalized so that no particular variable or subset of variables dominates the analysis
• Analysts may use either min–max normalization or Z-score standardization (a brief sketch of both follows)
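• A minimal sketch of the two options using scikit-learn; the toy array X is purely illustrative:
```python
# Minimal sketch of min-max normalization and Z-score standardization with
# scikit-learn; the toy array X (e.g., age and income) is made up.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[20.0, 50000.0],
              [35.0, 72000.0],
              [50.0, 98000.0]])   # variables on very different scales

X_minmax = MinMaxScaler().fit_transform(X)    # min-max normalization: values in [0, 1]
X_zscore = StandardScaler().fit_transform(X)  # Z-score standardization: mean 0, std 1

print(X_minmax)
print(X_zscore)
```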
Goal of Clustering
• All clustering methods have as their goal the identification of groups of records such that similarity within a group
is very high while the similarity to records in other groups is very low
• Clustering algorithms seek to construct clusters of records such that the between-cluster variation is large compared
to the within-cluster variation, which is analogous to the objective of analysis of variance (ANOVA)
Hierarchical Clustering
• Clustering algorithms are either hierarchical or nonhierarchical
• In hierarchical clustering, a treelike cluster structure (dendrogram) is created through recursive partitioning
(divisive methods) or combining (agglomerative) of existing clusters
• Agglomerative clustering methods initialize each observation as a tiny cluster of its own. In each succeeding
step, the two closest clusters are merged into a new combined cluster; the number of clusters in the data set is
thereby reduced by one at each step, until finally a single cluster containing all the points remains
• Divisive clustering methods begin with all the records in one big cluster; the most dissimilar records are
recursively split off into separate clusters until each record represents its own cluster
• Our focus will be on agglomerative clustering because it is more widely applied than divisive clustering
• Nonhierarchical clustering works in a different fashion from hierarchical clustering; the most popular
nonhierarchical methods are k-means, k-medians, and related algorithms
Hierarchical Agglomerative Clustering
• Distance computation between records is rather straightforward once appropriate recoding and normalization have
taken place
• But how do we determine the distance between clusters of records? Should we consider two clusters to be close if
their nearest neighbors are close, or if their farthest neighbors are close? How about criteria that average out these
extremes? Three criteria for determining the distance between clusters are described below (a short code sketch
illustrating all three follows their descriptions)
• Single linkage, sometimes termed the nearest-neighbor approach, is based on the minimum distance between any
record in cluster A and any record in cluster B. In other words, cluster similarity is based on the similarity of the
most similar members from each cluster. Single linkage tends to form long, slender clusters, which may
sometimes lead to heterogeneous records being clustered together
• Complete linkage, sometimes termed the farthest-neighbor approach, is based on the maximum distance between
any record in cluster A and any record in cluster B. In other words, cluster similarity is based on the similarity of
the most dissimilar members from each cluster. Complete linkage tends to form more compact, sphere-like
clusters
• Average linkage is designed to reduce the dependence of the cluster-linkage criterion on extreme values, such
as the most similar or dissimilar records. In average linkage, the criterion is the average distance of all the
records in cluster A from all the records in cluster B. The resulting clusters tend to have approximately
equal within-cluster variability
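• As a quick illustration of the three criteria above, the following sketch (hypothetical coordinates) computes the single-, complete-, and average-linkage distances between two small clusters:
```python
# Minimal sketch of the three between-cluster distance criteria, applied to two
# small illustrative clusters; the coordinates of A and B are hypothetical.
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1.0, 1.0], [1.5, 2.0]])               # records in cluster A
B = np.array([[5.0, 4.0], [6.0, 5.0], [5.5, 6.0]])   # records in cluster B

pairwise = cdist(A, B)          # all Euclidean distances between A and B

single   = pairwise.min()       # single linkage: most similar (nearest) members
complete = pairwise.max()       # complete linkage: most dissimilar (farthest) members
average  = pairwise.mean()      # average linkage: mean of all pairwise distances

print(single, complete, average)
```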
Hierarchical Agglomerative Clustering Algorithm
A Worked Example
• Consider the following n = 8 bivariate points
• Using Euclidean distance, the upper-triangular portion of the symmetric (8 × 8)-matrix D(1) is as follows
• From D(1), the smallest dissimilarity is d12 = d23 = d68 = 1.414; we choose to merge x2 and x3 to form the cluster
“23,” producing the (7 × 7)-matrix D(2)
• In D(2), the smallest dissimilarity is d1,23 = d68 = 1.414. We choose to merge x1 with the “23” cluster, producing a new
cluster “123.” We next compute new dissimilarities, d123,K = min{d1K, d23,K} for K = 4,5,6,7,8. The (6 × 6)-matrix
D(3) is as follows
A Worked Example – Single Linkage
• The smallest dissimilarity is d68 = 1.414, and so we merge x6 and x8 to form the new cluster “68.” We compute new
dissimilarities, d68,K = min{d6K, d8K} for K = 123,4,5,7. This gives us the (5 × 5)-matrix D(4)
• The smallest dissimilarity is d45 = 2.0, and so we merge x4 and x5 to form the new cluster “45.” We compute new
dissimilarities, d45,K = min{d4K, d5K} for K = 123,68,7. This gives the (4 × 4)-matrix D(5)
A Worked Example – Single Linkage
• The smallest dissimilarity is d45,68 = d68,7 = 2.236. We choose to merge the cluster “68” with x7 to produce the new
cluster “678.” The new dissimilarities, d678,K = min{d68,K,d7K} for K = 123,45, yield the matrix D(6)
• The smallest dissimilarity is d45,678 = 2.236, so the next merge is the cluster “45” with the cluster “678.” The matrix
D(7) is
• The last merge is cluster “123” with cluster “45678,” and the merging dissimilarity is d123,45678 = 3.162
A Worked Example – Single Linkage
• The corresponding dendrogram is shown below
A Worked Example – Complete Linkage
• From D(1) given previously, we merge x2 and x3 to form the “23” cluster at height 1.414, as before. So, the
upper-triangular portion of the (7 × 7)-matrix D(2) is as follows
• The smallest dissimilarity is d68 = 1.414. We merge x6 and x8 to form a new cluster “68.” We compute new
dissimilarities, d68,K = max{d6K,d8K} for K = 1,23,4,5,7. This gives us a (6 × 6)-matrix D(3)
A Worked Example – Complete Linkage
• The smallest dissimilarity is d1,23 = d45 = 2.0. We choose to merge the cluster “23” with x1 to form a new cluster
“123.” We compute new dissimilarities, d123,K = max{d1,K,d23,K} for K = 4,5,68,7. This gives us a new (5 ×
5)-matrix D(4)
• The smallest dissimilarity is d45 = 2.0. We merge x4 and x5 to form a new cluster “45.” We compute dissimilarities,
d45,K = max{d4K,d5K} for K = 123,68,7. This gives us a new (4 × 4)-matrix D(5)
• The smallest dissimilarity is d68,7 = 2.236. We merge cluster “68” with x7 to form the new cluster “678.” New
dissimilarities d678,K = max{d68,K,d7K} are computed for K = 123,45 to give the new (3 × 3)-matrix D(6)
A Worked Example – Complete Linkage
• The last steps merge the clusters “45” and “678” with a merging value of d45,678 = 5.385, and then the clusters
“123” and “45678” with a merging value of d123,45678 = 7.280
A Worked Example – Complete Linkage
• The dendrogram for this method is shown below
A Worked Example – Average Linkage
• We start with the matrix D(1). The smallest dissimilarity is d12 = √2 = 1.414, and so we merge x1 and x2 to form
cluster “12.”
• We compute dissimilarities between the cluster “12” and all other points using the average distance, d12,K = (d1K +
d2K)/2, for K = 3,4,5,6,7,8. For example, d12,3 = (d13 + d23)/2 = (√4 + √2)/2 = 1.707
• The matrix D(2) is given by
• The smallest dissimilarity is d68 = 1.414, and so we merge x6 and x8 to form the new cluster “68.” We compute
dissimilarities between the cluster “68” and all other points and clusters using the average distance, d68,12 = (d16+
d26 + d18 + d28)/4 = 6.364, and d68,K = (d6K + d8K)/2, for K = 3,4,5,7
• The matrix D(3) is
A Worked Example – Average Linkage
• The smallest dissimilarity is d12,3 = 1.707, and so we merge x3 and the cluster “12” to form the new cluster “123.”
We compute dissimilarities between the cluster “123” and all other points using the average distance, d123,68 = (d16
+ d18 + d26 + d28 + d36 + d38)/6 = 5.974 and d123,K = (d1K + d2K + d3K)/3, for K = 4,5,7.
• This gives the matrix D(4)
• The smallest dissimilarity is d45 = 2.0, and so we merge x4 and x5 to form the new cluster “45.” We compute
dissimilarities between the cluster “45” and the other clusters as before
• This gives the matrix D(5)
A Worked Example – Average Linkage
• The smallest dissimilarity is d68,7 = 2.236, and so we merge x7 and the cluster “68” to form the new cluster “678.”
This gives the matrix D(6)
• The smallest dissimilarity is d45,678 = 3.792, and so we merge the two clusters “45” and “678” to form a new cluster
“45678.” We then merge the last two clusters at dissimilarity d123,45678 = 4.940. The corresponding
dendrogram is shown below
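• In practice, the same agglomerative procedure can be run with SciPy. Since the eight points of the worked example are not reproduced here, the sketch below uses placeholder data; with the actual coordinates, linkage() should reproduce the merge heights computed by hand above:
```python
# Sketch of hierarchical agglomerative clustering with SciPy. X below is
# placeholder data standing in for the n = 8 bivariate points of the example.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                 # placeholder bivariate points

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)           # each row: clusters merged, merge height, cluster size
    print(method, "merge heights:", Z[:, 2])

dendrogram(linkage(X, method="single"))     # dendrogram for single linkage
plt.show()
```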
k-means Clustering
• The k-means clustering algorithm is a straightforward and effective algorithm for
finding clusters in data. The algorithm proceeds as follows:
• Step 1: Ask the user how many clusters k the data set should be partitioned into.
• Step 2: Randomly assign k records to be the initial cluster center locations.
• Step 3: For each record, find the nearest cluster center. Thus, in a sense, each cluster center “owns” a subset of
the records, thereby representing a partition of the data set. We therefore have k clusters, C1, C2, … , Ck
• Step 4: For each of the k clusters, find the cluster centroid, and update the location of each cluster center to the
new value of the centroid
• Step 5: Repeat steps 3 and 4 until convergence or termination
• The “nearest” criterion in step 3 is usually Euclidean distance, although other criteria may be applied as well.
• The cluster centroid in step 4 is found as follows. Suppose that we have n data points (a1, b1, c1), (a2, b2, c2), … ,
(an, bn, cn); the centroid of these points is their center of gravity, located at the point (∑ai⁄n, ∑bi⁄n, ∑ci⁄n)
• For example, the points (1,1,1), (1,2,1), (1,3,1), and (2,1,1) would have centroid
((1+1+1+2)/4, (1+2+3+1)/4, (1+1+1+1)/4) = (1.25, 1.75, 1)
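• A minimal from-scratch sketch of steps 1–5 (Euclidean distance assumed; names are illustrative, and empty clusters are not handled):
```python
# Minimal from-scratch sketch of the k-means steps listed above
# (Euclidean "nearest" criterion; variable names are illustrative).
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # Step 2: k random records as initial centers
    for _ in range(max_iter):                                # Step 5: repeat until convergence/termination
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                        # Step 3: assign each record to nearest center
        new_centers = np.array([X[labels == i].mean(axis=0)  # Step 4: update each center to its centroid
                                for i in range(k)])
        if np.allclose(new_centers, centers):                # terminate when centroids no longer change
            break
        centers = new_centers
    return labels, centers

# the centroid example above: points (1,1,1), (1,2,1), (1,3,1), (2,1,1)
pts = np.array([[1, 1, 1], [1, 2, 1], [1, 3, 1], [2, 1, 1]], dtype=float)
print(pts.mean(axis=0))             # -> [1.25 1.75 1.  ]

labels, centers = kmeans(pts, k=2)  # toy run of the sketch
print(labels, centers)
```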
k-means Clustering
• The algorithm terminates when the centroids no longer change. In other words, the algorithm terminates when for all
clusters C1, C2, … , Ck, all the records “owned” by each cluster center remain in that cluster
• Alternatively, the algorithm may terminate when some convergence criterion is met, such as no significant shrinkage in
the mean squared error, MSE = SSE⁄(N − k), where SSE = Σi Σp∈Ci d(p, mi)² is the sum of squares error, p ∈ Ci represents each data point in
cluster i, mi represents the centroid (cluster center) of cluster i, N is the total sample size, and k is the number of clusters
• Recall that clustering algorithms seek to construct clusters of records such that the between-cluster variation is large
compared to the within-cluster variation
• Because this concept is analogous to the analysis of variance, we may define a pseudo-F statistic as F = MSB⁄MSE
• MSB = SSB⁄(k − 1) is the mean square between, and SSB = Σi ni · d(mi, M)² is the sum of squares between clusters, where ni is
the number of records in cluster i, mi is the centroid (cluster center) for cluster i, and M is the grand mean of all the data
k-means Clustering
• MSB represents the between-cluster variation and MSE represents the within-cluster variation
• Thus, a “good” cluster would have a large value of the pseudo-F statistic, representing a situation where the
between-cluster variation is large compared to the within-cluster variation
• Hence, as the k-means algorithm proceeds, and the quality of the clusters increases, we would expect MSB to
increase, MSE to decrease, and F to increase
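• The following sketch (toy values only) computes SSE, MSE, SSB, MSB, and the pseudo-F statistic as defined above:
```python
# Sketch of the SSE/MSE/SSB/MSB and pseudo-F computations defined above, given a
# data matrix X, cluster labels, and cluster centers (the toy values are made up).
import numpy as np

def pseudo_F(X, labels, centers):
    N, k = len(X), len(centers)
    M = X.mean(axis=0)                                             # grand mean of all the data
    sse = sum(((X[labels == i] - centers[i]) ** 2).sum()           # within-cluster variation
              for i in range(k))
    ssb = sum((labels == i).sum() * ((centers[i] - M) ** 2).sum()  # between-cluster variation
              for i in range(k))
    mse = sse / (N - k)
    msb = ssb / (k - 1)
    return msb / mse

X = np.array([[0.0], [2.0], [4.0], [6.0], [10.0]])   # toy one-dimensional data
labels = np.array([0, 0, 0, 1, 1])
centers = np.array([[2.0], [8.0]])
print(pseudo_F(X, labels, centers))
```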
k-means Clustering– Worked Example
• Suppose that we have the eight data points in two-dimensional space shown in the accompanying table and scatter plot,
and that we are interested in uncovering k = 2 clusters
k-means Clustering– Worked Example
• Let us apply the k-means algorithm step by step.
• Step 1: Ask the user how many clusters k the data set should be partitioned into. We have already indicated that
we are interested in k = 2 clusters
• Step 2: Randomly assign k records to be the initial cluster center locations. For this example, we assign the
cluster centers to be m1 = (1,1) and m2 = (2,1)
• Step 3 (first pass): For each record, find the nearest cluster center. The accompanying table contains the (rounded)
Euclidean distances between each point and each cluster center m1 = (1,1) and m2 = (2,1), along with an indication of
which cluster center each point is nearest to. Therefore, cluster 1 contains points {a,e,g}, and cluster 2 contains points
{b,c,d,f,h}
• Step 4 (first pass): For each of the k clusters find the cluster centroid and update the location of each cluster
center to the new value of the centroid. The centroid for cluster 1 is [(1 + 1 + 1)/3, (3 + 2 + 1)/3] = (1,2). The
centroid for cluster 2 is [(3 + 4 + 5 + 4 + 2)/5, (3 + 3 + 3 + 2 + 1)/5] = (3.6, 2.4)
k-means Clustering– Worked Example
• The clusters and centroids (triangles) at the end of the first pass are shown in following figure. Note that m1 has
moved up to the center of the three points in cluster 1, while m2 has moved up and to the right a considerable
distance, to the center of the five points in cluster 2
• Step 3 (second pass): For each record, find the nearest cluster center. With the updated centers m1 = (1, 2) and
m2 = (3.6, 2.4), point h = (2, 1) is now nearer to m1, so cluster 1 becomes {a,e,g,h} and cluster 2 becomes {b,c,d,f}
• Step 4 (second pass): For each of the k clusters, find the cluster centroid and update the location of each cluster
center to the new value of the centroid. The new centroid for cluster 1 is [(1 + 1 + 1 + 2)/4, (3 + 2 + 1 + 1)/4] =
(1.25, 1.75). The new centroid for cluster 2 is [(3 + 4 + 5 + 4)/4, (3 + 3 + 3 + 2)/4] = (4, 2.75). The clusters and
centroids at the end of the second pass are shown in the following figure. Centroids m1 and m2 have both moved
slightly
k-means Clustering– Worked Example
• Step 5: Repeat steps 3 and 4 until convergence or termination. As the centroids have moved, we once again
return to step 3 for our third (and as it turns out, final) pass through the algorithm
k-means Clustering– Worked Example
• Step 3 (third pass): For each record, find the nearest cluster center. Following table shows the distances
between each point and each newly updated cluster center m1 = (1.25, 1.75) and m2 = (4, 2.75), together with
the resulting cluster membership. Note that no records have shifted cluster membership from the preceding
pass
• Step 4 (third pass): For each of the k clusters, find the cluster centroid and
update the location of each cluster center to the new value of the centroid. As
no records have shifted cluster membership, the cluster centroids therefore also
remain unchanged
• Step 5: Repeat steps 3 and 4 until convergence or termination. As the centroids
remain unchanged, the algorithm terminates
• The cluster assignments and cluster centers after the third pass are shown in the following figure
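• As a check, the sketch below re-runs the example with scikit-learn. The coordinates are inferred from the centroid calculations above (the letter-to-point mapping is assumed), so treat it as illustrative rather than a verbatim copy of the original table:
```python
# Sketch re-running the worked example with scikit-learn; point coordinates are
# inferred from the centroid calculations above, letter labels are assumed.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

X = np.array([[1, 3], [3, 3], [4, 3], [5, 3],    # a, b, c, d
              [1, 2], [4, 2], [1, 1], [2, 1]],   # e, f, g, h
             dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # should be close to (1.25, 1.75) and (4, 2.75)
print(km.labels_)            # clusters {a,e,g,h} and {b,c,d,f} (label numbering may differ)

# calinski_harabasz_score is MSB/MSE, i.e., the pseudo-F statistic (about 16.44 here)
print(calinski_harabasz_score(X, km.labels_))
```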
k-means Clustering– Worked Example
• Let’s observe the behavior of the statistics MSB, MSE, and pseudo-F after step 4 of each pass
• First pass
k-means Clustering– Worked Example
• In general, we would expect MSB to increase, MSE to decrease, and F to increase
• Second pass: MSB = 17.125, MSE = 1.313333, F = 13.03934
• Third pass: MSB = 17.125, MSE = 1.041667, F = 16.44
• Note that the k-means algorithm cannot guarantee finding the global maximum of the pseudo-F statistic, and instead
often settles at a local maximum. To improve the probability of reaching the global maximum, the analyst may
try a variety of initial cluster centers. One suggestion is (i) to place the first cluster center on a
random data point, and (ii) to place the subsequent cluster centers on points as far away from previous centers as
possible
• One potential problem in applying the k-means algorithm is deciding how many clusters to search for, unless the
analyst has a priori knowledge of the number of underlying clusters. An “outer loop” can therefore be added to the
algorithm, cycling through various promising values of k. The clustering solutions for each value of k can then be
compared, with the value of k yielding the largest pseudo-F statistic being selected (a sketch of such a loop appears
after this list). Some algorithms, such as BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies),
can select the optimal number of clusters themselves
• What if some attributes are more relevant than others to the problem formulation? As cluster membership is
determined by distance, we may apply the same axis-stretching methods for quantifying attribute relevance
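• A sketch of such an outer loop over k, using scikit-learn's calinski_harabasz_score (which is the pseudo-F statistic) on hypothetical data:
```python
# Sketch of the "outer loop" over candidate values of k described above, scoring
# each solution with the pseudo-F (Calinski-Harabasz) statistic; X is hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in (0, 3, 6)])  # three synthetic blobs

scores = {}
for k in range(2, 7):                                  # promising values of k
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)     # pseudo-F for this solution

best_k = max(scores, key=scores.get)                   # choose k with the largest pseudo-F
print(scores, "-> chosen k =", best_k)
```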
Rationale for measuring cluster goodness
• Every modeling technique requires an evaluation phase
• For example, we may work hard to develop a multiple regression model for predicting the amount of money to be spent
on a new car. But, if the standard error of the estimate s for this regression model is $100,000, then the usefulness of the
regression model is questionable
• In the classification realm, we would expect that a model predicting who will respond to our direct-mail marketing
operation will yield more profitable results than the baseline “send-a-coupon-to-everybody” or
“send-out-no-coupons-at-all” models
• In a similar way, clustering models need to be evaluated as well. Some of the questions of interest might be the following:
• Do my clusters actually correspond to reality, or are they simply artifacts of mathematical convenience?
• I am not sure how many clusters there are in the data. What is the optimal number of clusters to identify?
• How do I measure whether one set of clusters is preferable to another?
• Two methods for measuring cluster goodness, the silhouette method and the pseudo-F statistic, are introduced here
Rationale for measuring cluster goodness
• Any measure of cluster goodness, or cluster quality, should address the concepts of cluster separation as well as cluster
cohesion
• Cluster separation represents how distant the clusters are from each other; cluster cohesion refers to how tightly related
the records within the individual clusters are
• Good measures of cluster quality need to incorporate both criteria. For example, the sum of squares
error (SSE) might seem to be a reasonable measure of cluster quality
• However, by measuring only the distance between each record and its cluster center, SSE accounts for cluster cohesion
and does not account for cluster separation. Thus, SSE decreases monotonically as the number of clusters
increases, which is not a desirable property for a valid measure of cluster goodness
• Of course, both the silhouette method and the pseudo-F statistic account for both cluster cohesion and cluster separation
The Silhouette Method
• For each data value i, the silhouette value is si = (bi − ai) ⁄ max(ai, bi),
where ai is the distance between the data value and its cluster center, and bi is the distance between the data value and
the next closest cluster center
• The silhouette value is used to gauge how good the cluster assignment is for that particular point
• A positive value indicates that the assignment is good, with higher values being better than lower values
• A value that is close to zero is considered to be a weak assignment, as the observation could have been assigned to the
next closest cluster with limited negative consequence
• A negative silhouette value is considered to be misclassified, as assignment to the next closest cluster would have been
better
• Note how the definition of silhouette accounts for both separation and cohesion. The value of ai represents cohesion, as it
measures the distance between the data value and its cluster center, while bi represents separation, as it measures the distance
between the data value and a different cluster
The Silhouette Method
• In the accompanying figure, each of the data values in Cluster 1 has its values of ai and bi represented by a solid line
and a dotted line, respectively. Clearly, bi > ai for each data value, as represented by the longer dotted lines
• Thus, each data value’s silhouette value is positive, indicating that the data values have not been misclassified. The dotted
lines indicate separation, and the solid lines indicate cohesion
• Taking the average silhouette value over all records yields a useful measure of how well the cluster solution fits the
data. The following thumbnail interpretation of average silhouette is meant as a guideline only and should defer to
the expertise of the domain expert
• INTERPRETATION OF AVERAGE SILHOUETTE VALUE
• 0.5 or better. Good evidence of the reality of the clusters in the data
• 0.25–0.5. Some evidence of the reality of the clusters in the data. Hopefully, domain-specific knowledge can be brought
to bear to support the reality of the clusters
• Less than 0.25. Scant evidence of cluster reality
The Silhouette Method - Example
• Suppose we apply k-means clustering to the following little one-dimensional data set:
x1 = 0, x2 = 2, x3 = 4, x4 = 6, x5 = 10
• k-means assigns the first three data values to Cluster 1 and the last two to Cluster 2
• The cluster center for Cluster 1 is m1 = 2, and the cluster center for Cluster 2 is m2 = 8
• The values for ai represent the distance between the data value xi and the cluster center to which xi belongs. The values
for bi represent the distance between the data value and the other cluster center. Note that a2 = 0 because x2 = m1 = 2
The Silhouette Method - Example
• Following table contains the calculations for the individual data value silhouettes, along with the mean silhouette. Using
our rule of thumb, mean silhouette = 0.7 represents good evidence of the reality of the clusters in the data. Note that x2 is
perfectly classified as belonging to Cluster 1, as it sits right on the cluster center m1; thus, its silhouette value is a perfect
1.00. However, x3 is somewhat farther from its own cluster center, and somewhat closer to the other cluster center; hence,
its silhouette value is lower, 0.50
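• The following sketch reproduces this example using the center-based silhouette definition given above:
```python
# Sketch reproducing the one-dimensional silhouette example above, using the
# center-based definition in these notes: s_i = (b_i - a_i) / max(a_i, b_i).
import numpy as np

x = np.array([0.0, 2.0, 4.0, 6.0, 10.0])
centers = np.array([2.0, 8.0])          # m1 = 2 (Cluster 1), m2 = 8 (Cluster 2)
labels = np.array([0, 0, 0, 1, 1])      # first three points in Cluster 1, last two in Cluster 2

d = np.abs(x[:, None] - centers[None, :])   # distance from each point to each cluster center
a = d[np.arange(len(x)), labels]            # distance to own cluster center
b = d[np.arange(len(x)), 1 - labels]        # distance to the other cluster center
s = (b - a) / np.maximum(a, b)

print(s)          # [0.75, 1.0, 0.5, 0.5, 0.75]
print(s.mean())   # 0.7, matching the worked example
```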
Silhouette Analysis of the IRIS Data
• The data set consists of 150 observations of three species of Iris, along with measurements of their petal width, petal length,
sepal width, and sepal length
• The left figure shows a scatter plot of petal width versus petal length, with an overlay of Iris species. (Note that min–max
normalization is used.) It shows that one species is well separated, but the other two are not, at least in these two dimensions
• So, one question we could ask of these Irises: true, there are three species in the data set, but are there really three clusters in
the data set, or only two?
• It makes sense to begin with k = 3 clusters. k-means clustering was applied to the Iris data, asking for k = 3 clusters. A logical
question might be: Do the clusters match perfectly with the species? (Of course, the species type was not included as input to the
clustering algorithm.)
• The answer is, not quite: most of the Iris virginica belong to Cluster 2, but some belong to Cluster 3, and most of the Iris
versicolor belong to Cluster 3, but some belong to Cluster 2 (see the right figure)
Silhouette Analysis of the IRIS Data
• The silhouette values for each flower were calculated, and graphed in the silhouette plot in the following figure
• This silhouette plot shows the silhouette values, sorted from highest to lowest, for each cluster. Cluster 1 is the
best-defined cluster, as most of its silhouette values are rather high. Clusters 2 and 3, by contrast, have some records with
high silhouette and some with low silhouette. However, there are no records with negative silhouette, which
would indicate a wrong cluster assignment
• The mean silhouette values for each cluster, and the overall mean silhouette, are provided in the table. These values
support our suggestion that, although Cluster 1 is well-defined, Clusters 2 and 3 are not so well-defined. This makes
sense, in light of what we learned in the last figures
Silhouette Analysis of the IRIS Data
• Many of the low silhouette values for Clusters 2 and 3 come from the boundary area between their respective clusters.
Evidence for this is shown in this figure
• The silhouette values were binned (for illustrative purposes): a silhouette value below 0.5 is low; a silhouette value of at
least 0.5 is high. The lower silhouette values in this boundary area result from the proximity of the “other” cluster center,
which holds down the value of bi, and thus the silhouette value
• It is worth noting that the clusters were formed using four predictors, but we are examining scatter plots of only two
predictors. This represents a projection of the predictor space down to two dimensions, and so loses some of the
information available in four dimensions
Silhouette Analysis of the IRIS Data
• Next, k-means was applied, with k = 2 clusters. This clustering combines I. versicolor and I. virginica into a single cluster, as
shown in left figure. The silhouette plot for k = 2 clusters is shown in right figure. There seem to be fewer low silhouette values
than for k = 3 clusters. This is supported by the mean silhouette values reported in the table. The overall mean silhouette is 17%
higher than for k = 3, and the cluster mean silhouettes are higher as well
• So, it is clear that the silhouette method prefers the clustering model where k = 2. This is fine, but just be aware that the k = 2
solution recognizes no distinction between I. versicolor and I. virginica, whereas the k = 3 solution does recognize this
distinction. Such a distinction may be important to the client
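• For readers who want to replicate this comparison, the sketch below runs k = 2 and k = 3 on the Iris data with scikit-learn. Note that sklearn's silhouette_score uses the classical (average-distance) definition rather than the center-based one described above, so the numerical values will differ somewhat from those reported here:
```python
# Sketch of a silhouette comparison for k = 2 vs. k = 3 on the Iris data.
# sklearn's silhouette_score uses the classical (average-distance) definition,
# so its values differ from the center-based silhouette described in these notes.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = MinMaxScaler().fit_transform(load_iris().data)   # min-max normalization, as in the notes

for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, "clusters -> mean silhouette:", round(silhouette_score(X, labels), 3))
```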