Part Eight PDF
Unsupervised Learning
Clustering
PART 6
What is not cluster analysis?
◼ Simple segmentation
❑ Dividing students into different registration groups alphabetically, by last name
◼ Results of a query
❑ Groupings are a result of an external specification
◼ Graph partitioning
❑ Some mutual relevance and synergy, but areas are not identical
Aspects of clustering
◼ A clustering algorithm
❑ Partitional clustering
❑ Hierarchical clustering
❑ …
◼ A distance (similarity, or dissimilarity) function
◼ Clustering quality
❑ Inter-cluster distance maximized
❑ Intra-cluster distance minimized
◼ The quality of a clustering result depends on the algorithm, the distance function, and the application.
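The two quality criteria above can be made concrete with a small sketch. As an assumption, both distances are measured here as average pairwise Euclidean distances; other definitions (for example, centroid-to-centroid distance) are just as common. The function names are hypothetical.

```python
import math
from itertools import combinations

# Assumption: "distance" = average pairwise Euclidean distance.
def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points in the same cluster.
    Assumes at least one cluster has two or more points."""
    dists = [math.dist(p, q)
             for cluster in clusters
             for p, q in combinations(cluster, 2)]
    return sum(dists) / len(dists)

def mean_inter_cluster_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(dists) / len(dists)

# Same four points, grouped two different ways.
good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]
print(mean_intra_cluster_distance(good), round(mean_inter_cluster_distance(good), 2))  # 1.0 10.02
print(mean_intra_cluster_distance(bad), round(mean_inter_cluster_distance(bad), 2))    # 10.0 5.52
```

The "good" grouping has small intra-cluster and large inter-cluster distance, which is exactly what the two criteria ask for.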
Dr. Ahmed Sultan Al-Hegami
Road map
◼ Basic concepts
◼ Types of Clusterings
◼ Types of Clusters
◼ Clustering Algorithms
◼ K-means algorithm
◼ Representation of clusters
◼ Hierarchical clustering
◼ Distance functions
◼ Data standardization
◼ Handling mixed attributes
◼ Which clustering algorithm to use?
◼ Cluster evaluation
◼ Discovering holes and data regions
◼ Summary
◼ Hierarchical clustering
❑ A set of nested clusters organized as a hierarchical tree
Figures: partitional clustering; traditional hierarchical clustering (nested clusters of points p1, p2, p3, p4) and the traditional dendrogram (tree)
Types of Clusters
◼ Center-based clusters
◼ Contiguous clusters
◼ Density-based clusters
◼ Property or conceptual clusters
◼ Well-Separated Clusters:
❑ A cluster is a set of points such that any point in a cluster is
closer (or more similar) to every other point in the cluster than
to any point not in the cluster.
3 well-separated clusters
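The well-separated definition can be checked directly: for every point, the largest distance to a member of its own cluster must be smaller than the smallest distance to any point outside it. A minimal sketch, assuming Euclidean distance and clusters given as lists of coordinate tuples; `is_well_separated` is a hypothetical name.

```python
import math

def is_well_separated(clusters):
    """True if every point is closer to every other point in its cluster
    than to any point not in the cluster (Euclidean distance assumed)."""
    for i, cluster in enumerate(clusters):
        outside = [q for j, c in enumerate(clusters) if j != i for q in c]
        for idx, p in enumerate(cluster):
            # Farthest member of p's own cluster (excluding p itself).
            max_within = max((math.dist(p, q) for k, q in enumerate(cluster) if k != idx),
                             default=0.0)
            # Nearest point belonging to any other cluster.
            min_outside = min((math.dist(p, q) for q in outside), default=math.inf)
            if max_within >= min_outside:
                return False
    return True

separated = [[(0, 0), (1, 0)], [(10, 0), (11, 0)], [(0, 10), (1, 10)]]
touching = [[(0, 0), (5, 0)], [(6, 0), (20, 0)]]
print(is_well_separated(separated))  # True
print(is_well_separated(touching))   # False: (5, 0) is nearer to (6, 0) than to (0, 0)
```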
Types of Clusters: Center-Based
◼ Center-based
❑ A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of its own cluster than to the center of any other cluster
❑ The center of a cluster is often a centroid, the average of all
the points in the cluster, or a medoid, the most “representative”
point of a cluster
4 center-based clusters
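The difference between the two kinds of center can be seen in a short sketch, assuming Euclidean distance: the centroid is the coordinate-wise mean and need not be a data point, while the medoid (taken here as the point with the smallest total distance to the others) is always an actual member of the cluster. Function names are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points; may not be one of them."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """Member with the smallest total Euclidean distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (14, 0)]
print(centroid(cluster))  # (4.0, 0.0)
print(medoid(cluster))    # (2, 0)
```

The outlier at (14, 0) pulls the centroid away from the bulk of the points, while the medoid stays a representative member of the cluster.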
Types of Clusters: Contiguity-Based
8 contiguous clusters
Types of Clusters: Density-Based
◼ Density-based
❑ A cluster is a dense region of points that is separated from other high-density regions by low-density regions.
❑ Used when the clusters are irregular or intertwined, and when
noise and outliers are present.
6 density-based clusters
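The notes do not name a specific density-based algorithm. As one common example, here is a minimal DBSCAN-style sketch: a point counts as dense (a core point) if at least `min_pts` points, itself included, lie within distance `eps`; clusters grow outward from core points, and points that no dense region reaches are labeled noise. The `eps`/`min_pts` parameters and the Euclidean distance are assumptions, and the quadratic neighbor search is only suitable for small inputs.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """DBSCAN-style sketch (assumption: Euclidean distance, O(n^2) search).
    Returns one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    n = len(points)
    # Neighborhood of each point within eps (includes the point itself).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster from an unlabeled core point.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] is None:
                    labels[j] = cluster_id
                    if is_core[j]:
                        frontier.append(j)  # only core points keep expanding
        cluster_id += 1
    # Points never reached from a core point are noise.
    return [NOISE if lab is None else lab for lab in labels]

pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```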
Types of Clusters: Conceptual Clusters
2 Overlapping Circles
Types of Clusters: Objective Function
Clustering Algorithms
◼ Hierarchical clustering
◼ Density-based clustering
K-means Clustering
◼ Partitional clustering approach
◼ Each cluster is associated with a centroid (center point)
◼ Each point is assigned to the cluster with the closest
centroid
◼ Number of clusters, K, must be specified
◼ The basic algorithm is very simple
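The basic algorithm described above (random initial centroids, assign each point to its closest centroid, recompute each centroid as the mean, repeat until nothing changes) can be sketched as follows. This is a minimal illustration assuming Euclidean distance; the `seed` and `max_iter` parameters and the rule of keeping the old centroid when a cluster becomes empty are assumptions, not part of the notes.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means (Euclidean distance assumed). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Choose K of the points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to the cluster with the closest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Stop when no point changes cluster (so the centroids no longer change).
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(labels)
```

Which group gets label 0 depends on the random initialization, but the three points near the origin end up together and the three near (10, 10) end up together.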
K-means Clustering – Details
◼ Initial centroids are often chosen randomly.
❑ Clusters produced vary from one run to another.
◼ The centroid is (typically) the mean of the points in the
cluster.
◼ ‘Closeness’ is measured by Euclidean distance, cosine
similarity, correlation, etc.
◼ K-means converges for the common similarity measures mentioned above.
◼ Most of the convergence happens in the first few
iterations.
❑ Often the stopping condition is changed to ‘Until relatively few
points change clusters’
◼ Complexity is O(n * K * I * d)
❑ n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
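Two of the details above, the relaxed stopping rule and the O(n * K * I * d) cost, can be illustrated together. This sketch assumes Euclidean distance and uses a hypothetical `change_threshold` parameter: iteration stops once at most that many points change cluster (0 gives the usual "until nothing changes" rule). The `ops` counter is bookkeeping only: it adds K * d coordinate differences per point per iteration, so by construction it equals n * K * I * d.

```python
import math
import random

def kmeans_counted(points, k, seed=0, change_threshold=0, max_iter=100):
    """K-means that stops when at most `change_threshold` points change
    cluster (hypothetical parameter). Returns (labels, iterations, ops)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    d = len(points[0])
    iterations = 0
    ops = 0  # coordinate-level work of the assignment step
    for _ in range(max_iter):
        iterations += 1
        changed = 0
        for i, p in enumerate(points):
            ops += k * d  # K distance computations, d coordinates each
            best = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if best != labels[i]:
                labels[i] = best
                changed += 1
        # Recompute centroids as cluster means (O(n * d), dominated by the above).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # "Until relatively few points change clusters"
        if changed <= change_threshold:
            break
    return labels, iterations, ops

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, iters, ops = kmeans_counted(pts, k=2)
print(iters, ops, ops == len(pts) * 2 * iters * 2)  # last value is True
```

With the same seed, a larger `change_threshold` follows the same trajectory but can stop earlier, trading a little accuracy for fewer iterations I.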
Hierarchical Clustering
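Figure: nested clusters of six points (1 to 6) and the corresponding dendrogram, a tree-like diagram that records the sequence of merges or splits (merge heights from 0 to 0.2; leaf order 1, 3, 2, 5, 4, 6)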
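A dendrogram can be built bottom-up by repeatedly merging the two closest clusters and recording each merge with its height. This minimal sketch assumes single-link (MIN) distance between clusters, that is, the distance between their closest members; the notes do not fix a linkage, and other choices (complete link, group average) give different trees. The function name is hypothetical.

```python
import math
from itertools import combinations

def single_link_merges(points):
    """Agglomerative clustering with single-link distance (an assumption).
    Returns the merge sequence: (members_a, members_b, merge_height)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def link(a, b):
        # Single link: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

for a, b, h in single_link_merges([(0, 0), (1, 0), (5, 0), (6.5, 0)]):
    print(a, b, h)
# [0] [1] 1.0
# [2] [3] 1.5
# [0, 1] [2, 3] 4.0
```

Each printed row is one horizontal bar of the dendrogram: the two groups joined and the height at which they join.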
Strengths of Hierarchical Clustering
❑ Divisive:
◼ Start with one, all-inclusive cluster
◼ At each step, split a cluster, until each cluster contains a single point (or there are k clusters)
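The notes do not say which cluster to split or how. As a minimal sketch of the divisive loop, this example assumes two hypothetical rules: always split the largest cluster, and split it around its two most distant points (each point joins the nearer of the two). Bisecting K-means is another common way to perform the split. Points are assumed distinct, so every split yields two non-empty parts.

```python
import math
from itertools import combinations

def split(cluster):
    """Assumed heuristic: split around the two most distant points."""
    a, b = max(combinations(cluster, 2), key=lambda pair: math.dist(*pair))
    left = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
    return left, right

def divisive(points, k=None):
    """Start with one all-inclusive cluster and split until there are k
    clusters, or (k=None) until each cluster holds a single point.
    Assumes the points are distinct."""
    clusters = [list(points)]
    target = len(points) if k is None else k
    while len(clusters) < target:
        largest = max(clusters, key=len)  # assumed rule: split the largest
        if len(largest) < 2:
            break
        clusters.remove(largest)
        clusters.extend(split(largest))
    return clusters

pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(divisive(pts, k=2))  # [[(0, 0), (1, 0)], [(10, 0), (11, 0)]]
print(divisive(pts))       # four single-point clusters
```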