Data Mining Clustering Techniques
Engineering and Scientific International Journal (ESIJ) ISSN 2394-7187(Online)
Volume 3, Issue 2, April – June 2016 ISSN 2394 - 7179 (Print)
• Model-based Method
• Density-based Method
• Constraint-based Method

Clusters include groups with small distances among the
cluster members, dense areas of the data space,
intervals, or particular statistical distributions [2].
Clustering methods for uncertain data are mainly divided
into two categories: partitioning and hierarchical
approaches. Among clustering methods based on similarity
analysis, the partitioning and hierarchical methods are the
most important.

A. Partitioning Method

Given n data objects, the partitioning method constructs k
partitions of the data, where each partition represents a
cluster and k ≤ n. It classifies the data into k groups that
satisfy the following requirements:
• Each group contains at least one object.
• Each object belongs to exactly one group.
For a given number of partitions k, the partitioning method
creates an initial partitioning. It then uses an iterative
relocation technique to improve the partitioning by moving
data objects from one group to another.

The main drawback of partitioning the objects into k
clusters is that it repeatedly reallocates objects to improve
the clustering. It uses a k-medoid method for each subset of
a data stream. Through iterative evaluation of the k-medoid
algorithm [4], its objective is to maintain only the
consistently good data elements, i.e., those that each
represent a cluster of data elements.

B. Hierarchical Method

This method creates a hierarchical decomposition of the
given set of data objects. The hierarchy is formed in one of
two ways:
Agglomerative: It is a 'bottom-up' approach. At each step, a
cluster or collection is merged with another group to form
larger ones.
Divisive: It is a 'top-down' approach. All data objects are
placed in a single cluster, which is then split into smaller
clusters.

C. Density-Based Method

The density-based method is based on the notion of
density. It allows a cluster to keep growing as long as the
density in its neighborhood exceeds some threshold, i.e.,
for each data point in a given cluster, the neighborhood of a
given radius must contain at least a minimum number of
data points.

D. Grid-Based Method

In this method, the objects together form a multi-resolution
grid structure. The object space is divided into a fixed
number of cells that form a grid. The major advantage of
this method is its fast processing time. Another advantage is
that the processing depends only on the number of cells in
each dimension of the space.

E. Model-Based Method

A model is hypothesized for each cluster, and the method
finds the best fit of the data to the given model. It identifies
the clusters by applying a density function that reflects the
spatial distribution of the data points. This method also
serves as a way of automatically determining the number of
clusters based on standard statistics, taking outliers or noise
into account.

F. Constraint-Based Method

It incorporates the user's expectations or the desired
properties of the clustering results. Constraints provide an
interactive way of communicating with the clustering
process. They are specified by the user or by the application
requirements.

2. Hierarchical clustering

The goal of cluster analysis is that the objects within a
group be similar to each other and dissimilar from the
objects of the other groups. The greater the similarity (or
homogeneity) within a group and the greater the difference
between groups, the better or more distinct the clustering.
Hierarchical clustering is a method of cluster analysis that
builds clusters in a hierarchical fashion. The strategies for
hierarchical clustering are of two types:
• Agglomerative: It is a "bottom-up" approach. Each
object starts in its own cluster, and pairs of clusters are
merged to form new clusters.
• Divisive: It is a "top-down" approach. The iterations
begin with a single cluster 'A' containing all objects, and
splits are performed repeatedly as one moves down the
hierarchy.
Fundamentally, the merges and splits are determined in a
greedy fashion. The output of hierarchical clustering is
generally displayed using a dendrogram. The disadvantage
of agglomerative clustering is that it is too slow for large
data sets.

3. Advantages of Hierarchical Clustering

The advantages of hierarchical clustering algorithms are:
• Embedded flexibility in the level of granularity.
• Easy handling of any form of similarity or distance.
• Applicability to any attribute type.
These advantages of hierarchical clustering come at the
cost of lower efficiency. Agglomerative hierarchical
clustering presents four different algorithms,