Engineering and Scientific International Journal (ESIJ) ISSN 2394-7187 (Online)

Volume 3, Issue 2, April – June 2016 ISSN 2394-7179 (Print)

Data Mining Clustering Techniques


R. RoopRekha#1, S. Perumal*2
#1 Research Scholar, Dept. of Computer Applications, Vels University, Chennai
roopselvam@gmail.com
*2 Head, Dept. of Computer Science, Vels University, Chennai

Abstract: Data mining is a powerful technology for extracting information from large amounts of data, and it is considered one of the important fields in knowledge management. Today, data mining helps organizations focus on the data they have collected about the attitudes of their customers. Over the past few years, research in data mining has continued in fields such as statistics, artificial intelligence, pattern recognition, machine learning, business, education, and science. This paper discusses the main concepts of data mining and its techniques.

Keywords: Data Mining; Database; Cluster; Prediction.

1. Introduction

1.1 Data Mining

Data mining is the technique of extracting, or mining, facts from large data sets. It is also referred to as knowledge discovery in data. It analyzes data from different perspectives and summarizes it into useful information about the associations or relationships among the data. A data mining tool lets users analyze data from different dimensions or angles, categorize it, and summarize the relationships identified. Advances in data collection and storage technology have made it possible for organizations to store huge amounts of data at lower cost; the task is then to exploit this data to extract useful and actionable information. Data mining is the process of exploring and analyzing large amounts of data to discover meaningful patterns and rules. In practice, the process is iterative and semi-automated and may require human intervention at several key points. The two main reasons to use data mining are as follows:
i) There is too much data and too little information.
ii) It is essential to extract useful information from the data and to interpret the data.

1.2 Data Mining Techniques

The key techniques of data mining are
• Association
• Classification
• Clustering
• Prediction
• Sequence pattern
• Regression

A. Clustering

Clustering is a technique used in data mining that enables us to discover groups and hence identify interesting distributions and patterns in the underlying data. Clustering partitions a given data set into clusters (groups) such that the data in a cluster are more similar to each other than to data in other clusters [1].
Cluster: A cluster is a set of data objects that are similar to one another and dissimilar to the objects in other groups.
Cluster Analysis: The main aim is to identify clusters of similar objects and to discover interesting patterns and correlations in huge data sets. It groups a set of data objects into clusters.
Similarities between data objects are identified from the features found in the data, and similar objects are grouped into clusters. Clustering is a technique of unsupervised learning: it groups data that share similar patterns, so that a large data set is divided into smaller groups of similar data.
Cluster analysis, or clustering, involves placing similar objects in the same group (called a cluster). The objects in each cluster are more similar to one another than to the objects of other clusters. The process of data mining draws on methods from areas such as image analysis, pattern recognition, information retrieval, and bioinformatics.

1.3 Methods on clustering

Clustering assigns similar objects to groups (called clusters) so that data objects in the same cluster are more similar to one another than to objects in different groups. Clustering methods have been discussed extensively in trend analysis, similarity search, segmentation, pattern recognition, and classification. The clustering methods are classified as follows:
• Partitioning Method
• Grid-based Method
• Hierarchical Method

• Model-based Method
• Density-based Method
• Constraint-based Method
Typical notions of a cluster include groups with small distances between cluster members, dense areas of the data space, intervals, or particular statistical distributions [2]. Clustering methods for uncertain data are mainly divided into two categories: partitioning and hierarchical approaches. Similarity analysis is central to both the partitioning and the hierarchical approach.

A. Partitioning Method

For n data objects, the partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. It classifies the data into k groups that satisfy the following requirements:
• Each group contains at least one object.
• Each object belongs to exactly one group.
For a given number k of partitions, the partitioning method creates an initial partitioning. It then uses an iterative relocation technique to improve the partitioning by moving data objects from one group to another.
The main drawback of partitioning the objects into k clusters is that objects must be reallocated repeatedly to improve the clustering. A k-medoid method can be applied to each subset of a data stream. In the iterative evaluation of the k-medoid algorithm [4], the objective is to retain only the consistently good data elements, i.e., elements that each represent the cluster for the data elements around them.

B. Hierarchical Method

This method creates a hierarchical decomposition of the given set of data objects. The decomposition is formed in one of two ways:
Agglomerative: This is a 'bottom-up' approach. At each step, a cluster is merged with another group to form a larger one.
Divisive: This is a 'top-down' approach. All data objects are placed in a single cluster, which is then split into smaller clusters.

C. Density-Based Method

The density-based method is based on the notion of density. It allows a cluster to grow as long as the density in its neighborhood exceeds some threshold, i.e., for each data point in a given cluster, the neighborhood of a given radius must contain at least a minimum number of data points.

D. Grid-Based Method

In this method the objects together form a multi-resolution grid structure. The object space is divided into a fixed number of cells that create a grid formation. The major advantage of this method is its fast processing time. Another advantage is that the processing time depends only on the number of cells in each dimension of the space.

E. Model-Based Method

A model is hypothesized for each cluster, and the method finds the best fit of the data to the given model. It identifies the clusters by applying a density function that reflects the spatial distribution of the data points. This method also serves as a way of automatically determining the number of clusters based on standard statistics, taking outliers or noise into account.

F. Constraint-Based Method

This method performs clustering under constraints that capture the user's expectations or the desired properties of the clustering results. Constraints give an interactive way of communicating with the clustering process. They are specified by the user or by the application requirements.

2. Hierarchical clustering

The goal of cluster analysis is that the objects within a group are similar to each other and dissimilar from the objects of other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering. Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. There are two strategies for hierarchical clustering:
• Agglomerative: This is a "bottom-up" approach. Each object starts in its own cluster, and at each iteration pairs of clusters are merged to form new clusters.
• Divisive: This is a "top-down" approach. It begins with a single cluster 'A' containing all objects, and splits are performed repeatedly as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy fashion. The output of hierarchical clustering is generally displayed using a dendrogram. The disadvantage of agglomerative clustering is that it is too slow for large data sets.

3. Advantages of Hierarchical Clustering

The advantages of hierarchical clustering algorithms are:
• Embedded flexibility in the level of granularity.
• Easy handling of any form of similarity or distance.
• Applicability to any attribute type.
These advantages of hierarchical clustering come at the cost of lower efficiency. Agglomerative hierarchical clustering presents four different algorithms:

• Single-link: subject to the chaining effect.
• Complete-link: sensitive to outliers.
• Group-average: the best choice for most applications.
• Centroid: inversions can occur.

4. Conclusion

In this paper, various clustering algorithms and their features are discussed and analysed. Based on this discussion, the hierarchical clustering techniques in data mining are recognized as effective and well suited to many applications in various industries.

References

[1] I.K. Ravichandra Rao (2003), "Data Mining and Clustering Techniques", DRTC Workshop on Semantic Web, Bangalore.
[2] Jiawei Han & Micheline Kamber (2006), "Data Mining: Concepts and Techniques", The Morgan Kaufmann / Elsevier India.
[3] Peter Benjamin Volk, Frank Rosenthal, Martin Hahmann, Dirk Habich & Wolfgang Lehner, "Clustering Uncertain Data With Possible Worlds", IEEE International Conference on Data Engineering.
[4] J.A.S. Almeida, L.M.S. Barbosa, A.A.C.C. Pais & S.J. Formosinho (2007), "Improving Hierarchical Cluster Analysis: A New Method with Outlier Detection and Automatic Clustering", Chemometrics and Intelligent Laboratory Systems, Vol. 87, pp. 208–217.
[5] A.S. Aneeshkumar & C. Jothi Venkateswaran (2015), "A Novel Approach for Liver Disorder Classification using Data Mining Techniques", Engineering and Scientific International Journal, Volume 2, Issue 1, January – March 2015, pp. 15–18.
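
As a supplementary illustration, the agglomerative strategy of Section 2 and the four linkage criteria listed in Section 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation taken from the cited works: the function names (agglomerative, linkage_distance, centroid), the use of Euclidean distance, and the six sample points are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+); an assumed metric


def centroid(idx, points):
    """Mean position of the points whose indices are in idx."""
    return tuple(sum(c) / len(idx) for c in zip(*(points[i] for i in idx)))


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices)."""
    if method == "centroid":            # distance between cluster centroids
        return dist(centroid(a, points), centroid(b, points))
    pairwise = [dist(points[i], points[j]) for i in a for j in b]
    if method == "single":              # closest pair of members
        return min(pairwise)
    if method == "complete":            # farthest pair of members
        return max(pairwise)
    if method == "average":             # group average over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerative(points, n_clusters, method="average"):
    """Bottom-up clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each object starts alone
    merges = []  # merge history, the information a dendrogram displays
    while len(clusters) > n_clusters:
        # Greedy step: choose the pair with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(
                clusters[p[0]], clusters[p[1]], points, method
            ),
        )
        d = linkage_distance(clusters[a], clusters[b], points, method)
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so position a is unaffected
    return clusters, merges


# Hypothetical sample data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, merges = agglomerative(points, n_clusters=2, method="single")
print(sorted(sorted(c) for c in clusters))  # → [[0, 1, 2], [3, 4, 5]]
```

This naive version recomputes every pairwise linkage distance at each merge, which is consistent with the remark in Section 2 that agglomerative clustering is slow on large data sets. Changing the method argument switches between the single-link, complete-link, group-average, and centroid criteria.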
