Data Mining Clustering Techniques

This document discusses various data mining clustering techniques. It begins with an introduction to data mining and its techniques such as association, classification, clustering, and prediction. It then focuses on clustering, describing it as a technique that partitions data into groups such that data within each group are more similar to each other than between groups. The document outlines different clustering methods including partitioning, hierarchical, grid-based, model-based, density-based, and constraint-based methods. It provides a brief description of each method and their advantages.

Engineering and Scientific International Journal (ESIJ) ISSN 2394-7187(Online)

Volume 3, Issue 2, April – June 2016 ISSN 2394 - 7179 (Print)

Data Mining Clustering Techniques

R. RoopRekha#1, S. Perumal*2
#1 Research Scholar, Dept of Computer Applications, Vels University, Chennai (roopselvam@gmail.com)
*2 Head, Dept of Computer Science, Vels University, Chennai

Abstract: Data mining is a powerful technology for extracting information from large amounts of data, and it is considered one of the important fields in knowledge management. Today, data mining helps organizations focus on the data they have collected based on the attitudes of their customers. Over the past few years, research in data mining has continued in various fields such as statistics, artificial intelligence, pattern recognition, machine learning, business, education, and science. This paper discusses the various concepts of data mining and its techniques.

Keywords: Data Mining; Data Base; Cluster; Prediction.

1. Introduction

1.1 Data Mining

Data mining is a technique for extracting, or mining, facts from large data sets. It is also referred to as data or knowledge discovery. It analyzes data from different perspectives and summarizes it into useful information about the associations or relationships among the data. Data mining tools are used for this analysis: they allow users to analyze data from different dimensions or angles, categorize it, and summarize the relationships identified. Advances in data collection and storage technology have made it possible for organizations to store huge amounts of data at lower cost, and data mining exploits this data to extract useful and actionable information. Data mining is the process of exploring and analyzing large amounts of data to discover meaningful patterns and rules. In practice, the entire data mining process is essentially iterative and semi-automated, and it may require human intervention at several key points. The two main reasons to use data mining are as follows:
i) There is too much data and too little information.
ii) It is essential to extract useful information from the data and to interpret the data.

1.2 Data Mining Techniques

The key techniques of data mining are:
• Association
• Classification
• Clustering
• Prediction
• Sequence pattern
• Regression

A. Clustering

Clustering is a technique used in data mining that enables us to discover groups and hence identify interesting distributions and patterns in the underlying data. Clustering partitions a given data set into clusters (groups) such that the data in a cluster are more similar to each other than to data in other clusters[1].
Cluster: A cluster is a set of data objects that are similar to one another and dissimilar to the objects in other groups.
Cluster Analysis: The main aim is to identify clusters of similar objects and to discover interesting patterns and correlations in huge data sets. It groups a set of data objects into clusters.
The similarities between data objects are identified from the features found in the data, and similar data objects are grouped into clusters. Clustering is an unsupervised learning technique: it groups data that share similar patterns, dividing a large data set into smaller groups of similar data.
Cluster analysis, or clustering, thus places similar objects in the same group (called a cluster). The objects in each cluster are more similar to one another than to objects of other clusters. Clustering is used in many fields, including image analysis, pattern recognition, information retrieval, and bioinformatics.

1.3 Methods of clustering

Clustering assigns records of similar objects to groups (called clusters) so that data objects in the same cluster are more similar to one another than to objects in different groups. Clustering methods have been discussed extensively in trend analysis, similarity search, segmentation, pattern recognition, and classification. Clustering methods are classified as follows:
• Partitioning Method
• Grid-based Method
• Hierarchical Method
• Model-based Method
• Density-based Method
• Constraint-based Method
Common notions of a cluster include groups with small distances between cluster members, dense areas of the data space, intervals, or particular statistical distributions[2]. Clustering methods for uncertain data are mainly divided into two categories: partitioning and hierarchical approaches. Similarity analysis is central to both.

A. Partitioning Method

For n data objects, the partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. It classifies the data into k groups that satisfy the following requirements:
• Each group contains at least one object.
• Each object belongs to exactly one group.
For a given number k of partitions, the partitioning method creates an initial partitioning. It then uses an iterative relocation technique to improve the partitioning by moving data objects from one group to another.
The main drawback of partitioning objects into k clusters is that the method must repeatedly reallocate objects to improve the clustering. A k-medoid method can be applied to each subset of a data stream. Over the iterative evaluation of the k-medoid algorithm[4], the objective is to retain only the consistently good data elements, i.e., those that each represent a cluster of data elements.

B. Hierarchical Method

This method creates a hierarchical decomposition of the given set of data objects. The decomposition is formed in one of two ways:
Agglomerative: a 'bottom-up' approach. At each step, a cluster is merged with another group to form a larger one.
Divisive: a 'top-down' approach. All data objects start in a single cluster, which is split up into smaller clusters.

C. Density-Based Method

The density-based method is based on the notion of density. It allows a cluster to grow as long as the density in its neighborhood exceeds some threshold, i.e., for each data point in a given cluster, the neighborhood of a given radius must contain at least a minimum number of data points.

D. Grid-Based Method

In this method the objects together form a multi-resolution grid structure. The object space is divided into a fixed number of cells that form a grid. The major advantage of this method is its fast processing time. Another advantage is that the processing time depends only on the number of cells in each dimension of the space.

E. Model-Based Method

A model is hypothesized for each cluster, and the method finds the best fit of the data to the given model. It identifies clusters by applying a density function that reflects the spatial distribution of the data points. This method also serves as a way of automatically determining the number of clusters based on standard statistics, taking outliers or noise into account.

F. Constraint-Based Method

This method takes into account the user's expectations or the desired properties of the clustering results. Constraints give an interactive way of communicating with the clustering process. They are specified by the user or by the application requirements.

2. Hierarchical clustering

The goal of cluster analysis is that the objects within a group be similar to each other and dissimilar from the objects of other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering. Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. Strategies for hierarchical clustering are of two types:
• Agglomerative: a "bottom-up" approach. Each object starts in its own cluster, and pairs of clusters are merged to form new clusters as one moves up the hierarchy.
• Divisive: a "top-down" approach. All objects start in a single cluster 'A', and splits are performed repeatedly as one moves down the hierarchy.
Fundamentally, the merges and splits are chosen in a greedy fashion. The output of hierarchical clustering is generally displayed as a dendrogram. The disadvantage of agglomerative clustering is that it is too slow for large data sets.

3. Advantages of Hierarchical Clustering

The advantages of the hierarchical clustering algorithms are:
• Built-in flexibility in the level of granularity.
• Easy handling of any form of similarity or distance.
• Applicability to any attribute type.
These advantages of hierarchical clustering come at the cost of lower efficiency. Agglomerative hierarchical clustering comes in four different algorithms:
• Single-link: prone to the chaining effect.
• Complete-link: sensitive to outliers.
• Group-average: the best choice for most applications.
• Centroid: inversions can occur.

4. Conclusion

In this paper, various clustering algorithms and their features are discussed and analysed. Based on this review, hierarchical clustering techniques in data mining are recognized as effective for many applications in various industries, although their flexibility comes at the cost of lower efficiency on large data sets.

References

[1] I.K. Ravichandra Rao (2003), "Data Mining and Clustering Techniques", DRTC Workshop on Semantic Web, Bangalore.
[2] Jiawei Han & Micheline Kamber (2006), "Data Mining: Concepts and Techniques", The Morgan Kaufmann / Elsevier India.
[3] Peter Benjamin Volk, Frank Rosenthal, Martin Hahmann, Dirk Habich, Wolfgang Lehner, "Clustering Uncertain Data With Possible Worlds", IEEE International Conference on Data Engineering.
[4] J.A.S. Almeida, L.M.S. Barbosa, A.A.C.C. Pais & S.J. Formosinho (2007), "Improving Hierarchical Cluster Analysis: A New Method with Outlier Detection and Automatic Clustering", Chemometrics and Intelligent Laboratory Systems, Vol. 87, Pp. 208–217.
[5] A.S. Aneeshkumar and Dr. C. Jothi Venkateswaran, "A novel approach for Liver disorder Classification using Data Mining Techniques", Engineering and Scientific International Journal, Volume 2, Issue 1, January - March 2015, pp. 15-18.
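
The four agglomerative variants named in Section 3 (single-link, complete-link, group-average, centroid) differ only in how the distance between two clusters is measured. The following is a minimal Python sketch of that idea, not an implementation taken from the paper. It assumes Euclidean distance on numeric tuples and stops when k clusters remain; the function names cluster_distance and agglomerative and the parameter names are hypothetical, chosen for illustration. The naive search over all cluster pairs at every step also shows why the agglomerative approach becomes slow on large data sets.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def cluster_distance(a, b, linkage):
    """Distance between clusters a and b (lists of tuples) under a linkage rule."""
    if linkage == "centroid":
        # Distance between the mean points of the two clusters.
        centroid_a = tuple(sum(xs) / len(a) for xs in zip(*a))
        centroid_b = tuple(sum(xs) / len(b) for xs in zip(*b))
        return dist(centroid_a, centroid_b)
    pairwise = [dist(p, q) for p in a for q in b]
    if linkage == "single":      # closest pair of points across the clusters
        return min(pairwise)
    if linkage == "complete":    # farthest pair of points across the clusters
        return max(pairwise)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, k, linkage="average"):
    """Bottom-up clustering: greedily merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]  # every object starts in its own cluster
    while len(clusters) > k:
        # Greedy step: pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # safe because i < j
    return clusters
```

For example, calling agglomerative(points, 2, "single") on a list of 2-D tuples returns two lists of points. Recording the order of the merges, rather than stopping at k clusters, would give the information needed to draw a dendrogram.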
