Data Mining & Cluster Computing
By
R. CHANDRA SEKHAR
(02121A0514)
V. PENCHALA PRAVEEN
(02121A0563)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SREE VIDYANIKETHAN ENGINEERING COLLEGE
SREE SAINATH NAGAR, A. RANGAMPET, (A.P) 517102
INDEX
1. Abstract
2. Introduction
2.1 What is Data Mining?
2.2 Knowledge Discovery in Databases
2.3 Stages in Knowledge Discovery in Databases
3. Data Mining
3.1 Techniques
3.2 Applications
4. Clustering
4.1 Components of a Clustering Task
4.2 Stages in Clustering
5. Clustering Techniques
6. Partitional Algorithms
7. k-Medoid Algorithms
7.1 PAM (Partitioning Around Medoids)
7.1.1 Partitioning
7.1.2 Iterative Selection of Medoids
7.1.3 PAM Algorithm
7.2 CLARA
7.2.1 CLARA Algorithm
7.3 CLARANS
7.3.1 CLARANS Algorithm
8. Conclusion
ABSTRACT
With the explosive growth of data, the extraction of useful information from it has become a major task. Data Mining is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. Among the areas of data mining, the problem of clustering data objects has received a great deal of attention.
Clustering segments a database into subsets or clusters. It is a useful technique for discovering the data distribution and patterns in the underlying data. The goal of clustering is to discover dense and sparse regions in a data set. We focus mainly on finding cluster groups in a database, for which we present three algorithms, namely PAM, CLARA and CLARANS.
INTRODUCTION
What is Data Mining?
The past two decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation of data has taken place at an explosive rate. It has been estimated that the amount of information in the world doubles every 20 months, and the size and number of databases are increasing even faster. The increasing use of electronic data-gathering devices, such as point-of-sale or remote-sensing devices, has contributed to this explosion of available data; a figure from the Red Brick Company illustrates this data explosion.
The analogy with the mining process is described as follows: Data Mining refers to "using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The data is often voluminous but, as it stands, of low value because no direct use can be made of it; it is the hidden information in the data that is useful." Recorded events are nothing but data; giving structure to that data turns it into information. Day-to-day information grows vastly, so extracting useful information is difficult, and we therefore need a set of tools to analyse the data. This set of tools is called Data Mining.
Knowledge Discovery in Databases:
The term knowledge refers to relationships and patterns between data elements. The knowledge discovery process consists of six stages, of which Data Mining is the discovery stage.
The stages in KDD are:
1. Data Selection
2. Cleaning
3. Enrichment
4. Coding
5. Data Mining
6. Reporting
The fifth stage, Data Mining, is the phase of real discovery. Data Mining methodology states that, in the optimal situation, data mining is an ongoing process: one should continually work on the data, constantly identify new information needs and try to improve the data to make it match the goals better, so that the organization becomes a learning system. Since most of the phases need a great deal of creativity, such a process enables and encourages this creativity by refusing to impose any limit on possible activities.
[Figure: the KDD process, running from requirements through Data Selection, Data Cleaning, Enrichment and Coding, Data Mining and Reporting to interpretation and deployment.]
Techniques:
Association Rules, Classification Rules, Query Tools, Cluster Computing, Statistical Techniques, Decision Trees, Visualization, On-Line Analytical Processing (OLAP), k-Nearest Neighbours, Neural Networks, Genetic Algorithms.

Applications:
Risk Management, Fraud Management and Detection, Target Marketing, Cross Selling, Customer Retention, Market Basket Analysis, Market Segmentation, Forecasting, Improved Underwriting, Quality Control, Competitor Analysis.
These techniques of Data Mining support the applications listed above in a business environment.
CLUSTERING
Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of the other groups.
An example of clustering is depicted in the figures below: the input patterns are shown in Figure (a), and the desired clusters are shown in Figure (b), where points belonging to the same cluster are drawn in the same group. The variety of techniques for representing data, measuring proximity (similarity) between data elements and grouping data elements has produced a rich and often confusing assortment of clustering methods.
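As a concrete illustration of measuring proximity, the short Python sketch below builds a pairwise dissimilarity matrix from two-dimensional points using Euclidean distance. The point coordinates are invented for illustration, and the later medoid-based sketches assume such a matrix (or a distance function) is available.

    from math import dist  # Euclidean distance between two points (Python 3.8+)

    # A handful of invented 2-D points; two loose groups are visible by eye.
    points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),   # group around (1.5, 1.5)
              (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]   # group around (8.5, 8.5)

    # Pairwise dissimilarity matrix: entry [a][b] is the distance d(Oa, Ob).
    dissim = [[dist(p, q) for q in points] for p in points]

    for row in dissim:
        print(" ".join(f"{d:5.2f}" for d in row))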
CLUSTERING TECHNIQUES:
There are two main types of clustering techniques:
1. Partitional clustering:
The partitional clustering techniques construct a partition of the database into a predefined number of clusters. They attempt to determine k partitions that optimize a certain criterion function. The partitional clustering algorithms are of two types:
k-means algorithms
k-medoid algorithms
2. Hierarchical clustering:
The hierarchical clustering techniques produce a sequence of partitions in which each partition is nested into the next partition in the sequence, creating a hierarchy of clusters from small to big or from big to small. The hierarchical clustering algorithms are of two types:
Agglomerative technique
Divisive technique
Among these techniques we focus mainly on PARTITIONAL ALGORITHMS.
PARTITIONAL ALGORITHMS
Partitional algorithms construct a partition of a database of N objects into a set of k clusters. The construction involves determining the optimal partition with respect to an objective function. There are approximately k^N / k! ways of partitioning a set of N data points into k subsets.
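To get a feel for how quickly this number grows, and why exhaustive enumeration of partitions is hopeless, the small Python check below evaluates the k^N / k! estimate; the values of N and k are only illustrative.

    from math import factorial

    N, k = 100, 5                                  # a small data set by data-mining standards
    approx_partitions = k**N / factorial(k)        # the k^N / k! estimate
    print(f"about {approx_partitions:.2e} candidate partitions")  # roughly 6.6e+67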
Partitional clustering algorithms usually adopt an iterative optimization paradigm. The algorithm starts with an initial partition and uses an iterative control strategy: it tries swapping data points to see whether such a swap improves the quality of the clustering, and when no swap yields an improvement it has reached a locally optimal partition. The quality of this clustering is very sensitive to the initially selected partition.
There are two main categories of partitioning algorithms:
1. k-means algorithms, where each cluster is represented by the centre of gravity of the cluster.
2. k-medoid algorithms, where each cluster is represented by one of the objects of the cluster located near its centre.
Most of the specialized clustering algorithms designed for data mining are k-medoid algorithms.
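The difference between the two kinds of representative can be seen in a few lines of Python: the centroid used by k-means is the coordinate-wise mean and need not be a data point, while the medoid is the actual object whose total dissimilarity to the rest of the cluster is smallest. The sample points are invented for illustration.

    from math import dist

    cluster = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (9.0, 9.0)]  # note the outlier

    # k-means style representative: centre of gravity (need not be a data point).
    centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))

    # k-medoid style representative: the object minimising total distance to the others.
    medoid = min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))

    print("centroid:", centroid)   # pulled towards the outlier
    print("medoid:  ", medoid)     # stays on an actual, central object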
k-Medoid Algorithms
PAM (Partitioning Around Medoids)
PAM (Partitioning Around Medoids) uses a k-medoid method to identify the clusters. PAM selects k objects arbitrarily from the data as medoids. In each step, a swap between a selected object Oi and a non-selected object Oh is made, as long as such a swap would result in an improvement of the quality of the clustering. To calculate the effect of such a swap between Oi and Oh, a cost Cih is computed, which is related to the quality of partitioning the non-selected objects into the k clusters represented by the medoids.
It is necessary first to understand how the data objects are partitioned when a set of k medoids is given.
PARTITIONING
If Oj is a non-selected object and Oi is a medoid, we say that Oj belongs to the cluster represented by Oi if d(Oi, Oj) = min_Oe d(Oj, Oe) (see Figure (b) below), where the minimum is taken over all medoids Oe and d(Oa, Oh) gives the dissimilarity (distance) between objects Oa and Oh (see Figure (a) below). The dissimilarity matrix is known prior to the commencement of PAM. The quality of the clustering is measured by the average dissimilarity between an object and the medoid of the cluster to which the object belongs.
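A minimal Python sketch of this partitioning rule follows, assuming the dissimilarity matrix is already available as a nested list; the helper name assign_to_medoids is our own. Every non-selected object is attached to the medoid it is least dissimilar to, and the quality of the clustering is the average of those dissimilarities.

    def assign_to_medoids(dissim, medoids):
        """Attach each non-selected object to its nearest medoid.

        dissim  -- precomputed dissimilarity matrix, dissim[a][b] = d(Oa, Ob)
        medoids -- indices of the k selected objects (the medoids)
        Returns (assignment dict, average dissimilarity of the clustering).
        """
        assignment, total = {}, 0.0
        non_selected = [j for j in range(len(dissim)) if j not in medoids]
        for j in non_selected:
            nearest = min(medoids, key=lambda i: dissim[i][j])
            assignment[j] = nearest
            total += dissim[nearest][j]
        quality = total / len(non_selected) if non_selected else 0.0
        return assignment, quality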
PAM ALGORITHM:
1. Select k objects arbitrarily from the data as the initial medoids.
2. For each pair of a selected object Oi and a non-selected object Oh, calculate the swap cost Cih, and find the pair (imin, hmin) with the minimum cost.
3. If Cimin,hmin < 0
   Then mark Oimin as non-selected and Ohmin as selected
   and repeat from step 2.
4. Otherwise, assign each non-selected object to the most similar medoid and stop.
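Putting these pieces together, the following is a compact, illustrative Python version of the PAM loop, not the reference implementation: PAM proper computes the swap costs incrementally, whereas this sketch simply re-evaluates the total cost of every candidate swap between a selected object Oi and a non-selected object Oh and performs the best swap as long as its cost Cih is negative.

    import random

    def clustering_cost(dissim, medoids):
        # Total dissimilarity of every object to its nearest medoid.
        return sum(min(dissim[i][j] for i in medoids) for j in range(len(dissim)))

    def pam(dissim, k, seed=0):
        rng = random.Random(seed)
        n = len(dissim)
        medoids = set(rng.sample(range(n), k))          # step 1: arbitrary medoids
        best_cost = clustering_cost(dissim, medoids)
        while True:
            best_swap, best_delta = None, 0.0
            for i in medoids:                           # selected object Oi
                for h in set(range(n)) - medoids:       # non-selected object Oh
                    candidate = (medoids - {i}) | {h}
                    delta = clustering_cost(dissim, candidate) - best_cost  # cost Cih
                    if delta < best_delta:
                        best_swap, best_delta = (i, h), delta
            if best_swap is None:                       # no swap improves the clustering
                return medoids
            i, h = best_swap                            # perform the best negative-cost swap
            medoids = (medoids - {i}) | {h}
            best_cost += best_delta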
CLARA
It can be observed that the major computational effort in PAM is to determine the k medoids through iterative optimization. CLARA, though it follows the same principle, attempts to handle large data sets. Instead of finding representative objects for the entire data set, CLARA draws a sample of the data set, applies PAM to this sample and finds the medoids of the sample. If the sample were drawn in a sufficiently random way, the medoids of the sample would approximate the medoids of the entire data set. The steps of CLARA are summarized below:
ALGORITHM:
Input: a database D of objects and the number of clusters k.
Repeat a fixed number of times:
1. Draw a sample S ⊆ D randomly from D.
2. Apply PAM to S to find the k medoids of the sample.
3. Assign every object in D to the nearest of these medoids.
4. Calculate the average dissimilarity of the resulting clustering; if it is smaller than the current minimum, retain the k medoids found in this iteration as the best set so far.
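A hedged Python sketch of CLARA's sampling idea is given below. To keep it self-contained, the medoids of each small sample are found by exhaustive search over all k-subsets of the sample rather than by calling PAM; the quality of each candidate set is judged against the entire data set, and the best medoids seen are retained. The sample size and number of repetitions are illustrative choices, and the function names are our own.

    import random
    from itertools import combinations
    from math import dist

    def total_cost(points, medoids):
        # Sum of distances from every object in the data set to its nearest medoid.
        return sum(min(dist(p, m) for m in medoids) for p in points)

    def clara(points, k, sample_size=20, n_samples=5, seed=0):
        rng = random.Random(seed)
        best_medoids, best_cost = None, float("inf")
        for _ in range(n_samples):
            sample = rng.sample(points, min(sample_size, len(points)))
            # Stand-in for PAM on the sample: exhaustively pick the best k medoids.
            medoids = min(combinations(sample, k),
                          key=lambda ms: total_cost(sample, ms))
            # Judge the sample's medoids against the whole data set.
            cost = total_cost(points, medoids)
            if cost < best_cost:
                best_medoids, best_cost = list(medoids), cost
        return best_medoids, best_cost

    # Example use on invented data:
    data = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
    print(clara(data, k=3))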
CLARANS
CLARANS does not confine its search to a fixed sample of the data; at each step it examines a randomly chosen swap between a selected object and a non-selected object. Its behaviour is controlled by two parameters: numlocal, the number of local searches performed, and maxneighbour, the maximum number of neighbours examined in one local search.

CLARANS ALGORITHM:
Set e = 1.
Do while (e ≤ numlocal)
    Select k objects arbitrarily as the current set of medoids.
    Set j = 1.
    Do while (j ≤ maxneighbour)
        Consider randomly a pair (i, h) such that Oi is a selected object and Oh is a non-selected object.
        Calculate the cost Cih.
        If Cih is negative
            Update the current clustering: mark Oi as non-selected, Oh as selected, and set j = 1.
        Else
            Increment j = j + 1.
    End do.
    Compare the cost of the current clustering with the best found so far and retain the better one.
    Increment e = e + 1.
End do.
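The Python sketch below mirrors this pseudocode: for each of numlocal restarts it begins from an arbitrary set of k medoids and repeatedly examines a randomly chosen swap (Oi, Oh); a negative-cost swap is applied immediately and the neighbour counter is reset, while maxneighbour consecutive non-improving trials end the local search. The parameter defaults are illustrative and the cost is recomputed from scratch for clarity.

    import random
    from math import dist

    def total_cost(points, medoids):
        # Sum of distances from every object to its nearest medoid (given as indices).
        return sum(min(dist(p, points[i]) for i in medoids) for p in points)

    def clarans(points, k, numlocal=2, maxneighbour=50, seed=0):
        rng = random.Random(seed)
        n = len(points)
        best_medoids, best_cost = None, float("inf")
        for _ in range(numlocal):                       # outer loop: e = 1 .. numlocal
            current = set(rng.sample(range(n), k))      # arbitrary current medoids
            current_cost = total_cost(points, current)
            j = 1
            while j <= maxneighbour:                    # inner loop over random neighbours
                i = rng.choice(sorted(current))                            # selected Oi
                h = rng.choice([x for x in range(n) if x not in current])  # non-selected Oh
                candidate = (current - {i}) | {h}
                c_ih = total_cost(points, candidate) - current_cost        # swap cost Cih
                if c_ih < 0:                            # improvement: move to the neighbour
                    current, current_cost = candidate, current_cost + c_ih
                    j = 1
                else:
                    j += 1
            if current_cost < best_cost:                # keep the best local optimum found
                best_medoids, best_cost = current, current_cost
        return sorted(best_medoids), best_cost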
CONCLUSION
PAM is an iterative optimization that combines relocation of points between prospective clusters with re-nominating the points as potential medoids. The guiding principle for the process is the effect on an objective function, which, obviously, is a costly strategy. On the other hand, CLARA tries to examine fewer elements by restricting its search to a smaller sample of the database. CLARANS does not restrict the search to any particular subset of objects; it randomly selects a few pairs for swapping at the current state, and so it is more efficient than the other two medoid-based algorithms.