Unit 8
Definition
Classification and prediction are two forms of data analysis that can be used to
extract models describing important data classes or to predict future data
trends.
Such analysis can help provide us with a better understanding of the data at
large.
Whereas classification predicts categorical (discrete, unordered) labels,
prediction models continuous-valued functions.
Many classification and prediction methods have been proposed by researchers
in machine learning, pattern recognition, and statistics.
Classification Techniques
Decision Tree Identification
Classification Problem
Predict Play (Yes, No) from Weather.
Weather     Play
Sunny       Yes
Cloudy      Yes/No
Overcast    Yes/No
Decision Tree Identification
Weather     Temperature   Play
Cloudy      Warm          Yes
Cloudy      Chilly        No
Cloudy      Pleasant      Yes
Overcast    Warm
Overcast    Chilly        No
Overcast    Pleasant      Yes
Decision Tree Identification Example:
A top-down technique is used for decision tree identification: starting from the
root, the data is split on one attribute at a time.
The decision tree created is sensitive to the order in which the attributes are
considered.
If an N-item-set does not result in a clear decision, the classification classes
have to be modeled by rough sets.
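The top-down technique above can be sketched in code. The following is a minimal illustration (Python is my choice here; the notes show no code), using a hypothetical reading of the Weather/Temperature table in which attributes are split in a fixed order. Where a node's label set is not pure and no attributes remain, the set of labels is returned, in the spirit of the rough-set remark above.

```python
# A minimal sketch of top-down decision tree identification.
# The fixed `attributes` order illustrates that the resulting tree is
# sensitive to the order in which attributes are considered.

def build_tree(rows, attributes):
    labels = [r["Play"] for r in rows]
    if len(set(labels)) == 1:        # pure node: a clear decision
        return labels[0]
    if not attributes:               # no attributes left: unclear decision,
        return set(labels)           # return the label set (rough-set style)
    attr, rest = attributes[0], attributes[1:]
    tree = {}
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        tree[value] = build_tree(subset, rest)
    return (attr, tree)

# hypothetical training rows, following the notes' example tables
data = [
    {"Weather": "Cloudy",   "Temp": "Warm",     "Play": "Yes"},
    {"Weather": "Cloudy",   "Temp": "Chilly",   "Play": "No"},
    {"Weather": "Cloudy",   "Temp": "Pleasant", "Play": "Yes"},
    {"Weather": "Overcast", "Temp": "Chilly",   "Play": "No"},
    {"Weather": "Overcast", "Temp": "Pleasant", "Play": "Yes"},
]
tree = build_tree(data, ["Weather", "Temp"])
```

Splitting on Temp before Weather would yield a differently shaped tree over the same data, which is exactly the order-sensitivity noted above.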
Clustering Techniques
Clustering partitions the data set into clusters or equivalence classes.
Similarity among members of the same class is greater than similarity among
members of different classes.
Similarity measures: Euclidean distance or other application-specific measures.
Clustering Techniques
Nearest Neighbour Clustering Algorithm:
Given n elements x1, x2, …, xn, and threshold t:
1. j ← 2, k ← 1, cluster1 = {x1}
2. Repeat
I. Find the nearest neighbour of xj among the elements already clustered
II. Let the nearest neighbour be in cluster m
III. If the distance to the nearest neighbour is > t, then create a new cluster
and k ← k + 1; else assign xj to cluster m
IV. j ← j + 1
3. Until j > n
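The steps above can be made concrete. A runnable sketch (Python chosen for illustration), assuming 1-D points and Euclidean distance, with x1 seeding cluster 1:

```python
# Nearest-neighbour clustering: each element joins the cluster of its
# nearest already-clustered neighbour, unless that neighbour is farther
# than the threshold t, in which case a new cluster is created.

def nn_cluster(xs, t):
    clusters = {1: [xs[0]]}       # step 1: k = 1, cluster1 = {x1}
    assignment = [1]              # cluster id of each processed element
    k = 1
    for j in range(1, len(xs)):   # steps 2-3: repeat until j > n
        # steps i-ii: nearest neighbour among already-clustered elements
        nearest = min(range(j), key=lambda i: abs(xs[j] - xs[i]))
        if abs(xs[j] - xs[nearest]) > t:   # step iii: too far -> new cluster
            k += 1
            clusters[k] = [xs[j]]
            assignment.append(k)
        else:                              # else join neighbour's cluster m
            m = assignment[nearest]
            clusters[m].append(xs[j])
            assignment.append(m)
    return clusters

clusters = nn_cluster([1.0, 1.5, 10.0, 10.2, 5.0], t=2.0)
```

Note that the result depends on the order of the input elements, just as the decision tree earlier depended on attribute order.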
Regression
Numeric prediction is the task of predicting continuous (or ordered) values for
a given input.
For example:
We may wish to predict the salary of college graduates with 10 years of work
experience, or the potential sales of a new product given its price.
The most widely used approach for numeric prediction is regression,
a statistical methodology that was developed by Sir Francis Galton (1822-1911), a
mathematician who was also a cousin of Charles Darwin.
Many texts use the terms “regression” and “numeric prediction” synonymously.
Regression analysis can be used to model the relationship between one or more
independent (predictor) variables and a dependent (response) variable (which is
continuous-valued).
In the context of data mining, the predictor variables are the attributes of interest
describing the tuple
The response variable is what we want to predict
Types of Regression
The types of regression are:
Linear Regression
Nonlinear Regression
Linear Regression
Straight-line regression analysis involves a response variable, y, and a single
predictor variable, x.
It is the simplest form of regression, and models y as a linear function of x.
That is,
y=b+wx
Where the variance of y is assumed to be constant, and b and w are regression
coefficients specifying the Y-intercept and slope of the line, respectively.
The regression coefficients, w and b, can also be thought of as weights, so that
we can equivalently write
y=w0+w1x.
Using the method of least squares, the regression coefficients can be estimated
with the following equations:
w1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
w0 = ȳ − w1·x̄
where x̄ is the mean of x1, …, xn and ȳ is the mean of y1, …, yn.
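The least-squares estimates of w0 and w1 can be computed directly from the data. A minimal sketch (Python chosen for illustration):

```python
# Estimate w0 (intercept b) and w1 (slope w) for y = w0 + w1*x
# by the method of least squares.

def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # w1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
    w0 = y_bar - w1 * x_bar          # w0 = y_bar - w1 * x_bar
    return w0, w1

# data lying exactly on the line y = 1 + 2x
w0, w1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```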
Nonlinear Regression
In straight-line linear regression, the dependent (response) variable, y, is
modeled as a linear function of a single independent (predictor) variable, x.
What if we could get a more accurate model using a nonlinear model, such as a
parabola or some other higher-order polynomial?
Polynomial regression is often of interest when there is just one predictor variable.
Consider a cubic polynomial relationship given by
y = w0 + w1x + w2x^2 + w3x^3
Nonlinear Regression
In statistics, nonlinear regression is a form of regression analysis in which
observational data are modeled by a function which is a nonlinear combination of the
model parameters and depends on one or more independent variables. The data are
fitted by a method of successive approximations.
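Polynomial regression can also be treated as linear regression on transformed inputs: for y = w0 + w1x + w2x^2 + w3x^3, fit a linear model on the features (1, x, x^2, x^3). A sketch (Python chosen for illustration; the coefficients of a cubic are recovered exactly from four points by Gaussian elimination, with no external libraries assumed):

```python
# Fit y = w0 + w1*x + w2*x^2 + w3*x^3 by solving the linear system
# whose rows are the transformed features (1, x, x^2, x^3).

def fit_cubic(xs, ys):
    n = len(xs)                      # n = 4 points determine a cubic
    # augmented matrix: each row is (1, x, x^2, x^3 | y)
    M = [[x ** p for p in range(n)] + [y] for x, y in zip(xs, ys)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # back-substitution
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c]
                              for c in range(r + 1, n))) / M[r][r]
    return w   # [w0, w1, w2, w3]

# points generated from y = 1 + 2x + 0.5x^3
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 0.5 * x ** 3 for x in xs]
w = fit_cubic(xs, ys)
```

With more than four points the same feature transformation would be combined with least squares, as in the linear case.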
Clustering
Definition
The process of grouping a set of physical or abstract objects into classes of similar
objects is called clustering.
A cluster is a collection of data objects that are similar to one another within the same
cluster and are dissimilar to the objects in other clusters. A cluster of data objects can
be treated collectively as one group and so may be considered as a form of data
compression.
First the set is partitioned into groups based on data similarity (e.g., using
clustering), and then labels are assigned to the relatively small number of groups.
It is also called unsupervised learning. Unlike classification, clustering and
unsupervised learning do not rely on predefined classes and class-labeled training
examples. For this reason, clustering is a form of learning by observation, rather than
learning by examples.
Definition
Clustering is also called data segmentation in some applications because clustering
partitions large data sets into groups according to their similarity.
Clustering can also be used for outlier detection, where outliers (values that are “far
away” from any cluster) may be more interesting than common cases.
Advantages
Advantages of such a clustering-based process:
Adaptable to changes
Helps single out useful features that distinguish different groups.
Applications of Clustering
Market research
Pattern recognition
Data analysis
Image processing
Biology
Geography
Automobile insurance
Outlier detection
K-Means Algorithm
Input:
k: the number of clusters,
D: a data set containing n objects.
Output: A set of k clusters.
Method:
(1) arbitrarily choose k objects from D as the initial cluster centers;
(2) repeat
(3) (re)assign each object to the cluster to which the object is the most similar, based
on the mean value of the objects in the cluster;
(4) update the cluster means, i.e., calculate the mean value of the objects for each
cluster;
(5) until no change;
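Steps (1)–(5) can be sketched as follows (Python chosen for illustration), assuming 1-D data and Euclidean distance; for determinism, this sketch takes the first k objects as the initial centers rather than arbitrary ones:

```python
# k-means: alternate between assigning objects to the nearest mean
# and recomputing the means, until the means no longer change.

def k_means(data, k, max_iter=100):
    centers = data[:k]                    # (1) initial cluster centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):             # (2) repeat
        clusters = [[] for _ in range(k)]
        for x in data:                    # (3) (re)assign each object to the
            i = min(range(k),             #     cluster with the nearest mean
                    key=lambda c: abs(x - centers[c]))
            clusters[i].append(x)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:        # (5) until no change
            break
        centers = new_centers             # (4) update the cluster means
    return clusters, centers

clusters, centers = k_means([1.0, 1.1, 0.9, 8.0, 8.2, 7.8], k=2)
```

The result can depend on the choice of initial centers; in practice the algorithm is often run several times with different initializations.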
K-Medoids Algorithm
Input:
k: the number of clusters,
D: a data set containing n objects.
Output: A set of k clusters.
Method:
(1) arbitrarily choose k objects in D as the initial representative objects or seeds;
(2) repeat
(3) assign each remaining object to the cluster with the nearest representative
object;
(4) randomly select a nonrepresentative object, o_random;
(5) compute the total cost, S, of swapping representative object, o_j, with
o_random;
(6) if S < 0 then swap o_j with o_random to form the new set of k representative
objects;
(7) until no change;
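The swap-based search above can be sketched as follows (Python chosen for illustration), on 1-D data with Euclidean distance. For determinism, this sketch tries every nonrepresentative object as a candidate instead of picking a random o_random, which is a simplification of step (4):

```python
# k-medoids: keep k actual objects as representatives; accept a swap of a
# representative with a nonrepresentative object whenever it lowers the
# total cost (S < 0), until no improving swap remains.

def total_cost(data, medoids):
    # total distance of every object to its nearest representative
    return sum(min(abs(x - m) for m in medoids) for x in data)

def k_medoids(data, k):
    medoids = data[:k]                 # (1) first k objects as representatives
    improved = True
    while improved:                    # (2)/(7) repeat until no change
        improved = False
        for i in range(k):
            for o in data:             # candidate replacement (deterministic
                if o in medoids:       # stand-in for the random o_random)
                    continue
                trial = medoids[:i] + [o] + medoids[i + 1:]
                # (5)-(6): S < 0 means cost after the swap < cost before
                if total_cost(data, trial) < total_cost(data, medoids):
                    medoids = trial
                    improved = True
    return medoids

medoids = k_medoids([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], k=2)
```

Because the representatives are actual objects rather than means, k-medoids is less sensitive to outliers than k-means.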
Bayesian Classification
Bayesian classification is based on Bayes’ theorem.
Studies comparing classification algorithms have found a simple Bayesian classifier
known as the naïve Bayesian classifier to be comparable in performance with decision
tree and selected neural network classifiers.
Bayesian classifiers have also exhibited high accuracy and speed when applied to
large databases.
Naïve Bayesian classifiers assume that the effect of an attribute value on a given
class is independent of the values of the other attributes. This assumption is called
class conditional independence.
Bayesian belief networks are graphical models, which unlike naïve Bayesian
classifiers, allow the representation of dependencies among subsets of attributes.
Bayes’ Theorem
Let X be a data tuple and let H be a hypothesis, such as that X belongs to a
specified class C. Bayes’ theorem gives the posterior probability of H
conditioned on X:
P(H|X) = P(X|H) · P(H) / P(X)
where P(H) is the prior probability of H, P(X|H) is the probability of X
conditioned on H, and P(X) is the prior probability of X.
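The naïve Bayesian classifier described above scores each class C by P(C) · Π P(xi|C), applying class conditional independence. A minimal sketch (Python chosen for illustration), assuming categorical attributes and a tiny hypothetical weather data set:

```python
# Naive Bayesian classifier for categorical attributes: estimate the
# prior P(C) and the conditionals P(x_a | C) from counts, then pick the
# class maximizing P(C) * product of P(x_a | C).

from collections import Counter, defaultdict

def train(rows, label_key):
    class_counts = Counter(r[label_key] for r in rows)
    cond = defaultdict(Counter)   # (class, attribute) -> value counts
    for r in rows:
        for a, v in r.items():
            if a != label_key:
                cond[(r[label_key], a)][v] += 1
    return class_counts, cond, len(rows)

def classify(x, class_counts, cond, n):
    best, best_p = None, -1.0
    for c, cc in class_counts.items():
        p = cc / n                    # prior P(C)
        for a, v in x.items():        # times each P(x_a | C); an unseen
            p *= cond[(c, a)][v] / cc # value gives 0 in this simple sketch
        if p > best_p:
            best, best_p = c, p
    return best

# hypothetical training rows
rows = [
    {"Weather": "Sunny",  "Temp": "Warm",   "Play": "Yes"},
    {"Weather": "Sunny",  "Temp": "Chilly", "Play": "Yes"},
    {"Weather": "Cloudy", "Temp": "Chilly", "Play": "No"},
    {"Weather": "Cloudy", "Temp": "Warm",   "Play": "Yes"},
]
model = train(rows, "Play")
prediction = classify({"Weather": "Sunny", "Temp": "Warm"}, *model)
```

Real implementations avoid the zero-probability problem for unseen attribute values with a Laplacian correction (adding 1 to each count).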