
FDP Day1

This document provides an introduction to data mining and machine learning. It discusses how data mining is used to extract useful patterns from large datasets. Common data mining tasks include prediction, classification, clustering, and anomaly detection. It also introduces machine learning as a field that allows computers to learn from data without being explicitly programmed. Major topics in data preprocessing like data cleaning, normalization, and dimensionality reduction are covered. Finally, it distinguishes between supervised, unsupervised, and reinforcement learning algorithms.

Uploaded by

yadavsticky5108

Fundamentals of Data Mining and Machine Learning

Dr. B. Santhosh Kumar,
Associate Professor,
G. Pulla Reddy Engineering College (Autonomous),
Kurnool.
Introduction
What is Data Mining?

• The non-trivial extraction of implicit, previously unknown, and potentially useful information from data.

• Data mining is the process of automatically discovering useful information in large data repositories.
Applications
• Banking: loan/credit card approval
  - predict good customers based on old customers

• Customer relationship management
  - identify those who are likely to leave for a competitor

• Targeted marketing
  - identify likely responders to promotions

• Fraud detection: telecommunications, financial transactions
  - identify fraudulent events in an online stream of events
Applications (continued)

• Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease history to find relationships between diseases

• Website/store design and promotion
  - find visitors' affinity for pages and modify the layout accordingly
Attribute
Types of Attributes
Data Mining Tasks
• Predictive tasks: predict the value of a particular attribute based on the values of other attributes.

• Descriptive tasks: derive patterns (such as clusters and anomalies) that summarize the underlying relationships in data.
Examples of Classification
Association Analysis
Cluster Analysis
Anomaly Detection
• The task of identifying observations whose characteristics are significantly different from the rest of the data. Such observations are called anomalies or outliers.

• Examples: credit card fraud detection, network intrusions, unusual patterns of disease.
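One simple way to flag such observations is to measure how many standard deviations each value lies from the mean. The sketch below is only an illustration of that idea, not the method used in the slides: the function name `find_outliers`, the threshold of 2.0, and the transaction amounts are all assumptions made for the example.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [25, 30, 27, 22, 31, 29, 26, 950]
print(find_outliers(amounts))  # [950]
```

A single extreme value also inflates the mean and standard deviation it is compared against, so real systems usually prefer more robust statistics or model-based detectors.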
Machine Learning

• Machine Learning is the science of programming computers so they can learn from data.

• Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

• A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)
Example
• A spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails.
• The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample).
• In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails.
• This particular performance measure is called accuracy, and it is often used in classification tasks.
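Accuracy as described above is just the number of correct predictions divided by the total number of predictions. A minimal sketch follows; the function name `accuracy` and the label lists are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true labels and filter predictions for five emails.
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(actual, predicted))  # 0.8 (4 of 5 emails classified correctly)
```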
Major Tasks in Data Preprocessing
• Data cleaning
  - Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
• Data integration
  - Integration of multiple databases, data cubes, or files

• Data reduction
  - Dimensionality reduction
  - Data compression

• Data transformation
  - Normalization
Forms of Data Preprocessing

Data Cleaning
• Data in the real world is dirty: it contains lots of potentially incorrect data, e.g., from faulty instruments, human or computer error, or transmission error
  - incomplete: lacking attribute values, lacking certain attributes of interest
    - e.g., Occupation=“ ” (missing data)

  - noisy: containing noise, errors, or outliers
    - e.g., Salary=“−10” or Salary=“NaN” (an error)
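The two kinds of dirty data above (missing values and impossible values) can be detected with a few simple checks. This is a rough sketch under assumed conventions: the field names `occupation` and `salary`, the helper names, and the sample records are hypothetical, and a real pipeline would also decide how to repair each problem.

```python
import math


def is_missing(value):
    """Treat None, blank strings, and NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, float):
        return math.isnan(value)
    return False


def check_record(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if is_missing(record.get("occupation")):
        problems.append("occupation missing")
    salary = record.get("salary")
    if is_missing(salary):
        problems.append("salary missing")
    elif salary < 0:
        problems.append("salary negative")
    return problems


# Hypothetical records mirroring the examples on the slide.
records = [
    {"occupation": "engineer", "salary": 52000.0},
    {"occupation": " ", "salary": 48000.0},
    {"occupation": "teacher", "salary": -10.0},
    {"occupation": "clerk", "salary": float("nan")},
]
for record in records:
    print(record["occupation"], check_record(record))
```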
Normalization
• Min-max normalization: to $[new\_min_A, new\_max_A]$

  $v' = \dfrac{v - min_A}{max_A - min_A}\,(new\_max_A - new\_min_A) + new\_min_A$

  - Ex. Let income range from 12,000 to 98,000, normalized to [0.0, 1.0]. Then 73,600 is mapped to

    $\dfrac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}\,(1.0 - 0) + 0 = 0.716$

• Z-score normalization ($\mu_A$: mean, $\sigma_A$: standard deviation):

  $v' = \dfrac{v - \mu_A}{\sigma_A}$

  - Ex. Let $\mu_A = 54{,}000$, $\sigma_A = 16{,}000$. Then 73,600 is mapped to

    $\dfrac{73{,}600 - 54{,}000}{16{,}000} = 1.225$

• Normalization by decimal scaling:

  $v' = \dfrac{v}{10^{j}}$, where $j$ is the smallest integer such that $\max(|v'|) < 1$
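The three normalization schemes can be checked numerically with a few lines of Python. This is a minimal sketch that reuses the income figures from the slide; the function names are chosen for illustration, and the values passed to `decimal_scaling` are made up.

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Rescale v from [min_a, max_a] to [new_min, new_max]."""
    if max_a == min_a:
        raise ValueError("min_a and max_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


def z_score(v, mean_a, std_a):
    """Express v as a number of standard deviations from the mean."""
    if std_a == 0:
        raise ValueError("standard deviation must be non-zero")
    return (v - mean_a) / std_a


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer such that
    every scaled magnitude is below 1. Returns (scaled, j)."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print(round(min_max(73_600, 12_000, 98_000), 3))  # 0.716
print(round(z_score(73_600, 54_000, 16_000), 3))  # 1.225
print(decimal_scaling([-986, 917]))               # j is 3 for these values
```

Min-max scaling needs the true minimum and maximum in advance and is sensitive to outliers, whereas z-score normalization only needs the mean and standard deviation.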
The Traditional Approach
Use of Machine Learning
Automatic Adaptation
Machine Learning helps Humans Learn
Types of Machine Learning Algorithms
Supervised learning

Examples: K-Nearest Neighbors, Linear Regression, Logistic Regression, Support Vector Machines (SVMs), Decision Trees and Random Forests
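To make the idea of learning from labeled examples concrete, here is a small K-Nearest Neighbors classifier written from scratch. It is a sketch for illustration only: the function name `knn_predict`, the choice of k = 3, and the 2-D training points with labels "A" and "B" are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    labeled neighbors. `train` is a list of (features, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training set: two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.5), "B"), ((7.0, 6.0), "B"), ((6.5, 7.0), "B"),
]
print(knn_predict(train, (1.8, 1.6)))  # A
print(knn_predict(train, (6.2, 6.4)))  # B
```

The labels in `train` are what make this supervised: the algorithm is told the desired answer for every training instance.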
Classification
Regression
Unsupervised learning
Data is unlabeled

Examples:
• Clustering: k-Means, Hierarchical Cluster Analysis (HCA)
• Visualization and dimensionality reduction: Principal Component Analysis (PCA)
• Anomaly detection
• Association rule learning: Apriori
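As an illustration of learning from unlabeled data, the following sketch implements the basic k-Means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name `k_means`, the fixed iteration count, the sample points, and the choice of initial centroids are assumptions made for this example; practical implementations use smarter initialization and a convergence test.

```python
import math


def k_means(points, centroids, iterations=10):
    """Run a fixed number of k-Means iterations.
    Returns (centroids, clusters) from the last assignment step."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(old)  # keep a centroid that attracted no points
        centroids = updated
    return centroids, clusters


# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
centroids, clusters = k_means(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are supplied here; the grouping emerges from the distances between points alone.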
Reinforcement Learning
