FDP Day1

This document provides an introduction to data mining and machine learning. It discusses how data mining is used to extract useful patterns from large datasets. Common data mining tasks include prediction, classification, clustering, and anomaly detection. It also introduces machine learning as a field that allows computers to learn from data without being explicitly programmed. Major topics in data preprocessing like data cleaning, normalization, and dimensionality reduction are covered. Finally, it distinguishes between supervised, unsupervised, and reinforcement learning algorithms.

Uploaded by

yadavsticky5108


Fundamentals of Data Mining and Machine Learning

Dr. B. Santhosh Kumar,
Associate Professor,
G. Pulla Reddy Engineering College (Autonomous),
Kurnool.
Introduction
What is Data Mining?

• The significant extraction of implicit, previously unknown and potentially useful information from data.
• Data mining is the process of automatically discovering useful information in large data repositories.
Applications
• Banking: loan/credit card approval
  - predict good customers based on old customers
• Customer relationship management
  - identify those who are likely to leave for a competitor
• Targeted marketing
  - identify likely responders to promotions
• Fraud detection: telecommunications, financial transactions
  - identify fraudulent events in an online stream of events
Applications (continued)
• Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease history: find relationships between diseases
• Website/store design and promotion
  - find affinity of visitors to pages and modify layout
Attribute
Types of Attributes
Data Mining Tasks
• Predictive tasks: Predict the value of a particular attribute based on the values of other attributes.
• Descriptive tasks: Here, the objective is to derive patterns (clusters and anomalies) that summarize the underlying relationships in data.
Examples of Classification
Association Analysis
Cluster Analysis
Anomaly Detection
• The task of identifying observations whose characteristics are significantly different from the rest of the data. Such observations are called anomalies or outliers.
• Ex: Credit card fraud detection, network intrusions, unusual patterns of disease.
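
As a rough illustration of the definition above, the sketch below flags values that lie far from the mean of a list, measured in standard deviations. The z-score rule, the threshold of 2.0, the function name `find_outliers`, and the transaction amounts are all assumptions made for this example; the slides do not prescribe a particular detection method.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations (a simple z-score rule)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [12, 15, 14, 13, 16, 14, 15, 250]
outliers = find_outliers(amounts)
print(outliers)
```

On very small samples a single extreme value inflates the standard deviation, so a low threshold is used here; real systems typically rely on more robust methods.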
Machine Learning

• Machine Learning is the science of programming computers so they can learn from data.
• Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
• A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Example
• A spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails.
• The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample).
• In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails.
• This particular performance measure is called accuracy, and it is often used in classification tasks.
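
The accuracy measure described above can be sketched in a few lines. The function name `accuracy` and the four example labels are illustrative assumptions, not part of the original slides.

```python
def accuracy(true_labels, predicted_labels):
    """Ratio of correctly classified examples to all examples."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical ground truth and filter output for four emails.
truth = ["spam", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "ham"]
score = accuracy(truth, predicted)
print(score)  # 3 of 4 emails are classified correctly -> 0.75
```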
Major Tasks in Data Preprocessing
• Data cleaning
  - Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
• Data integration
  - Integration of multiple databases, data cubes, or files
• Data reduction
  - Dimensionality reduction
  - Data compression
• Data transformation
  - Normalization
Forms of Data Preprocessing

Data Cleaning
• Data in the real world is dirty: lots of potentially incorrect data, e.g., faulty instruments, human or computer error, transmission error
  - incomplete: lacking attribute values, lacking certain attributes of interest
    e.g., Occupation=“ ” (missing data)
  - noisy: containing noise, errors, or outliers
    e.g., Salary=“−10” or Salary=“NaN” (an error)
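
A minimal sketch of the cleaning step, using the two kinds of dirty values mentioned above (an empty Occupation, and a negative or NaN Salary). The record layout, the `"unknown"` placeholder, and the choice to fill bad salaries with the median of the valid ones are assumptions for illustration; the slides only say that missing values should be filled in.

```python
import math
import statistics


def clean_records(records):
    """Return cleaned copies of the records: empty occupations become
    "unknown", and negative or NaN salaries are replaced by the median
    of the valid salaries (assumes at least one valid salary exists)."""
    def is_valid(salary):
        return not math.isnan(salary) and salary >= 0

    median_salary = statistics.median(
        r["salary"] for r in records if is_valid(r["salary"])
    )
    cleaned = []
    for r in records:
        occupation = r["occupation"].strip() or "unknown"
        salary = r["salary"] if is_valid(r["salary"]) else median_salary
        cleaned.append({"occupation": occupation, "salary": salary})
    return cleaned


# Hypothetical records showing each problem once.
raw = [
    {"occupation": "engineer", "salary": 50000.0},
    {"occupation": " ", "salary": 62000.0},            # incomplete
    {"occupation": "teacher", "salary": -10.0},        # noisy
    {"occupation": "clerk", "salary": float("nan")},   # noisy
]
cleaned = clean_records(raw)
for row in cleaned:
    print(row)
```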
Normalization
• Min-max normalization: to [new_min_A, new_max_A]

$$v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$

  Ex. Let income range from 12,000 to 98,000, normalized to [0.0, 1.0]. Then 73,600 is mapped to

$$\frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}\,(1.0 - 0) + 0 = 0.716$$

• Z-score normalization (μ_A: mean, σ_A: standard deviation of attribute A):

$$v' = \frac{v - \mu_A}{\sigma_A}$$

  Ex. Let μ = 54,000, σ = 16,000. Then 73,600 is mapped to

$$\frac{73{,}600 - 54{,}000}{16{,}000} = 1.225$$

• Normalization by decimal scaling:

$$v' = \frac{v}{10^{j}}$$

  where j is the smallest integer such that max(|v'|) < 1.
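
The three normalization schemes can be checked numerically with a short sketch. The function names and the sample values passed to `decimal_scaling` are illustrative assumptions; the income figures are the ones used in the slide's worked examples.

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v from [min_a, max_a] to [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


def z_score(v, mean, std):
    """Express v as a number of standard deviations from the mean."""
    return (v - mean) / std


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest non-negative integer
    that makes every scaled absolute value smaller than 1.
    Expects a non-empty list."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print(round(min_max(73_600, 12_000, 98_000), 3))  # 0.716
print(round(z_score(73_600, 54_000, 16_000), 3))  # 1.225
scaled, j = decimal_scaling([-986, 917])          # hypothetical values
print(scaled, j)
```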
The Traditional Approach
Use of Machine Learning
Automatic Adaptation
Machine Learning Helps Humans Learn
Types of Machine Learning Algorithms
Supervised learning

Examples: K-Nearest Neighbors, Linear Regression, Logistic Regression, Support Vector Machines (SVMs), Decision Trees and Random Forests
Classification
Regression
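
To make supervised classification concrete, here is a minimal sketch of K-Nearest Neighbors, the first algorithm in the list above: a new point receives the majority label of its k closest labeled training points. The function name `knn_predict`, the two-feature layout, and the training data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.
    `training_set` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(
        training_set, key=lambda item: math.dist(item[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled emails described by two numeric features.
training = [
    ((1.0, 1.0), "ham"), ((1.2, 0.8), "ham"), ((0.9, 1.1), "ham"),
    ((5.0, 5.0), "spam"), ((5.2, 4.9), "spam"), ((4.8, 5.1), "spam"),
]
print(knn_predict(training, (1.1, 1.0)))
print(knn_predict(training, (5.1, 5.0)))
```

The labels in the training set are what make this supervised: the algorithm is told the desired answer for every training instance.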
Unsupervised learning
Data is unlabeled

Examples:
• Clustering: k-Means, Hierarchical Cluster Analysis (HCA)
• Visualization and dimensionality reduction: Principal Component Analysis (PCA)
• Anomaly detection
• Association rule learning: Apriori
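
As a sketch of clustering on unlabeled data, the following implements a bare-bones k-Means for one-dimensional points. The function name `k_means_1d`, the fixed starting centroids (chosen so the run is deterministic), and the sample points are assumptions for illustration; practical implementations pick starting centroids more carefully and handle many dimensions.

```python
def k_means_1d(points, initial_centroids, max_iter=100):
    """Alternate between assigning each point to its nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster (unchanged if cluster is empty).
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical unlabeled measurements forming two obvious groups.
data = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, assignment = k_means_1d(data, initial_centroids=[1.0, 10.0])
print(centers, assignment)
```

No labels are supplied anywhere: the grouping comes only from the distances between the points.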
Reinforcement Learning
