Module 3 - Introduction to ML


MACHINE LEARNING

Presenter: Dr. Amit Kumar Das

Professor,
Dept. of Computer Science and Engg.,
Institute of Engineering & Management.
WHAT IS LEARNING?
TYPES OF HUMAN LEARNING
• Learning through direct guidance from an expert – is just one form …
• Learning through indirect guidance
• Learning by self
WHAT IS MACHINE LEARNING?
TYPES OF MACHINE LEARNING
• Supervised learning – also called predictive learning
• Unsupervised learning – also called descriptive learning
• Reinforcement learning
MACHINE LEARNING PROCESS
• What was the most difficult subject in the last semester?
• What if you had a list of all possible questions with answers, and a photographic memory?
MACHINE LEARNING PROCESS
• Data Input – Past data or information is utilized as a basis for future decision-making
• Abstraction – The input data is represented in a broader way through the underlying algorithm
• Generalization – The abstracted representation is generalized to form a framework for making decisions
TYPICAL ML PROBLEMS
• Prediction of results of a game
• Predicting whether a tumor is malignant or benign
• Price prediction in domains like real estate, stocks, etc.
• Demand forecasting in retail
• Customer segmentation
• Self-driving cars
PROBLEMS NOT TO BE CONSIDERED FOR ML
• Bank interest calculation
• Inventory management (except the demand forecast module)
• Customer on-boarding (except the risk prediction module)
• Tasks in which humans are very effective or frequent human intervention is needed, for example, air traffic control
TYPES OF DATA
• Qualitative data (Categorical)
  - Student Name, Blood group, Grade, etc.
• Quantitative data (Numerical)
  - Temperature, Age, Weight, etc.
DATA EXPLORATION
• Understand the central tendency
  - Mean
  - Median
  - Mode
• Understand data spread
  - Standard Deviation
• Understand data value position

DATA EXPLORATION – CENTRAL TENDENCY

Mean vs. Median for Auto MPG

DATA EXPLORATION – DATA SPREAD
• Consider the data values of two attributes
  - Attribute 1 values – 44, 46, 48, 45 and 47
  - Attribute 2 values – 34, 46, 59, 39 and 52
• Both sets of values have a mean and median of 46.
• The first set of values is more concentrated or clustered around the mean / median value, so it has the smaller spread.
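The comparison above can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module. It assumes the population standard deviation (`pstdev`), since the slide does not say whether the sample or the population formula is intended.

```python
import statistics

# Attribute values taken from the slide.
attribute_1 = [44, 46, 48, 45, 47]
attribute_2 = [34, 46, 59, 39, 52]

def describe(values):
    """Return mean, median and population standard deviation of the values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.pstdev(values),  # assumption: population formula
    }

summary_1 = describe(attribute_1)
summary_2 = describe(attribute_2)

for name, summary in (("Attribute 1", summary_1), ("Attribute 2", summary_2)):
    print(name, summary)
```

Both attributes share the same centre, but the standard deviation of Attribute 2 is several times larger, which is what "less concentrated around the mean" means numerically.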
DATA EXPLORATION – DATA VALUE POSITION
• The position of the values of any data set attribute can be summarized by five values
  - Minimum
  - First quartile (Q1)
  - Median (Q2)
  - Third quartile (Q3), and
  - Maximum

[Figure: number line marking Minimum, Q1, Median (Q2), Q3 and Maximum]
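A minimal sketch of the five-value summary, using the Attribute 2 values from the data spread slide as input. Quartile conventions differ between tools. The `method="inclusive"` setting of `statistics.quantiles` is an assumption made here, not something the slides specify.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles() sorts the data itself; n=4 yields the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

summary = five_number_summary([34, 46, 59, 39, 52])
print(summary)
```

These five values are exactly what a box plot draws: the box spans Q1 to Q3 with a line at the median.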
DATA EXPLORATION – BOX PLOT
DATA QUALITY
• The most frequently occurring data quality issues are:
  - Missing values
  - Outliers

[Figure: Missing values of attribute "horsepower" in Auto MPG]

REMEDIATE DATA ISSUES
• Remove missing values / outliers – If the number of affected records is small, remove them.
• Imputation – Impute the value with the mean, median or mode
• Capping – For values that lie outside the 1.5 × IQR limits, cap them by replacing the observations below the lower limit with the value of the 5th percentile and those above the upper limit with the value of the 95th percentile
• Estimate missing values – Assign the attribute values of similar data points in place of the missing value
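Imputation and capping can be sketched as follows. The horsepower-like values are made up for illustration and are not taken from the Auto MPG data set. The percentile convention (`method="inclusive"`) and the fences at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR are assumptions about how the limits on the slide are meant to be computed.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def cap_outliers(values):
    """Cap values outside the 1.5 x IQR fences at the 5th / 95th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # n=20 gives cut points every 5%, so the first and last are the 5th and 95th.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    return [p5 if v < lower else p95 if v > upper else v for v in values]

# Hypothetical readings with one missing value and one extreme value.
horsepower = [130, 165, None, 150, 140, 95, 1000]
filled = impute_median(horsepower)
capped = cap_outliers(filled)
print(filled)
print(capped)
```

With so few records the 95th percentile is itself pulled up by the extreme value, which is one reason capping works better on larger samples.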
ISSUES IN MACHINE LEARNING
• Relatively new and evolving technology
• In different countries, rules and regulations, cultural background, and emotional maturity of people are drastically different
• Biggest fears – potential breach of privacy, discriminatory behaviour, resulting discontent
WHAT IS MODELLING IN CONTEXT OF MACHINE LEARNING?
WHAT ARE THE DIFFERENT ML ALGORITHMS?
• Supervised
  - Classification – KNN, Naive Bayes, Decision Tree, etc.
  - Regression – Simple Linear Regression, Logistic Regression
• Unsupervised
  - Clustering – K-Means
  - Market Basket Analysis
SUPERVISED LEARNING - CLASSIFICATION

[Figure elements: Labelled Training Data, Classifier, Classification Model, Test Data]
SUPERVISED LEARNING - REGRESSION

y = α + βx
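
A minimal sketch of how α and β in the line above can be estimated by ordinary least squares. The slide gives only the equation, so the estimation method and the four data points are assumptions chosen for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (alpha, beta) minimizing the squared error of y = alpha + beta * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    beta = (
        sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        / sum((xi - x_mean) ** 2 for xi in x)
    )
    # Intercept: the fitted line passes through the point of means.
    alpha = y_mean - beta * x_mean
    return alpha, beta

# Hypothetical points lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
alpha, beta = fit_simple_linear_regression(x, y)
prediction = alpha + beta * 5  # predicted y for an unseen x = 5
print(alpha, beta, prediction)
```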
UNSUPERVISED LEARNING

[Figure elements: Unlabelled Data, Unsupervised Learning Model, Grouped data / Clusters]

UNSUPERVISED LEARNING - CLUSTERING

[Figure: data points grouped into four clusters, labelled Cluster 1 to Cluster 4]
UNSUPERVISED LEARNING – MARKET BASKET ANALYSIS
SELECTING A MODEL
• Predictive models (supervised)
  - Predict the value of a category or class
    - Problems that can be solved: Prediction of win/loss, fraudulent transactions, etc.
    - Examples: k-Nearest Neighbor (kNN), Naïve Bayes, Decision Tree, etc.
  - Predict numerical values of the target
    - Problems that can be solved: Prediction of revenue growth, rainfall amount, etc.
    - Examples: Linear Regression, Logistic Regression, etc.
SELECTING A MODEL
• Descriptive models (unsupervised)
  - Group together similar data instances
  - Problems that can be solved: Customer grouping or segmentation based on social, demographic, ethnic, etc. factors
  - Most popular model for clustering is k-Means
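TRAIN A MODEL – HOLDOUT METHOD
• Input data is split into training data (70% - 80%) and test data (20% - 30%)
• Training data is used to build the trained model
• Test data is applied to the trained model to measure model performance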
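The holdout split can be sketched in plain Python. The function name `holdout_split`, the 30% test fraction and the fixed seed are illustrative assumptions. Shuffling before the split is a common precaution and is not stated on the slide.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and return (training_data, test_data)."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(10))  # stand-in for ten input records
train, test = holdout_split(records, test_fraction=0.3)
print(len(train), len(test))
```

The model is fitted on `train` only. `test` is held back so that the measured performance reflects data the model has not seen.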
K-FOLD CROSS-VALIDATION – OVERALL APPROACH
K-FOLD CROSS-VALIDATION – DETAILED APPROACH
K-FOLD CROSS-VALIDATION (CONTD.)
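The slides for this topic are diagrams, so the following is only a sketch of the usual k-fold scheme: the records are divided into k folds, and each fold serves once as test data while the remaining k - 1 folds form the training data. The helper name `k_fold_indices` and the choice of 10 records with k = 5 are assumptions.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_records))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_records // k + (1 if f < n_records % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice the records are usually shuffled first, and the performance measured on the k test folds is averaged.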
BOOTSTRAP SAMPLING / BOOTSTRAPPING
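A minimal sketch of bootstrap sampling under its usual definition: draw as many records as the input has, with replacement, so some records repeat and others are left out. Using the left-out records as test data is a common convention assumed here, not something the slide states.

```python
import random

def bootstrap_sample(records, seed=0):
    """Return (sample, out_of_bag) where sample is drawn with replacement."""
    rng = random.Random(seed)
    sample = rng.choices(records, k=len(records))  # same size as the input
    chosen = set(sample)
    out_of_bag = [r for r in records if r not in chosen]
    return sample, out_of_bag

records = list(range(10))  # stand-in for ten input records
sample, out_of_bag = bootstrap_sample(records)
print(sample, out_of_bag)
```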
TRAIN A MODEL – UNDER VS. OVER FIT

[Figure panels: Under fit, Balanced fit, Over fit]

TRAIN A MODEL – BIAS VS. VARIANCE
EVALUATING A MODEL - CLASSIFICATION

                  Actual Win             Actual Loss
Predicted Win     True Positive (TP)     False Positive (FP)
Predicted Loss    False Negative (FN)    True Negative (TN)

• True Positive (TP) – Predicted win, Actual win
• True Negative (TN) – Predicted loss, Actual loss
• False Positive (FP) – Predicted win, Actual loss
• False Negative (FN) – Predicted loss, Actual win
• For both TP and TN, the predicted outcome matches the actual outcome. Hence, they are correct classifications.
EVALUATING A MODEL – CLASSIFICATION (CONTD.)

                  Actual Win    Actual Loss
Predicted Win     85 (TP)       4 (FP)
Predicted Loss    2 (FN)        9 (TN)

The percentage of misclassifications is indicated by the error rate, which is measured as:

Error rate = (FP + FN) / (TP + FP + FN + TN)

In the context of the above confusion matrix,

Error rate = (4 + 2) / (85 + 4 + 2 + 9) = 6 / 100 = 6%
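
The same arithmetic as a short Python sketch, using the win/loss confusion matrix on this slide (TP = 85, FP = 4, FN = 2, TN = 9). Accuracy is included as the complement of the error rate. The variable names are illustrative.

```python
# Counts read from the confusion matrix on the slide.
tp, fp, fn, tn = 85, 4, 2, 9

total = tp + fp + fn + tn
error_rate = (fp + fn) / total  # share of misclassified records
accuracy = (tp + tn) / total    # share of correctly classified records

print(f"error rate = {error_rate:.2%}, accuracy = {accuracy:.2%}")
```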

EVALUATING A MODEL – CLASSIFICATION (CONTD.)
Kappa = (P(a) − P(pr)) / (1 − P(pr))

where P(a) = proportion of observed agreement between actual and predicted in the overall data set = (TP + TN) / N, with N = TP + FP + FN + TN

P(pr) = proportion of expected agreement between actual and predicted data, both for the class of interest and for the other classes = [(TP + FP) / N] × [(TP + FN) / N] + [(FN + TN) / N] × [(FP + TN) / N]

Note: Kappa value can be 1 at the maximum, which represents perfect agreement between the model's predictions and the actual values.
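
A sketch of the Kappa calculation for a two-class problem, following the standard Cohen's kappa definition with P(a) as observed agreement and P(pr) as expected agreement. The counts are those of the win/loss confusion matrix shown earlier in this section (TP = 85, FP = 4, FN = 2, TN = 9). The function name is illustrative.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Return Kappa = (P(a) - P(pr)) / (1 - P(pr)) for a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    p_a = (tp + tn) / n  # observed agreement
    # Expected agreement: chance that prediction and actual agree on each class.
    p_pr = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    return (p_a - p_pr) / (1 - p_pr)

kappa = cohen_kappa(tp=85, fp=4, fn=2, tn=9)
perfect = cohen_kappa(tp=5, fp=0, fn=0, tn=5)  # no misclassifications at all
print(round(kappa, 4), perfect)
```

For the matrix above, P(a) = 0.94 and P(pr) = 0.7886, which gives a Kappa of roughly 0.72. A model with no misclassifications reaches the maximum of 1.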
EVALUATING A MODEL (ROC CURVE)
TPR (True Positive Rate) = TP / (TP + FN)

FPR (False Positive Rate) = FP / (FP + TN)

Receiver Operating Characteristic curve
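
A ROC curve plots TPR against FPR as the decision threshold of the classifier is varied. The sketch below computes those points for a handful of hypothetical scores and labels (1 = positive class). The data and the function name `roc_points` are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # A record is predicted positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3]  # hypothetical predicted probabilities
labels = [1, 1, 0, 1, 0]            # hypothetical actual classes
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

The closer the curve through these points bends towards the top-left corner (high TPR at low FPR), the better the classifier separates the two classes.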

EVALUATING A MODEL (REGRESSION)

[Figure: value of the apartment unit plotted against area (in square feet); the error is the gap between the actual value and the predicted value]

EVALUATING A MODEL (CLUSTERING)
"Clustering is in the eye of the beholder"
• Internal evaluation
  - Silhouette width
• External evaluation
  - Purity
EVALUATING A MODEL (CLUSTERING)
• a(i) – Average distance between the i-th data instance and all other data instances belonging to the same cluster
• b(i) – Lowest average distance between the i-th data instance and the data instances of all other clusters

[Figure: Silhouette width calculation for the i-th instance in Cluster 1, showing distances a_i1, a_i2, … to instances of its own cluster and distances b_14(1), b_14(2), …, b_14(n4) to instances of Cluster 4; Clusters 2 and 3 are also shown]
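
A sketch of the silhouette width for a single data instance, built from a(i) and b(i) as defined above. The combining formula s(i) = (b(i) − a(i)) / max(a(i), b(i)) is the standard definition and is assumed here because the slide text does not spell it out. The one-dimensional points and cluster labels are made up for illustration.

```python
def silhouette_width(i, points, labels):
    """Return s(i) = (b(i) - a(i)) / max(a(i), b(i)) for 1-D points."""
    own = labels[i]
    same = [abs(points[i] - points[j])
            for j in range(len(points)) if j != i and labels[j] == own]
    if not same:
        return 0.0  # convention: a single-member cluster gets width 0
    a = sum(same) / len(same)  # average distance within the own cluster
    # Average distance to each other cluster; b(i) is the lowest of these.
    other_means = []
    for cluster in set(labels) - {own}:
        dists = [abs(points[i] - p) for p, l in zip(points, labels) if l == cluster]
        other_means.append(sum(dists) / len(dists))
    b = min(other_means)
    return (b - a) / max(a, b)

points = [1.0, 2.0, 10.0, 11.0]  # hypothetical data instances
labels = [0, 0, 1, 1]            # hypothetical cluster assignments
s0 = silhouette_width(0, points, labels)
print(round(s0, 3))
```

Values close to 1 indicate that the instance sits well inside its own cluster, while values near 0 or below suggest it lies between clusters or in the wrong one.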

ENSEMBLE
THANK YOU &
STAY TUNED!
