Module 3 - Introduction to ML
Learning by self
WHAT IS MACHINE LEARNING?
TYPES OF MACHINE LEARNING
Supervised learning – also called predictive learning
Reinforcement learning
MACHINE LEARNING PROCESS
What was the most difficult subject in the last semester?
[Figure: box plot marking Minimum, Q1, Median (Q2), Q3 and Maximum]
DATA EXPLORATION – BOX PLOT
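The five values a box plot marks (Minimum, Q1, Median (Q2), Q3, Maximum) can be computed directly. The sketch below is illustrative only: the function name `five_number_summary` and the sample marks are hypothetical, and it assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; other tools may use a slightly different convention and give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population, so the
    # quartiles are interpolated between the observed minimum and maximum.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical marks, for illustration only.
marks = [2, 4, 4, 5, 7, 8, 9, 10, 12]
print(five_number_summary(marks))
```

The box spans Q1 to Q3, the line inside the box is the median, and the whiskers reach towards the minimum and maximum.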
DATA QUALITY
Across countries, rules and regulations, cultural background, and the emotional maturity of people differ drastically.
Supervised
Classification – KNN, Naive Bayes, Decision Tree, etc.
Unsupervised
Clustering – K-Means
Market Basket Analysis
SUPERVISED LEARNING - CLASSIFICATION
[Figure: classification example; surviving labels: Test Data, Intel]
SUPERVISED LEARNING - REGRESSION
y = α + βx
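One common way to estimate α and β in y = α + βx is ordinary least squares. The sketch below assumes that method (the slide itself does not name one); the function name and the sample points are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate alpha (intercept) and beta (slope) of y = alpha + beta * x
    by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical points that lie exactly on y = 1 + 2x.
alpha, beta = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(alpha, beta)
```

A prediction for a new x is then simply `alpha + beta * x`.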
UNSUPERVISED LEARNING
[Figure: unlabelled data grouped into four clusters, Cluster 1 to Cluster 4]
UNSUPERVISED LEARNING – MARKET BASKET ANALYSIS
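Market basket analysis is usually described through two measures, support and confidence. The slide does not define them, so the sketch below uses the standard definitions as an assumption; the function names and the shopping baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent:
    support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Here the rule "bread implies milk" is supported by half of the baskets and holds in two of the three baskets that contain bread.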
SELECTING A MODEL
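[Figure: the input data is split; a model is trained on one part to give the trained model, and the remaining 20% - 30% is kept as test data to measure model performance]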
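Keeping back 20% - 30% of the input data as test data can be sketched as a shuffled split. The function name `holdout_split`, the default of 25% and the fixed seed are assumptions made for illustration.

```python
import random


def holdout_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into (train, test) lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of 20 record ids.
train, test = holdout_split(range(20), test_fraction=0.25)
print(len(train), len(test))
```

The model is trained only on `train`; `test` is used once, at the end, to estimate model performance.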
K-FOLD CROSS-VALIDATION – OVERALL APPROACH
K-FOLD CROSS-VALIDATION – DETAILED APPROACH
K-FOLD CROSS-VALIDATION (CONTD.)
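In k-fold cross-validation the data is divided into k folds, and each fold serves as the test set exactly once while the remaining folds form the training set. A minimal sketch of the index bookkeeping follows; the generator name `k_fold_indices` is hypothetical, and it assumes the data has already been shuffled if the original order carries meaning.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, 5):
    print("train:", train_idx, "test:", test_idx)
```

The model is trained and evaluated k times, and the k performance figures are typically averaged.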
BOOTSTRAP SAMPLING / BOOTSTRAPPING
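Bootstrap sampling draws records at random with replacement, so the same record can appear more than once in a sample. The sketch below is one simple way to do that; the function name is hypothetical, and returning the records that were never drawn (often called out-of-bag records) is an addition not taken from the slide.

```python
import random


def bootstrap_sample(records, seed=None):
    """Draw len(records) items with replacement.

    Returns (sample, out_of_bag), where out_of_bag holds the records
    that were never drawn and can be used for testing.
    """
    rng = random.Random(seed)
    records = list(records)
    sample = rng.choices(records, k=len(records))  # with replacement
    drawn = set(sample)
    out_of_bag = [r for r in records if r not in drawn]
    return sample, out_of_bag


# Hypothetical data set of 10 record ids.
sample, out_of_bag = bootstrap_sample(range(10), seed=1)
print(sample, out_of_bag)
```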
TRAIN A MODEL – UNDER VS. OVER FIT
False Negative (FN)    True Negative (TN)
... matches actual outcome. Hence, they are correct classifications.
EVALUATING A MODEL – CLASSIFICATION (CONTD.)
Confusion matrix (rows: Predicted Outcome, columns: Actual Outcome)
                  Actual Win   Actual Loss
Predicted Win     85           4
Predicted Loss    2            9
P(pr) = proportion of expected agreement between actual and predicted data, both for the class of interest and for the other classes = ((TP + FP) / N) × ((TP + FN) / N) + ((FN + TN) / N) × ((FP + TN) / N), where N = TP + FP + FN + TN
Note: Kappa value can be 1 at the maximum, which represents perfect agreement between model’s prediction and actual values.
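As a worked example, the Kappa value can be computed for the Win/Loss counts shown earlier (85, 4, 2, 9). This is a sketch under stated assumptions: it treats "Win" as the class of interest, so TP = 85, FP = 4, FN = 2, TN = 9, and it uses the standard Cohen's kappa definition, Kappa = (P(a) - P(pr)) / (1 - P(pr)), where P(a) is the proportion of observed agreement. The function name is hypothetical.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    # P(a): proportion of cases where prediction and actual outcome agree.
    p_a = (tp + tn) / n
    # P(pr): agreement expected by chance, summed over both classes.
    p_pr = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    return (p_a - p_pr) / (1 - p_pr)


# Counts from the Win/Loss confusion matrix, with Win as class of interest.
kappa = cohen_kappa(tp=85, fp=4, fn=2, tn=9)
print(round(kappa, 3))
```

With these counts P(a) = 0.94 and P(pr) = 0.7886, so Kappa is roughly 0.72, well below the 0.94 accuracy because most of the agreement on the dominant "Win" class would be expected by chance.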
EVALUATING A MODEL (ROC CURVE)
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
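A ROC curve plots the true positive rate against the false positive rate as the decision threshold changes. The sketch below uses the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the function name, labels, scores and thresholds are all hypothetical, and it assumes both classes are present in the data.

```python
def roc_points(labels, scores, thresholds):
    """Return one (FPR, TPR) point per threshold.

    labels are 1 (positive) or 0 (negative); a case is predicted
    positive when its score is at or above the threshold.
    """
    points = []
    for threshold in thresholds:
        tp = fp = fn = tn = 0
        for label, score in zip(labels, scores):
            predicted_positive = score >= threshold
            if predicted_positive and label == 1:
                tp += 1
            elif predicted_positive and label == 0:
                fp += 1
            elif label == 1:
                fn += 1
            else:
                tn += 1
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# Hypothetical model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(labels, scores, thresholds=[0.0, 0.5, 1.0]))
```

A very low threshold labels everything positive (FPR = 1, TPR = 1); a very high one labels nothing positive (FPR = 0, TPR = 0). A good model bends the curve towards the top-left corner in between.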
[Figure: error shown as the gap between the actual value and the predicted value]
Internal evaluation
Silhouette width
External evaluation
Purity
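Purity is an external measure because it needs the true class labels. The slide does not give a formula, so the sketch below assumes the usual definition: for each cluster count the most frequent true class, add those counts up, and divide by the total number of instances. The function name and sample data are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of instances that belong to the majority class of their cluster."""
    counts_by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values())
                         for counts in counts_by_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical clustering of six instances with known classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, classes))
```

A purity of 1 means every cluster contains instances of a single class only.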
EVALUATING A MODEL (CLUSTERING)
[Figure: four clusters, Cluster 1 to Cluster 4, with the distances ai1, ai2, ..., ain_1 drawn inside one cluster and b14(1), b14(2), ..., b14(n4) drawn to another cluster]
a(i): average distance between the i-th data instance and all other data instances belonging to the same cluster
b(i): lowest average distance between the i-th data instance and the data instances of all other clusters
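Given a(i) and b(i) as described above, the silhouette width of the i-th instance is commonly defined as (b(i) - a(i)) / max(a(i), b(i)). That formula is not shown on the slide and is taken here as an assumption, as are the Euclidean distance, the function name and the sample points.

```python
import math


def silhouette_width(points, cluster_ids, i):
    """Silhouette width of instance i, using Euclidean distance."""
    own_cluster = cluster_ids[i]
    distances = {}  # cluster id -> distances from instance i to its members
    for j, (point, cluster) in enumerate(zip(points, cluster_ids)):
        if j != i:
            distances.setdefault(cluster, []).append(math.dist(points[i], point))
    if own_cluster not in distances:
        return 0.0  # convention for an instance that is alone in its cluster
    # a(i): average distance to the other members of the same cluster.
    a = sum(distances[own_cluster]) / len(distances[own_cluster])
    # b(i): lowest average distance to the members of any other cluster.
    b = min(sum(d) / len(d) for c, d in distances.items() if c != own_cluster)
    if max(a, b) == 0:
        return 0.0  # all compared points coincide
    return (b - a) / max(a, b)


# Hypothetical 2-D points in two well-separated clusters.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
clusters = [0, 0, 1, 1]
print(silhouette_width(points, clusters, 0))
```

Values close to 1 indicate that the instance sits well inside its own cluster; values near 0 or below suggest it lies between clusters or may be assigned to the wrong one.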