L02 Classification and Regression
1
Objective
• Understand classification and regression tasks
2
What is Classification?
A flower shop wants to predict a customer's next purchase from its similarity to the customer's most recent purchase.
3
What is Classification?
Which flower is a customer most likely to purchase based on similarity to previous purchase?
4
What is Needed for Classification?
8
4 main types of classification tasks
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
• Imbalanced Classification
9
Binary Classification
• refers to those classification tasks that have two class
labels
• Examples include:
• Email spam detection (spam or not)
• Churn prediction (churn or not)
• Conversion prediction (buy or not)
• typically involve one class that is the normal state and another
class that is the abnormal state
• the normal state is assigned the class label 0 and the
abnormal state is assigned the class label 1
10
Popular Binary Classification Algorithms
• Logistic Regression
• K-Nearest Neighbors
• Decision Trees
• Support Vector Machine
• Naive Bayes
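As a rough illustration of binary classification with the 0/1 labeling convention, here is a minimal sketch of logistic regression fitted by gradient descent in plain Python. The data (a count of suspicious words per email), the function names, and the learning-rate and epoch settings are all assumptions made for this example, not part of the slides.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for one example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return class label 1 (abnormal) or 0 (normal)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical data: number of suspicious words in an email.
# Label 0 = not spam (normal state), label 1 = spam (abnormal state).
xs = [0, 1, 1, 2, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(predict(w, b, 1), predict(w, b, 8))
```

A library implementation would normally be used in practice; the point here is only that the model outputs a probability that is thresholded into one of two class labels.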
11
Multi-Class Classification
• Example problems:
• face recognition: a model may predict that a photo belongs to one of
thousands or tens of thousands of faces
• predicting a sequence of words, as in text translation models, can be
considered a special type of multi-class classification: each word to be
predicted is a multi-class decision in which the size of the vocabulary
defines the number of possible classes, which could be tens or hundreds
of thousands of words
13
Popular Multi-Class Classification Algorithms
• k-Nearest Neighbors
• Decision Trees
• Naive Bayes
• Random Forest
• Gradient Boosting
14
Popular Multi-Class Classification Algorithms
• algorithms designed for binary classification can be adapted to multi-class
problems by fitting multiple binary classification models, using one of two
strategies:
• One-vs-Rest: fit one binary classification model for each class vs. all
other classes.
• One-vs-One: fit one binary classification model for each pair of classes.
• binary classification algorithms that can use these strategies for
multi-class classification include:
• Logistic Regression
• Support Vector Machine
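The two strategies differ mainly in how many binary problems they create. The sketch below, assuming four hypothetical flower classes and made-up helper names, shows how the targets are relabeled for one-vs-rest and how the class pairs are enumerated for one-vs-one; no actual model is fitted.

```python
from itertools import combinations

classes = ["rose", "tulip", "lily", "orchid"]  # hypothetical class labels

def one_vs_rest_targets(labels, positive_class):
    """Relabel a multi-class target as binary: 1 for the chosen class, 0 for the rest."""
    return [1 if label == positive_class else 0 for label in labels]

def one_vs_one_pairs(class_names):
    """Enumerate every pair of classes; each pair gets its own binary model."""
    return list(combinations(class_names, 2))

labels = ["rose", "lily", "tulip", "rose", "orchid"]

ovr_problems = {c: one_vs_rest_targets(labels, c) for c in classes}
ovo_problems = one_vs_one_pairs(classes)

print(len(ovr_problems))     # one binary problem per class: 4
print(len(ovo_problems))     # one binary problem per pair: 4 * 3 / 2 = 6
print(ovr_problems["rose"])  # [1, 0, 0, 1, 0]
```

With K classes, one-vs-rest needs K binary models and one-vs-one needs K(K-1)/2, so one-vs-one grows faster as the number of classes increases.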
15
Multi-Label Classification
• classification tasks that have two or more class labels, where
one or more class labels may be predicted for each example.
• Example: photo classification, where a given photo may contain multiple
objects and a model may predict the presence of several known objects
in the photo, such as “bicycle,” “apple,” and “person.”
16
Multi-Label Classification Algorithms
• classification algorithms used for binary or multi-class classification
cannot be used directly for multi-label classification
• specialized versions of standard classification algorithms can be used,
so-called multi-label versions of the algorithms, including:
• Multi-label Decision Trees
• Multi-label Random Forests
• Multi-label Gradient Boosting
• another approach is to train a separate binary classifier for each
label, predicting whether that label is present or absent
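One way to picture the per-label approach is to encode each example's label set as a row of 0/1 indicators, so that every column becomes its own binary target. The following is a minimal sketch under that assumption; the label names come from the photo example, while the function name and the three photos are hypothetical.

```python
def to_indicator_matrix(label_sets, all_labels):
    """Encode each example's set of labels as a row of 0/1 indicators."""
    return [[1 if label in labels else 0 for label in all_labels]
            for labels in label_sets]

all_labels = ["bicycle", "apple", "person"]

# Hypothetical photos: each may contain zero, one, or several known objects.
photos = [{"bicycle", "person"}, {"apple"}, set()]

Y = to_indicator_matrix(photos, all_labels)
print(Y)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]

# Each column is the target of one independent binary classifier.
per_label_targets = {label: [row[j] for row in Y]
                     for j, label in enumerate(all_labels)}
print(per_label_targets["person"])  # [1, 0, 0]
```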
17
Imbalanced Classification
• classification tasks where the examples are unequally distributed
across the classes
• typically binary classification tasks where the majority of examples in the
training dataset belong to the normal class and a minority of examples
belong to the abnormal class
• Examples include:
• Fraud detection
• Outlier detection
• Medical diagnostic tests
18
Imbalanced Classification Algorithms
• Specialized techniques may be used to change the composition of
samples in the training dataset by undersampling the majority class or
oversampling the minority class.
• Examples include:
• Random Undersampling
• SMOTE Oversampling
• Alternative performance metrics may be required, as reporting
classification accuracy alone may be misleading
• Examples include:
• Precision, Recall and F-Measure
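To make both points concrete, the sketch below uses a made-up dataset with 95 normal and 5 abnormal examples. It assumes the abnormal (minority) class is labeled 1, and the helper names are hypothetical. A classifier that always predicts "normal" scores high accuracy but zero recall, and random undersampling then balances the two classes.

```python
import random

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def random_undersample(examples, labels, seed=0):
    """Keep all minority (label 1) examples and an equal-sized random
    sample of majority (label 0) examples."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [examples[i] for i in kept], [labels[i] for i in kept]

# Hypothetical imbalanced data: 95 normal (0) and 5 abnormal (1) examples.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a useless classifier that always predicts "normal"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(accuracy, precision, recall, f1)  # high accuracy, zero recall

X = list(range(100))  # stand-in feature values
X_bal, y_bal = random_undersample(X, y_true)
print(len(y_bal), sum(y_bal))  # 10 examples, 5 of them abnormal
```

SMOTE is not shown here; it creates synthetic minority examples instead of discarding majority ones.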
19
Regression Analysis
• consists of a set of machine learning methods that
allow us to predict a continuous outcome variable (y)
based on the value of one or multiple predictor
variables (x)
• the goal of a regression model is to build a mathematical
equation that defines y as a function of the x
variables. This equation can then be used to predict
the outcome (y) from new values of the
predictor variables (x)
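For a single predictor, the simplest such equation is a straight line. The sketch below fits one by ordinary least squares in plain Python; the function name and the tiny dataset (which lies exactly on y = 2x + 1) are assumptions chosen so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0

# Use the fitted equation to predict y for a new value of x.
y_new = slope * 5.0 + intercept
print(y_new)  # 11.0
```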
20
Regression Analysis
• used for prediction
• fit a function to the available data and use it to predict
the outcome for future or hold-out data points
• 2 main purposes
• estimate missing data within your data range
(Interpolation)
• estimate future data outside your data range
(Extrapolation)
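The difference between the two purposes comes down to whether the new input falls inside the range of inputs seen during fitting. A minimal sketch, assuming a line that has already been fitted (the slope and intercept values, training inputs, and function names are hypothetical):

```python
def prediction_kind(x_new, xs_train):
    """Label a query as interpolation if it lies within the training range,
    otherwise as extrapolation."""
    if min(xs_train) <= x_new <= max(xs_train):
        return "interpolation"
    return "extrapolation"

def predict(x, slope=2.0, intercept=1.0):
    """Evaluate a hypothetical, already-fitted line y = slope * x + intercept."""
    return slope * x + intercept

xs_train = [1.0, 2.0, 4.0, 5.0]  # inputs the model was fitted on

for x_new in (3.0, 8.0):
    print(x_new, prediction_kind(x_new, xs_train), predict(x_new))
# 3.0 is inside [1.0, 5.0], so it is interpolation;
# 8.0 is outside that range, so it is extrapolation.
```

Extrapolated predictions are generally less trustworthy, because nothing in the data confirms that the fitted relationship still holds outside the observed range.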
21
Application of Regression Analysis
• real-world examples for regression analysis include
• predicting the price of a house given house features
• predicting the impact of SAT/GRE scores on college
admissions
• predicting the sales based on input parameters
• predicting the weather, etc.
22
Interpolation
[Figure: interpolation plot, labeled Input and Prediction]
Source: https://towardsdatascience.com/a-beginners-guide-to-regression-analysis-in-machine-learning-8a828b491bbf
23
Extrapolation
Source: https://towardsdatascience.com/a-beginners-guide-to-regression-analysis-in-machine-learning-8a828b491bbf
24
Regression Algorithms
• Linear Regression
• Polynomial Regression
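The two algorithms are closely related: polynomial regression can be viewed as linear regression on powers of x. The sketch below only evaluates such models and does not fit them; the coefficient values and function names are hypothetical.

```python
def polynomial_features(x, degree):
    """Expand a single input into [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def predict(coefs, x):
    """Evaluate y = coefs[0] + coefs[1]*x + coefs[2]*x^2 + ..."""
    feats = polynomial_features(x, len(coefs) - 1)
    return sum(c * f for c, f in zip(coefs, feats))

linear_coefs = [1.0, 2.0]          # hypothetical line: y = 1 + 2x
quadratic_coefs = [1.0, 0.0, 0.5]  # hypothetical curve: y = 1 + 0.5x^2

print(predict(linear_coefs, 3.0))     # 7.0
print(predict(quadratic_coefs, 4.0))  # 9.0
```

A degree-1 polynomial is an ordinary straight line, so linear regression is the special case where only the first two coefficients are used.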
25
References:
• https://machinelearningmastery.com/types-of-classification-in-machine-learning/
• http://www.sthda.com/english/wiki/regression-analysis-essentials-for-machine-learning
• https://towardsdatascience.com/a-beginners-guide-to-regression-analysis-in-machine-learning-8a828b491bbf
26