UNIT - III
SUPERVISED LEARNING
What is Supervised learning?
Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher. Supervised learning is when we teach or train the machine using data that is well-labelled, which means some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from labelled data.
• Supervised learning is a type of machine learning where the algorithm learns from labeled data, meaning the data comes with correct answers or classifications.
Types of Supervised Learning
Supervised learning can be applied to two main types of problems:
• Classification: Where the output is a categorical variable (e.g., spam vs. non-spam emails, yes vs. no).
• Regression: Where the output is a continuous variable (e.g., house prices, temperature).
Classification
Classification teaches a machine to sort things into categories. It learns by looking at examples with labels (like emails marked “spam” or “not spam”). After learning, it can decide which category new items belong to, like identifying if a new email is spam or not. For example, a classification model might be trained on a dataset of images labeled as either dogs or cats, and it can then be used to predict the class of new and unseen images as dogs or cats based on their features such as color, texture and shape.
X-axis: Feature 1 (e.g., color/texture)
Y-axis: Feature 2 (e.g., shape/size)
These are the input features used by the classification model.
What’s shown:
• Colored background regions: the decision boundaries of the classifier, i.e., areas where the model assigns a different class label (e.g., dog or cat).
• Colored dots: the data points (i.e., examples of dogs and cats) plotted by their feature values.
• Color bar on the right: indicates the predicted class labels (e.g., Class 0 for dogs, Class 1, Class 2 for cats).
Classification Tasks
1. Classifying emails as either Spam or Not Spam: Predicting whether
an email is spam (class 1) or not spam (class 0). The output variable is
discrete, taking on two distinct values (0 or 1) representing the two
classes.
2. Classifying Images of Fruits: Classifying images of fruits into categories such as apple (class 0), banana (class 1), or orange (class 2). The output variable is discrete, with multiple distinct values (0, 1, or 2) representing the three classes (apple, banana, orange).
3. Sentiment Analysis: Analyzing text data to determine the sentiment of a review (positive, negative, neutral). The sentiment labels (positive, negative, neutral) are discrete categories assigned to the input text.
4. Medical Diagnosis: Predicting the presence of a disease based on patient symptoms and test results. The diagnosis categories (e.g., disease present, no disease) are discrete labels assigned to the patient data.
5. Customer Retention: Predicting whether a customer will renew a
subscription or not (churn prediction).
Importance of Classification in ML
1. Solves Real-World Decision Problems - Classification helps make automated decisions based on historical data. Examples:
• Spam detection (email → spam or not)
• Disease diagnosis (symptoms → disease type)
• Loan approval (applicant data → approve/reject)
2. Drives Business Intelligence - By grouping data into categories, companies can:
• Segment customers
• Target marketing efforts
• Personalize recommendations
Example: Netflix classifying users’ viewing habits to recommend shows.
3. Crucial in Healthcare & Safety - In critical fields like medicine, security, and autonomous vehicles, classification helps:
• Classify medical images (e.g., cancer detection)
• Detect anomalies (e.g., cyberattacks)
• Identify objects on the road (e.g., pedestrian or stop sign)
4. Supports Automation & AI - Classification models power intelligent systems like:
• Chatbots understanding intent (question, complaint, etc.)
• Smart assistants (e.g., classifying speech commands)
• Self-driving cars
5. Versatility Across Domains - Classification is used in:
• Finance: credit scoring
• E-commerce: product recommendation
• HR: resume screening
• Agriculture: classifying crop diseases
Classification Algorithms in Machine Learning
These classification algorithms offer a diverse set of tools for solving a
wide range of classification tasks in machine learning. Each algorithm
has its strengths and is suitable for different types of data and problem
domains. Experimenting with these algorithms and understanding their
characteristics can help in selecting the most appropriate approach for a
given classification problem.
1. Logistic Regression:
Description: Logistic regression is a linear classification algorithm used for binary classification tasks. It estimates the probability that a given input belongs to a particular class.
Advantages: Simple, interpretable, works well for linearly separable data.
Application: Spam detection, customer churn prediction.
2. Support Vector Machines (SVM):
A Support Vector Machine (SVM) is a powerful supervised machine
learning algorithm used for classification, regression, and even outlier
detection. It works especially well for binary classification problems and
when the data is high-dimensional (lots of features).
Description: SVM is a versatile classification algorithm that finds the optimal
hyperplane to separate classes in the feature space. It can handle linear and non-linear
classification tasks.
Advantages: Effective in high-dimensional spaces, works well with clear margin of
separation.
Application: Text categorization, image recognition.
Youtube link- https://www.youtube.com/watch?v=Y6RRHw9uN9o&t=135s
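A minimal scikit-learn sketch of this idea (the toy dataset and parameters below are assumptions for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy dataset assumed for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An RBF kernel handles non-linear boundaries; kernel="linear" suits linearly separable data
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))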
3. Decision Trees:
Description: Decision trees are non-linear classifiers that recursively split the data
based on feature values to make predictions. They create a tree-like structure of
decisions.
Advantages: Easy to interpret, can handle both numerical and categorical data.
Application: Customer segmentation, medical diagnosis.
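A short illustrative sketch (the dataset choice is an assumption) showing how a decision tree's recursive splits can be inspected:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth limits the number of recursive splits, keeping the tree interpretable
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# Print the learned tree structure as text to see the feature-based splits
print(export_text(tree, feature_names=["sepal length", "sepal width", "petal length", "petal width"]))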
4. Random Forest:
Description: Random Forest is an ensemble method that consists of multiple decision
trees. It improves prediction accuracy and reduces overfitting by aggregating the predictions of
individual trees.
Advantages: Robust to overfitting, handles high-dimensional data well.
Application: Credit risk analysis, image classification.
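For comparison, an illustrative random forest sketch (toy data assumed); the forest aggregates the votes of many randomized trees:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets;
# predictions are made by majority vote over the trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print("Predictions for first 3 samples:", forest.predict(X[:3]))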
5. Neural Networks:
Description: Neural networks are deep learning classifiers that consist of interconnected
layers of nodes. They learn complex patterns in the data through training with
backpropagation.
Advantages: Capable of learning intricate patterns, suitable for large datasets.
Application: Image recognition, speech recognition.
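A minimal neural-network sketch using scikit-learn's MLPClassifier (the dataset and layer size are assumptions for illustration):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One hidden layer of 64 nodes, trained with backpropagation
nn = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=42)
nn.fit(X_train, y_train)
print("Test accuracy:", nn.score(X_test, y_test))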
Performance Evaluation Metrics in Classification
A confusion matrix is a simple table that shows how well a classification model is
performing by comparing its predictions to the actual results. It breaks down the
predictions into four categories: correct predictions for both classes (true
positives and true negatives) and incorrect predictions (false positives and false
negatives). This helps you understand where the model is making mistakes, so you
can improve it.
The matrix displays the number of instances produced by the model on the test
data.
• True Positive (TP): The model correctly predicted a positive outcome (the
actual outcome was positive).
• True Negative (TN): The model correctly predicted a negative outcome (the
actual outcome was negative).
• False Positive (FP): The model incorrectly predicted a positive outcome (the
actual outcome was negative). Also known as a Type I error.
• False Negative (FN): The model incorrectly predicted a negative outcome (the
actual outcome was positive). Also known as a Type II error.
1. Accuracy
Accuracy measures the overall proportion of correct predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision
Precision focuses on the quality of the model’s positive predictions. It tells us how many of the instances predicted as positive are actually positive. Precision is important in situations where false positives need to be minimized:
Precision = TP / (TP + FP)
3. Recall
Recall measures how many of the actual positive instances the model correctly identified. It is important in situations where false negatives need to be minimized:
Recall = TP / (TP + FN)
4. F1-Score
The F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
It provides a better sense of a model’s overall performance, particularly for imbalanced datasets. The F1 score is helpful when both false positives and false negatives are important.
Example
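The worked numbers below are assumed for illustration; the sketch computes the four metrics from a confusion matrix with scikit-learn:

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical true labels and model predictions (assumed values)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")          # TP=4, TN=4, FP=1, FN=1
print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN) = 0.8
print("Precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 0.8
print("Recall   :", recall_score(y_true, y_pred))     # TP/(TP+FN) = 0.8
print("F1-score :", f1_score(y_true, y_pred))         # 2PR/(P+R) = 0.8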
Regression
Logistic Regression
• What it is:
Despite the name, this is actually used for classification, not for predicting
continuous values. It predicts categories like yes/no or pass/fail.
• Use Case:
Predicting whether a student will pass or fail based on their attendance and
study habits.
• Example:
"If a student attends classes regularly and studies well, they're likely to pass."
Regression algorithms in Machine Learning
1. Linear Regression
• What it does: Predicts a continuous value using one or
more input features assuming a straight-line relationship.
• Applications:
• Predicting house prices
• Estimating sales revenue
• Forecasting temperature
• Advantages:
• Simple to implement and interpret
• Fast training and prediction
• Good for linearly correlated data
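A minimal sketch (the house-size data below is assumed for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size in sq ft vs price in $1000s
X = np.array([[1000], [1500], [2000], [2500]])
y = np.array([150, 200, 260, 310])

model = LinearRegression()
model.fit(X, y)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
print("Predicted price for 1800 sq ft:", model.predict([[1800]])[0])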
2. Ridge & Lasso Regression (Regularized Linear Models)
What they do: Improve linear regression by reducing overfitting
using penalties.
Applications:
Financial forecasting (e.g., stock prices)
Medical predictions (e.g., disease progression)
Advantages:
Prevents overfitting
Works well when features are correlated
Lasso helps with feature selection.
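An illustrative sketch of both penalties (the data and alpha values are assumptions):

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=10, random_state=0)

# alpha controls the penalty strength in both models
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty can drive coefficients to exactly 0
print("Ridge coefficients:", ridge.coef_.round(2))
print("Lasso coefficients:", lasso.coef_.round(2))  # zeros indicate dropped features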
3. Polynomial Regression
What it does:
Polynomial regression is a type of regression analysis used in
statistics and machine learning when the relationship between
the independent variable (input) and the dependent variable
(output) is not linear.
Fits curved relationships by using powers of input features.
Applications:
Modeling growth trends (e.g., population, sales)
Physics-related predictions (e.g., motion trajectories)
Advantages:
Can capture more complex, non-linear patterns
More flexible than linear regression
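A short sketch (the curved toy data is assumed): the inputs are expanded into polynomial features, then an ordinary linear model is fit on them:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data where y grows roughly like x squared
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.1, 4.2, 8.8, 16.5, 24.9])

# degree=2 expands each x into [x, x^2] before the linear fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print("Prediction at x=6:", model.predict([[6]])[0])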
4. Support Vector Regression (SVR)
What it does:
Uses the principles of Support Vector Machines to predict continuous values.
Applications:
Predicting stock market trends
Energy demand forecasting
Advantages:
Works well with non-linear and high-dimensional data
Robust to outliers.
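An illustrative SVR sketch (the noisy toy data and hyperparameters are assumed):

import numpy as np
from sklearn.svm import SVR

# Hypothetical noisy sine-shaped data
X = np.linspace(0, 5, 50).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.default_rng(0).normal(0, 0.1, 50)

# epsilon defines a tube around the fit; errors inside the tube are ignored
svr = SVR(kernel="rbf", C=10, epsilon=0.1)
svr.fit(X, y)
print("Prediction at x=2.5:", svr.predict([[2.5]])[0])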
5. Decision Tree Regression
What it does:
Splits data into tree-like structures to make predictions.
Applications:
Predicting student performance
Customer churn prediction
Advantages:
Easy to visualize and understand
Handles both numerical and categorical data
No need for feature scaling
6. Random Forest Regression
What it does:
Uses multiple decision trees (an ensemble) to improve accuracy
and reduce overfitting.
Applications:
Predicting real estate values
Medical risk assessment
Advantages:
High accuracy and robustness
Handles missing values well
Reduces overfitting compared to a single tree
7. Gradient Boosting (e.g., XGBoost, LightGBM)
What it does:
Builds trees one by one, where each new tree fixes errors made
by the previous ones.
Applications:
Credit scoring in finance
Click-through rate prediction in advertising
Advantages:
Excellent performance on structured data
Highly customizable and powerful
Often used in data science competitions
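A minimal sketch of the boosting idea using scikit-learn's built-in implementation (XGBoost and LightGBM are faster, more tunable libraries built on the same principle; the data and settings below are assumptions):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5, random_state=1)

# Each of the 200 shallow trees is fit to the residual errors left by the ensemble so far
gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=1)
gbm.fit(X, y)
print("Predictions for first 2 samples:", gbm.predict(X[:2]))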
Importance and benefits of Regression in Machine Learning
Regression lets models predict continuous outcomes (such as prices, demand, or temperatures), quantify how strongly each input influences the output, and forecast trends from historical data.
Popular dataset sources for practising regression include:
• The AWS datasets are publicly available for users to download and access. AWS is known for cloud-based access that facilitates research, analysis, and experimentation.
Dataset link - Amazon-Products DataSet
4. Google Dataset Search
• Google Dataset Search is widely used across institutions, universities, and organizations.
Dataset link - RTA Dataset
5. Azure Open Datasets
• Azure is a cloud-based platform hosted by Microsoft. The datasets on the Azure platform are used across various domains such as finance, healthcare, environmental science, and more.
How to choose the value of K
The value of K (the number of nearest neighbours considered) strongly affects the result: a small K makes the model sensitive to noise in the data, while a large K smooths the decision boundary but can blur the distinction between classes. K is usually chosen as an odd number (to avoid ties in binary classification) and tuned with cross-validation.
Now, we need to classify a new data point, shown as a black dot (at point (60, 60)), into the blue or red class. Assuming K = 3, the algorithm finds the three nearest data points, as shown in the next diagram −
How to Calculate Euclidean Distance?
Euclidean distance is the most commonly used distance measure, and it is limited to real-valued vectors. Using this formula, it measures a straight line between the query point and the other point being measured.
• For Two Points (2D) - The Euclidean distance between two points (x₁, y₁) and (x₂, y₂) in two-dimensional space is given by:
d = √((x₂ − x₁)² + (y₂ − y₁)²)
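A small sketch applying this formula to the K = 3 example above (the labelled points are assumed, since the original diagram's coordinates were not given):

import numpy as np

def euclidean(p, q):
    # Straight-line distance: sqrt((x2 - x1)^2 + (y2 - y1)^2)
    return np.sqrt(np.sum((np.array(p) - np.array(q)) ** 2))

# Hypothetical labelled points (assumed for illustration)
points = {(58, 62): "red", (65, 55): "red", (63, 70): "red",
          (40, 20): "blue", (20, 35): "blue"}
query = (60, 60)

# Take the K = 3 nearest points and predict by majority vote
nearest = sorted(points, key=lambda p: euclidean(p, query))[:3]
labels = [points[p] for p in nearest]
print(nearest, "->", max(set(labels), key=labels.count))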
Linear Models
• Linear models in machine learning represent relationships as linear
combinations of input features. Their fundamental equation takes the
form:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Where:
• y is the predicted output (dependent variable)
• β₀ is the intercept (bias term)
• β₁, β₂, …, βₙ are the coefficients (weights) for the input features
• x₁, x₂, …, xₙ are the input features (independent variables)
• ε is the error term
X	Y
1300	240
1500	320
1700	330
1830	295
2350	409
1450	319
In machine learning, linear regression uses a linear equation to model the relationship between a
dependent variable (Y) and one or more independent variables (X).
The main goal of the linear regression model is to find the best-fitting straight line (often called a
regression line) through a set of data points.
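Treating the table above as (X, Y) pairs, a quick least-squares sketch recovers the regression line:

import numpy as np

# Data points from the table above
X = np.array([1300, 1500, 1700, 1830, 2350, 1450])
Y = np.array([240, 320, 330, 295, 409, 319])

# np.polyfit with deg=1 returns the slope and intercept of the best-fitting line
b1, b0 = np.polyfit(X, Y, deg=1)
print(f"Regression line: Y = {b0:.2f} + {b1:.4f} * X")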
Line of Regression
• A straight line that shows a relation between the dependent
variable and independent variables is known as the line of
regression or regression line.
Furthermore, the linear relationship can be positive or negative
in nature as explained below −
1. Positive Linear Relationship
• A linear relationship is called positive if the dependent variable increases as the independent variable increases. It can be understood with the help of the following graph −
2. Negative Linear Relationship
• A linear relationship is called negative if the dependent variable decreases as the independent variable increases. It can be understood with the help of the following graph −
How Linear Regression works?
1. Model Representation
This is how the model represents the relationship between input features and the
output.
Formula:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Where y is the predicted output, β₀ is the intercept, β₁, …, βₙ are the feature weights, x₁, …, xₙ are the input features, and ε is the error term.
The goal is to use linear regression to predict the price of a car that is 6
years old and has 80,000 km mileage.
Step 1: Define the Linear Regression Model
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Where:
• β₁, β₂, …, βₙ are coefficients (weights) that represent how much each predictor variable contributes to y.
A negative intercept suggests that when study hours = 0, the probability of passing is very low. If the intercept were positive, it would imply that even without studying, a student has a high chance of passing, which may not be realistic.
Step 4: Interpretation
• The more hours studied, the higher the probability of passing.
• The decision boundary (typically at 0.5) separates passing
and failing students.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample dataset: Hours studied vs Pass/Fail outcome
data = {"Hours_Studied": [1, 3, 5, 7, 9, 2, 4, 6, 8, 10],
        "Pass": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]}  # 1 = Pass, 0 = Fail

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Define independent variable (X) and dependent variable (y)
X = df[["Hours_Studied"]]
y = df["Pass"]

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict outcomes on test data
y_pred = model.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

# Predict pass/fail for a student who studied 6 hours
new_student = np.array([[6]])  # Hours Studied = 6
prediction = model.predict(new_student)
print("Will the student pass?", "Yes" if prediction[0] == 1 else "No")
Multiclass classification using multinomial logistic
regression
Multinomial Logistic Regression is an extension of logistic regression used for classifying data into more than two categories. Example use case: predicting the type of flower (Setosa, Versicolor, Virginica) using features like petal length, sepal width, etc. This is a multiclass classification task.
Key Idea
• Binary logistic regression uses the sigmoid function to predict
probabilities of two classes.
• Multinomial logistic regression uses the softmax function to predict
the probabilities for three or more classes.
• The model gives a probability for each class, and the class with the
highest probability is chosen.
Example
• Let’s say we want to predict fruit type based on weight and
color score:
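As an illustrative sketch of the same idea, the code below runs multinomial logistic regression on the Iris flower task mentioned earlier (that dataset ships with scikit-learn; the fruit data would work the same way):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 3 classes: Setosa, Versicolor, Virginica
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# With 3+ classes, LogisticRegression uses the softmax (multinomial) formulation
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# One probability per class; the highest-probability class is the prediction
print("Class probabilities:", model.predict_proba(X_test[:1]).round(3))
print("Predicted class:", model.predict(X_test[:1])[0])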
Naive Bayes Classifier
The Naive Bayes Classifier is a simple yet powerful machine learning
algorithm used for classification tasks. It’s based on Bayes’ Theorem,
with a key assumption: all input features are independent given the
class (which is rarely true in real life — hence the name “naive”).
Few examples of Naive Bayes Algorithm
1. Spam Email Detection
• Hypothesis: Email is spam
• Evidence: Words like "free", "win", "money" appear
• Naive Bayes uses the presence or absence of words to classify emails.
2. Sentiment Analysis
• Hypothesis: A review is positive or negative
• Evidence: Words in the review like "great", "bad", "amazing"
• Used in movie reviews, product feedback, etc.
3. Medical Diagnosis
Hypothesis: Patient has a particular disease
Evidence: Symptoms like fever, cough, headache
Naive Bayes can estimate the likelihood of diseases based on
symptom patterns.
4. News Article Classification
Hypothesis: An article belongs to the “sports” or “politics”
category
Evidence: Words appearing in the article (e.g., "match",
"parliament")
Helps categorize large volumes of news automatically.
Types of Naive Bayes Classifier
1. Gaussian Naive Bayes
• Use When: Features are continuous and follow a normal (Gaussian)
distribution.
• Assumes: Data in each class is normally distributed.
• Formula: Uses the Gaussian probability density function to estimate
likelihoods.
Example: Predicting whether a person has a disease based on height, weight, or age.
2. Multinomial Naive Bayes
• Use When: Features are discrete, like word counts or frequency.
• Assumes: Data is generated from a multinomial distribution (bag-of-words
model).
• Most commonly used for text classification problems.
Example: Spam detection, where features are counts of words in an email.
3. Bernoulli Naive Bayes
• Use When: Features are binary (i.e., 0 or 1), representing
presence or absence of a feature.
• Assumes: Each feature follows a Bernoulli distribution
(yes/no).
• Also used for text classification, but considers only whether a
word appears or not.
Example: Classifying documents based on whether specific
keywords exist in them.
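A minimal Multinomial Naive Bayes sketch for the spam example above (the tiny corpus is assumed for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training emails and labels (1 = spam, 0 = not spam)
emails = ["win free money now", "meeting at noon", "free prize win", "project report attached"]
labels = [1, 0, 1, 0]

# Bag-of-words counts feed the multinomial model
vec = CountVectorizer()
X = vec.fit_transform(emails)
nb = MultinomialNB().fit(X, labels)

# Classify a new email by its word counts
print(nb.predict(vec.transform(["free money prize"])))  # expected: [1] (spam)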
Applications of Naive Bayes Classifier
1. Spam Detection
Classifies emails as spam or not spam based on words, subject lines, and patterns. Fast and works well even with thousands of emails.
2. Sentiment Analysis
Determines if a product or movie review is positive, negative, or neutral. Common in customer feedback and social media monitoring.
3. Medical Diagnosis
Predicts diseases based on symptoms.
Example: Given symptoms like fever, cough, and fatigue, predict the likelihood of flu vs. COVID.
4. Document Classification
Automatically assigns categories to news articles, blogs, or research
papers.
Example: Labeling a document as “sports”, “politics”, or “technology”.
5. Recommendation Systems
Predicts user preferences based on past behavior.
Example: Suggesting products on Amazon or videos on YouTube.
6. Real-time Prediction Systems
Because Naive Bayes is very fast and lightweight, it’s used in:
• Fraud detection in banking
• Real-time ad targeting in digital marketing
• Risk classification in insurance
7. Face or Object Recognition
Helps in classifying images using presence or absence of certain
features (used with other techniques).
Working Principle of Naïve Bayes Theorem
Naive Bayes applies Bayes’ Theorem to compute the probability of each class C given the input features X = (x₁, …, xₙ):
P(C | X) = P(X | C) · P(C) / P(X)
Under the “naive” independence assumption, P(X | C) = P(x₁ | C) · P(x₂ | C) · … · P(xₙ | C). The classifier evaluates this for every class and predicts the class with the highest posterior probability.
Decision Tree Example
Let’s consider a decision tree for predicting whether a customer will buy a product based on age, income and previous purchases. Here’s how the decision tree works:
1. Root Node (Income)
First Question: “Is the person’s income greater than $50,000?”
• If Yes, proceed to the next question.
Splitting of Decision Tree
To choose the attribute to split on at each node, the tree compares candidate attributes by information gain:
1. Compute the entropy of the full dataset S:
Entropy(S) = −Σᵢ pᵢ log₂(pᵢ)
where pᵢ is the proportion of elements in S belonging to class i.
2. For each candidate attribute, compute the entropy of every subset Sv obtained by splitting on value v, where:
• |Sv| = number of elements in the subset where attribute has value v
• |S| = total number of elements in the full dataset
3. Compute the information gain using the formula:
Gain(S, A) = Entropy(S) − Σᵥ (|Sv| / |S|) · Entropy(Sv)
The attribute with the highest information gain is selected for the split.
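A small worked example (all numbers assumed for illustration): suppose S contains 10 samples, 6 labelled Yes and 4 labelled No, so Entropy(S) = −0.6 log₂ 0.6 − 0.4 log₂ 0.4 ≈ 0.971. Splitting on an attribute A produces S₁ (5 samples: 4 Yes, 1 No; entropy ≈ 0.722) and S₂ (5 samples: 2 Yes, 3 No; entropy ≈ 0.971). Then Gain(S, A) = 0.971 − (5/10 × 0.722 + 5/10 × 0.971) ≈ 0.124.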
Gini Index
The Gini Index is a measure of the impurity or inequality of a distribution, commonly used as an impurity measure in decision tree algorithms. In decision trees, the Gini Index is used to determine the best feature to split the data on at each node of the tree.
The Gini Index can be calculated using the following formula:
Gini(S) = 1 − Σᵢ pᵢ²
where pᵢ is the proportion of elements in S belonging to class i. A value of 0 means the node is pure (all elements belong to one class).
Gini Index vs Entropy
While entropy and the Gini Index are both commonly used as impurity measures in decision tree algorithms, they have different properties. Entropy is more sensitive to the distribution of class labels and tends to produce more balanced trees, while the Gini Index is less sensitive to the class distribution and tends to produce shallower trees with fewer splits. The choice of impurity measure depends on the specific problem and the characteristics of the data.
Example of Gini index
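An illustrative calculation (numbers assumed): a node contains 10 samples, 6 of class A and 4 of class B. Gini = 1 − (0.6² + 0.4²) = 1 − (0.36 + 0.16) = 0.48. A pure node (all samples in one class) has Gini = 0, while a 50/50 split gives the maximum of 0.5 for two classes.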
THANK YOU