22cse61 Module 4

This document covers key concepts in regression, Bayesian learning, and support vector machines (SVM). It explains linear regression, multiple regression, and logistic regression with real-time examples, as well as Bayesian networks and their applications. Additionally, it details the SVM algorithm, its types, kernel functions, and various applications in classification and regression tasks.


MODULE 4

Regression, Bayesian Learning and Support Vector Machine

Regression: Linear Regression, Multiple Regression, Logistic Regression. Bayesian
Learning: Bayes' theorem, Naive Bayes classifier, Bayesian belief network. Support
Vector Machine: Support Vector Machine, Kernel function and Kernel SVM.
Linear Regression
Linear regression is a statistical method that models the relationship between a dependent
variable Y and one independent variable X by fitting a linear equation to observed data.

Used For: Predicting a continuous outcome using one independent variable.

Example:
Suppose we want to predict exam score (y) from hours studied (x):

Hours Studied (x)   Exam Score (y)
1                   2
2                   4
3                   5
4                   4
5                   5

1. Find the slope (m)
2. Find the intercept (c)
3. Form the equation: y = mx + c
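As a sketch, the three steps can be carried out in plain Python using the least-squares formulas m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄, with the data from the table above:

```python
# Least-squares fit of y = m*x + c to the table above
xs = [1, 2, 3, 4, 5]          # hours studied
ys = [2, 4, 5, 4, 5]          # exam scores

x_mean = sum(xs) / len(xs)    # 3.0
y_mean = sum(ys) / len(ys)    # 4.0

# Step 1: slope m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
    / sum((x - x_mean) ** 2 for x in xs)

# Step 2: intercept c = y_mean - m * x_mean
c = y_mean - m * x_mean

# Step 3: form the equation y = mx + c
print(f"y = {m:.1f}x + {c:.1f}")   # y = 0.6x + 2.2
```

For this data the slope works out to 0.6 and the intercept to 2.2, so a student studying 6 hours would be predicted to score 0.6 × 6 + 2.2 = 5.8.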
Multiple Linear Regression
The goal of Multiple Regression is to predict a dependent variable (e.g., score) using
multiple independent variables (e.g., Hours Studied and Assignments Done).
Used For: Predicting a continuous outcome using multiple variables.
Real-Time Example : Multiple Regression (Predicting Salary
Based on Experience and Education Level)
Scenario:
Suppose you are an HR analyst at a company, and you're tasked with predicting the annual salary of employees based
on years of experience and their education level.
You collect data on employees as follows:

Years of Experience (x₁) Education Level (x₂) Salary (y)

1 1 (Bachelor's) 50,000
2 1 (Bachelor's) 55,000
3 2 (Master's) 65,000
4 2 (Master's) 70,000
5 3 (PhD) 90,000
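One way to fit y = w0 + w1·x1 + w2·x2 to this table is to solve the normal equations (XᵀX)w = Xᵀy. The sketch below does this in plain Python with Gaussian elimination; the solver and helper names are illustrative, only the data comes from the table:

```python
# Multiple linear regression y = w0 + w1*x1 + w2*x2 on the salary table,
# fitted by solving the normal equations (X^T X) w = X^T y.
rows = [(1, 1, 50_000), (2, 1, 55_000), (3, 2, 65_000),
        (4, 2, 70_000), (5, 3, 90_000)]

X = [[1.0, x1, x2] for x1, x2, _ in rows]   # leading 1 -> intercept w0
y = [float(s) for _, _, s in rows]

# Build the 3x3 system A w = b where A = X^T X and b = X^T y
n = 3
A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(n)]
     for i in range(n)]
b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(n)]

# Gaussian elimination with partial pivoting, then back substitution
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, n):
        f = A[r][col] / A[col][col]
        for j in range(col, n):
            A[r][j] -= f * A[col][j]
        b[r] -= f * b[col]
w = [0.0] * n
for i in range(n - 1, -1, -1):
    w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]

def predict(x1, x2):
    return w[0] + w[1] * x1 + w[2] * x2

# For this data: w ~ [34000, 3666.67, 11666.67], so each extra year of
# experience adds ~3,667 and each education step adds ~11,667 to salary.
print([round(v, 2) for v in w])
```

A larger dataset would normally be fitted with a library solver rather than hand-rolled elimination; the explicit system is shown here only to make the normal-equations step visible.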
Logistic Regression
In Logistic Regression, we predict the probability of a binary outcome, such as whether a
student will pass (1) or fail (0) based on some predictor variables (e.g., hours studied).
Used For: Predicting binary outcomes (YES/NO).
Real-Time Example 2: Logistic Regression (Predicting
Whether a Customer Will Buy a Product Based on Age)

Scenario:
Suppose you're working for an e-commerce company and you're tasked with predicting whether a customer will purchase a
product based on their age.
You have data on customers as follows:

Age (x₁) Purchased (y)


18 0
22 0
25 1
28 1
30 1
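A minimal sketch of fitting this model in plain Python with gradient descent on the log-loss. The learning rate, iteration count, and age standardisation are arbitrary implementation choices, not part of the original example:

```python
import math

# Logistic regression on the age/purchase table above
ages = [18, 22, 25, 28, 30]
buys = [0, 0, 1, 1, 1]

# Standardise age so gradient descent behaves well
mean = sum(ages) / len(ages)
std = (sum((a - mean) ** 2 for a in ages) / len(ages)) ** 0.5
xs = [(a - mean) / std for a in ages]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    # Gradients of the average log-loss with respect to w and b
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, buys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, buys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

def p_buy(age):
    """Predicted probability that a customer of this age purchases."""
    return sigmoid(w * (age - mean) / std + b)

print(round(p_buy(18), 3), round(p_buy(30), 3))
```

After training, an 18-year-old gets a probability below 0.5 (predicted 0, no purchase) and a 30-year-old a probability above 0.5 (predicted 1), matching the table.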
Bayes' Theorem
Predict the posterior probability that a product is actually defective
given that it failed a quality check, using Bayes’ Theorem. The prior
probability that any product is defective is 0.005. The likelihood, or
the probability that a defective product fails the quality check, is 0.98
(sensitivity). The specificity of the test is 0.97, meaning the
probability that a non-defective product passes is 0.97, so the
probability it fails is 0.03. Use this information to calculate the
posterior probability, incorporating prior, likelihood, and evidence.
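The calculation asked for above can be carried out directly from Bayes' Theorem, P(D | F) = P(F | D)·P(D) / [P(F | D)·P(D) + P(F | ¬D)·P(¬D)]:

```python
# Posterior probability that a product is defective given a failed check
p_defective = 0.005          # prior P(D)
p_fail_given_def = 0.98      # likelihood P(F | D), sensitivity
p_fail_given_ok = 0.03       # P(F | not D) = 1 - specificity (0.97)

# Evidence P(F): total probability of failing the quality check
evidence = (p_fail_given_def * p_defective
            + p_fail_given_ok * (1 - p_defective))

posterior = p_fail_given_def * p_defective / evidence
print(f"P(defective | failed) = {posterior:.4f}")   # ≈ 0.1410
```

So even with a 98%-sensitive check, only about 14% of failed products are actually defective, because defects are rare (prior 0.005) and the 3% false-failure rate on the many non-defective products dominates the evidence.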
Naive Bayes is a probabilistic classification algorithm based on Bayes' Theorem. It assumes that all
features (attributes) are independent of each other — hence the term "naive".
It is widely used in text classification, spam filtering, sentiment analysis, and many other real-world
problems.
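The classifier multiplies the class prior by the per-feature conditional probabilities. A minimal sketch on a made-up toy dataset follows; the data, feature names, and Laplace-smoothing parameter are illustrative assumptions, not from the slides:

```python
from collections import Counter, defaultdict

# Toy training data: (outlook, windy) -> play?  Entirely illustrative.
data = [
    ("sunny",  "no",  "yes"),
    ("sunny",  "yes", "no"),
    ("rainy",  "yes", "no"),
    ("cloudy", "no",  "yes"),
    ("cloudy", "yes", "yes"),
    ("rainy",  "no",  "yes"),
]

classes = Counter(label for *_, label in data)      # class frequencies
counts = defaultdict(int)                           # (feature, value, class) counts
for *features, label in data:
    for i, v in enumerate(features):
        counts[(i, v, label)] += 1

def predict(features, alpha=1.0):
    """Pick the class maximising P(c) * prod_i P(x_i | c), Laplace-smoothed."""
    best, best_score = None, -1.0
    total = sum(classes.values())
    for label, n_c in classes.items():
        score = n_c / total                          # prior P(c)
        for i, v in enumerate(features):
            n_values = len({d[i] for d in data})     # distinct values of feature i
            # "naive" step: features treated as independent given the class
            score *= (counts[(i, v, label)] + alpha) / (n_c + alpha * n_values)
        if score > best_score:
            best, best_score = label, score
    return best

print(predict(("sunny", "no")))
```

The independence assumption is what keeps this cheap: each feature contributes one conditional probability, so training is just counting.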
NAIVE BAYES CLASSIFIER EXAMPLE 1
NAIVE BAYES CLASSIFIER EXAMPLE 2
BAYESIAN NETWORK
What Is A Bayesian Network?

• A Bayesian Network falls under the category of Probabilistic Graphical
Modelling (PGM) techniques, which are used to compute uncertainties by using
the concept of probability.
• Popularly known as Belief Networks, Bayesian Networks are used to model
uncertainties by using Directed Acyclic Graphs (DAG).
• "A Bayesian network is a probabilistic graphical model which represents a
set of variables and their conditional dependencies using a directed
acyclic graph."
• Real world applications are probabilistic in nature, and to represent the
relationship between multiple events, we need a Bayesian network.

• It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series
prediction, and decision making under uncertainty.
What Is A Directed Acyclic Graph?

• A Directed Acyclic Graph is used to represent a Bayesian Network and like any other
statistical graph, a DAG contains a set of nodes and links, where the links denote the
relationship between the nodes.

• A DAG models the uncertainty of an event occurring based on the Conditional
Probability Distribution (CPD) of each random variable. A Conditional Probability
Table (CPT) is used to represent the CPD of each variable in the network.
• Each node corresponds to the random variables, and a variable can be continuous or discrete.

• Arcs, or directed arrows, represent the causal relationships or conditional probabilities between random variables.
These directed links connect pairs of nodes in the graph.
A link indicates that one node directly influences the other; if there is no directed link between two nodes, the
nodes are independent of each other.
• As an example, consider a network graph whose nodes represent the random variables A, B, C, and D.
• If node B is connected to node A by a directed arrow from A to B, then node A is called the parent of
node B.
• Node C is independent of node A.

• Note: The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a directed acyclic
graph or DAG.
What Is Conditional Probability?

• Conditional probability of an event X is the probability that the event will
occur given that an event Y has already occurred.
• P(X | Y) is the probability of event X occurring, given that event Y occurs.
• If X and Y are dependent events, then the expression for conditional
probability is given by:
P(X | Y) = P(X and Y) / P(Y)
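The formula can be sanity-checked by enumeration. As an illustrative example (not from the slides), roll a fair die and let X be "the roll is even" and Y be "the roll is greater than 3":

```python
# P(X | Y) = P(X and Y) / P(Y), checked by enumerating die rolls
outcomes = range(1, 7)
X = {n for n in outcomes if n % 2 == 0}   # even rolls: {2, 4, 6}
Y = {n for n in outcomes if n > 3}        # rolls > 3:  {4, 5, 6}

p_y = len(Y) / 6                          # P(Y) = 3/6
p_x_and_y = len(X & Y) / 6                # P(X and Y) = 2/6, i.e. {4, 6}
p_x_given_y = p_x_and_y / p_y             # = 2/3

print(p_x_given_y)
```

Knowing the roll exceeded 3 raises the probability of an even result from 1/2 to 2/3, because two of the three remaining outcomes {4, 5, 6} are even.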
• Each node in the Bayesian network has condition probability
distribution P(Xi |Parent(Xi) ), which determines the effect of
the parent on that node.

• A Bayesian network is based on joint probability distribution and
conditional probability, so let's first understand the joint
probability distribution:
What Is Joint Probability?
• Joint Probability is a statistical measure of two or more events happening at the same time, i.e., P(A, B, C) is the probability of events
A, B, and C occurring together. It can be represented as the probability of the intersection of two or more events.

• Joint probability distribution:

• If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint
probability distribution.

• By the chain rule, P[x1, x2, x3, ..., xn] can be written in terms of conditional probabilities:

• P[x1, x2, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

• In a Bayesian network, each variable depends only on its parents, so for each variable Xi the corresponding factor simplifies to:

• P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))


Example
• Let’s assume that we’re creating a Bayesian Network that will model the marks (m) of a student
on his examination. The marks will depend on:

• Exam level (e): This is a discrete variable that can take two values, (difficult, easy)

• IQ of the student (i): A discrete variable that can take two values (high, low)

• The marks will, in turn, predict whether or not he/she will get admitted (a) to a university.

• The IQ will also predict the aptitude score (s) of the student.

• With this information, we can build a Bayesian Network that will model the performance of a
student on an exam. The Bayesian Network can be represented as a DAG where each node
denotes a variable that predicts the performance of the student.
• This distribution can be represented through a DAG and a Conditional Probability Table. We can now calculate the Joint
Probability Distribution of these 5 variables, i.e. the product of conditional probabilities:

• Here,

• p(a | m) represents the conditional probability of a student getting an admission based on his marks.

• p(m | I, e) represents the conditional probability of the student’s marks, given his IQ level and exam level.

• p(i) denotes the probability of his IQ level (high or low)

• p(e) denotes the probability of the exam level (difficult or easy)

• p(s | i) denotes the conditional probability of his aptitude scores, given his IQ level

• We can therefore formulate the Bayesian network's joint distribution as:

• p(a, m, i, e, s) = p(a | m) p(m | i, e) p(i) p(e) p(s | i)
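Once each factor is read off its CPT, the joint probability of one full assignment is a single product. The probability values below are placeholders (the slide's CPT did not survive extraction), so only the factorisation itself comes from the text:

```python
# Joint probability p(a, m, i, e, s) = p(a|m) p(m|i,e) p(i) p(e) p(s|i)
# for one particular assignment of the five variables.
# All numeric values are assumed for illustration only.
p_i = 0.8       # P(IQ = high)                            (assumed)
p_e = 0.7       # P(exam = easy)                          (assumed)
p_m_ie = 0.9    # P(marks = good | high IQ, easy exam)    (assumed)
p_a_m = 0.75    # P(admitted | good marks)                (assumed)
p_s_i = 0.85    # P(good aptitude score | high IQ)        (assumed)

# The network's factorisation: one CPT entry per node
joint = p_a_m * p_m_ie * p_i * p_e * p_s_i
print(f"p(a, m, i, e, s) = {joint:.4f}")
```

The point of the DAG is exactly this economy: five small conditional tables replace one table over all 2⁵ joint assignments.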
Bayesian Networks Application
• Disease Diagnosis: Bayesian Networks are commonly used in the field of medicine for the detection and prevention of diseases. They can be used to

model the possible symptoms and predict whether or not a person is diseased.

• Optimized Web Search: Bayesian Networks are used to improve search accuracy by understanding the intent of a search and providing the most

relevant search results. They can effectively map users' intent to the relevant content and deliver the search results.

• Spam Filtering: Bayesian models have been used in the Gmail spam filtering algorithm for years now. They can effectively classify documents by

understanding the contextual meaning of a mail. They are also used in other document classification applications.

• Gene Regulatory Networks: GRNs are a network of genes that are comprised of many DNA segments. They are effectively used to communicate with

other segments of a cell either directly or indirectly. Mathematical models such as Bayesian Networks are used to model such cell behavior in order to

form predictions.

• Biomonitoring: Bayesian Networks play an important role in monitoring the quantity of chemical doses used in pharmaceutical drugs.
Support Vector Machine
“Support Vector Machine” (SVM) is a supervised machine learning algorithm which
can be used for both classification and regression challenges.
However, it is mostly used in classification problems.
• In this algorithm, we plot each data item as a point in n-dimensional space (where n is
number of features you have) with the value of each feature being the value of a
particular coordinate.
• Then, we perform classification by finding the hyperplane that differentiates the two
classes very well.
Support Vector Machine
Generally, Support Vector Machine is considered a classification approach, but it
can be employed for both classification and regression problems.
• It can easily handle multiple continuous and categorical variables.
• SVM constructs a hyperplane in multidimensional space to separate different
classes.
• SVM generates optimal hyperplane in an iterative manner, which is used to
minimize an error.
• The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best
divides the dataset into classes.
Margin
• A margin is the gap between the two lines through the closest points of each class.
• It is calculated as the perpendicular distance from the line to the support
vectors, or closest points.
• If the margin between the classes is larger, it is considered a good
margin; a smaller margin is a bad margin.
How Does SVM Work?
• The main objective is to segregate the given dataset in the best possible way.
• The distance between the nearest points is known as the margin.
• The objective is to select a hyperplane with the maximum possible margin between
support vectors in the given dataset.
• SVM searches for the maximum marginal hyperplane in the following steps:
• Generate hyperplanes which segregate the classes in the best way.
• Select the right hyperplane with the maximum segregation from the nearest
data points.
Non-linear and inseparable planes
• Some problems can’t be solved using a linear hyperplane.
• In such situations, SVM uses a kernel trick to transform the input
space into a higher-dimensional space.
• The data points are then plotted on the x-axis and z-axis (where z is the
squared sum of x and y: z = x² + y²).
• Now you can easily segregate these points using linear separation.
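The effect of the z = x² + y² mapping can be seen on a made-up example: an inner cluster surrounded by an outer ring (the points below are invented for illustration). No line in the (x, y) plane separates them, but a simple threshold on z does:

```python
# Mapping z = x^2 + y^2 makes circularly-arranged data linearly separable
inner = [(0.5, 0.0), (0.0, 0.7), (-0.6, 0.2), (0.3, -0.4)]   # class 0
outer = [(2.0, 0.0), (0.0, -2.2), (-1.8, 1.0), (1.5, 1.5)]   # class 1

def z(p):
    x, y = p
    return x * x + y * y

max_inner = max(z(p) for p in inner)   # largest z among inner points
min_outer = min(z(p) for p in outer)   # smallest z among outer points

# In the mapped space a separating "hyperplane" is just z = threshold
threshold = (max_inner + min_outer) / 2
print(max_inner < threshold < min_outer)   # True: separable in z
```

In practice the kernel trick computes the inner products of such mapped points without ever materialising the new coordinates, which is what the kernel functions in the next section provide.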
High Dimensional Space Mapping
SVM Kernels
• The SVM algorithm is implemented in practice using a kernel.
• A kernel transforms an input data space into the required form.
• SVM uses a technique called the kernel trick. Here, the kernel takes a
low-dimensional input space and transforms it into a higher-dimensional space.
• In other words, it converts non-separable problems to separable problems by
adding more dimensions to the data.
• It is most useful in non-linear separation problems.
• The kernel trick helps you to build a more accurate classifier.
Kernel Types

• Linear Kernel
• Polynomial Kernel
• Radial Basis Function Kernel
• Sigmoid Kernel
Types of SVM
1. Linear SVM
• Used when data is linearly separable
• Finds a linear hyperplane to separate classes
2. Non-Linear SVM
• Used when data is not linearly separable
• Uses kernel functions (like RBF, polynomial) to map data to higher dimensions
3. Support Vector Regression (SVR)
• SVM used for regression tasks instead of classification
4. Least Squares SVM (LS-SVM)
• Uses least squares cost function to simplify the optimization problem
Applications of SVM
• Image classification (e.g., face detection)
• Text classification (e.g., spam filtering)
• Bioinformatics (e.g., gene classification)
• Handwriting recognition
• Financial prediction (e.g., stock market trends)
• Medical diagnosis (e.g., disease detection)
Support Vector Machine: Worked Example
Step 1: Define the Support Vectors
Step 2: Compute Dot Products (Kernel Matrix)
Step 3: Dual Form Equation
Step 4: Solve the Linear System
Step 5: Compute the Weight Vector w
Step 6: Compute Bias b

Final Decision Function


Test the Classifier
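The six steps above can be sketched end to end in Python. The support vectors and labels below are assumptions for illustration (a classic three-vector toy problem, not taken from the slides): s1 = (1, 0) with y = −1, and s2 = (3, 1), s3 = (3, −1) with y = +1. Each vector is augmented with a bias component of 1 so that b falls out of the same linear system:

```python
# Hard-margin SVM worked example following Steps 1-6.
# Step 1: support vectors (each augmented with a 1 for the bias term)
S = [[1.0, 0.0, 1.0],    # s1, label -1
     [3.0, 1.0, 1.0],    # s2, label +1
     [3.0, -1.0, 1.0]]   # s3, label +1
labels = [-1.0, 1.0, 1.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Step 2: kernel matrix of pairwise dot products (linear kernel)
K = [[dot(si, sj) for sj in S] for si in S]

# Steps 3-4: dual form -> solve sum_j alpha_j K[i][j] = y_i (Gauss-Jordan)
n = len(S)
A = [row[:] + [labels[i]] for i, row in enumerate(K)]
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    for r in range(n):
        if r != col and A[r][col] != 0:
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
alphas = [A[i][n] / A[i][i] for i in range(n)]

# Step 5: w~ = sum_i alpha_i * s_i; Step 6: last component of w~ is the bias b
w_tilde = [sum(alphas[i] * S[i][j] for i in range(n)) for j in range(3)]
w, b = w_tilde[:2], w_tilde[2]

# Final decision function: f(x) = sign(w . x + b)
def classify(x):
    return 1 if dot(w, x) + b > 0 else -1

print(w, b)   # for this data: w = [1.0, 0.0], b = -2.0, i.e. f(x) = x1 - 2
print(classify([4.0, 3.0]), classify([0.0, 0.0]))
```

With these support vectors the solver gives α = (−3.5, 0.75, 0.75), so the separating hyperplane is the vertical line x1 = 2: test points to its right classify as +1 and points to its left as −1.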
