22cse61 Module 4
Example:
Suppose we want to predict exam score (y) from hours studied (x):

Hours Studied (x)   Exam Score (y)
1                   2
2                   4
3                   5
4                   4
5                   5

1. Find the slope (m)
2. Find the intercept (c)
3. Form the equation: y = mx + c
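As a quick illustration, here is a minimal Python sketch of the three steps, computed with the standard least-squares formulas on the table above:

```python
# A minimal sketch of the three steps, using the table above.
xs = [1, 2, 3, 4, 5]   # hours studied
ys = [2, 4, 5, 4, 5]   # exam scores

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Step 1: slope m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
m = num / den

# Step 2: intercept c = mean_y - m * mean_x
c = mean_y - m * mean_x

# Step 3: the fitted equation
print(f"y = {m:.1f}x + {c:.1f}")   # prints: y = 0.6x + 2.2
```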
Multiple Linear Regression
The goal of Multiple Regression is to predict a dependent variable (e.g., score) using
multiple independent variables (e.g., Hours Studied and Assignments Done).
Used For: Predicting a continuous outcome using multiple variables.
Real-Time Example : Multiple Regression (Predicting Salary
Based on Experience and Education Level)
Scenario:
Suppose you are an HR analyst at a company, and you're tasked with predicting the annual salary of employees based
on years of experience and their education level.
You collect data on employees as follows:
Years of Experience   Education Level   Annual Salary
1                     1 (Bachelor's)    50,000
2                     1 (Bachelor's)    55,000
3                     2 (Master's)      65,000
4                     2 (Master's)      70,000
5                     3 (PhD)           90,000
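A minimal Python sketch of this regression is below; it treats the education level as an ordinal code (1, 2, 3), which is a simplifying assumption for illustration:

```python
# A minimal sketch: fit salary ~ b0 + b1*experience + b2*education with
# ordinary least squares, treating education as an ordinal code
# (1 = Bachelor's, 2 = Master's, 3 = PhD). Data is the table above.
import numpy as np

experience = np.array([1, 2, 3, 4, 5], dtype=float)
education  = np.array([1, 1, 2, 2, 3], dtype=float)
salary     = np.array([50_000, 55_000, 65_000, 70_000, 90_000], dtype=float)

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(experience), experience, education])
(b0, b1, b2), *_ = np.linalg.lstsq(X, salary, rcond=None)

print(f"salary = {b0:.0f} + {b1:.0f}*experience + {b2:.0f}*education")
# Predict the salary for 4 years of experience and a PhD (code 3):
print(b0 + b1 * 4 + b2 * 3)
```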
Logistic Regression
In Logistic Regression, we predict the probability of a binary outcome, such as whether a
student will pass (1) or fail (0) based on some predictor variables (e.g., hours studied).
Used For: Predicting binary outcomes (YES/NO).
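A minimal sketch in Python, using scikit-learn's LogisticRegression on a small made-up hours-studied dataset (the numbers are illustrative, not from the text):

```python
# A minimal sketch of logistic regression for pass/fail prediction.
# The hours-studied data below is made up purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

hours  = np.array([[1], [2], [3], [4], [5], [6]])  # predictor: hours studied
passed = np.array([0, 0, 0, 1, 1, 1])              # outcome: 0 = fail, 1 = pass

model = LogisticRegression().fit(hours, passed)

# Predicted probability of passing after 3.5 hours of study
print(model.predict_proba([[3.5]])[0, 1])
```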
Real-Time Example 2: Logistic Regression (Predicting
Whether a Customer Will Buy a Product Based on Age)
Scenario:
Suppose you're working for an e-commerce company and you're tasked with predicting whether a customer will purchase a
product based on their age.
You have data on customers' ages and whether each purchased the product.
Bayesian Networks
• A Directed Acyclic Graph (DAG) is used to represent a Bayesian Network. Like any other
statistical graph, a DAG contains a set of nodes and links, where the links denote the
relationships between the nodes.
• Arcs, or directed arrows, represent the causal relationships or conditional probabilities between random variables.
These directed links or arrows connect pairs of nodes in the graph.
A link indicates that one node directly influences the other; if there is no directed link between two nodes,
the nodes are independent of each other.
• In the example diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
• If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of
node B.
• Node C is independent of node A.
• Note: A Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic
graph, or DAG.
What Is Conditional Probability?
• If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint
probability distribution.
• By the chain rule, P[x1, x2, x3, ..., xn] can be written in terms of conditional probabilities:
• P[x1, x2, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
• In general, for each variable Xi in a Bayesian Network, conditioning on its parents is enough:
• P[Xi | Xi-1, ..., X1] = P[Xi | Parents(Xi)], so the joint distribution is the product of P[Xi | Parents(Xi)] over all nodes.
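As a quick numeric illustration of the chain rule, the sketch below starts from a made-up joint table over two binary variables and reconstructs each joint probability from its chain-rule factors P[x1 | x2] P[x2]:

```python
# A numeric check of the chain rule on two binary variables.
# The joint table is made up for illustration.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # P[x1, x2]

# Marginal P[x2] obtained by summing the joint over x1
p_x2 = {b: sum(p for (_, b2), p in joint.items() if b2 == b) for b in (0, 1)}

for (x1, x2), p in joint.items():
    p_x1_given_x2 = joint[(x1, x2)] / p_x2[x2]     # P[x1 | x2]
    # Chain rule: P[x1, x2] = P[x1 | x2] * P[x2]
    assert abs(p - p_x1_given_x2 * p_x2[x2]) < 1e-12
print("chain rule reproduces the joint:", joint)
```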
Consider modeling a student's performance on an exam with the following variables:
• Exam level (e): a discrete variable that can take two values (difficult, easy).
• IQ of the student (i): a discrete variable that can take two values (high, low).
• Together, the exam level and the IQ of the student determine his/her marks (m).
• The marks will in turn predict whether or not he/she will get admitted (a) to a university.
• The IQ will also predict the aptitude score (s) of the student.
• With this information, we can build a Bayesian Network that will model the performance of a
student on an exam. The Bayesian Network can be represented as a DAG where each node
denotes a variable that predicts the performance of the student.
• Representing this distribution through a DAG and Conditional Probability Tables, we can calculate the Joint
Probability Distribution of these 5 variables, i.e. the product of conditional probabilities:
• p(a, m, i, e, s) = p(a | m) p(m | i, e) p(i) p(e) p(s | i)
• Here,
• p(a | m) represents the conditional probability of a student getting an admission based on his marks.
• p(m | I, e) represents the conditional probability of the student’s marks, given his IQ level and exam level.
• p(s | i) denotes the conditional probability of his aptitude scores, given his IQ level.
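A small Python sketch of this factorization is below. All CPT values are hypothetical placeholders; only the structure of the product matches the network described above:

```python
# A sketch of the student-performance network's joint probability.
# All CPT values below are hypothetical, chosen only to illustrate
# the factorization p(a,m,i,e,s) = p(a|m)*p(m|i,e)*p(i)*p(e)*p(s|i).
p_e = {"difficult": 0.4, "easy": 0.6}                          # exam level
p_i = {"high": 0.3, "low": 0.7}                                # IQ
p_s_given_i = {("good", "high"): 0.8, ("good", "low"): 0.25}   # score | IQ
p_m_given_ie = {("high_marks", "high", "easy"): 0.9}           # marks | IQ, exam
p_a_given_m = {("admit", "high_marks"): 0.7}                   # admission | marks

# Joint probability of one full assignment of the 5 variables:
joint = (p_a_given_m[("admit", "high_marks")]
         * p_m_given_ie[("high_marks", "high", "easy")]
         * p_i["high"] * p_e["easy"]
         * p_s_given_i[("good", "high")])
print(joint)  # the product of the five conditional probabilities
```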
Applications of Bayesian Networks
• Disease Diagnosis: Bayesian Networks are commonly used in the field of medicine for the detection and prevention of diseases. They can be used to
model the possible symptoms and predict whether or not a person is diseased.
• Optimized Web Search: Bayesian Networks are used to improve search accuracy by understanding the intent of a search and providing the most
relevant search results. They can effectively map users' intent to the relevant content and deliver the search results.
• Spam Filtering: Bayesian models have been used in the Gmail spam filtering algorithm for years now. They can effectively classify documents by
understanding the contextual meaning of a mail. They are also used in other document classification applications.
• Gene Regulatory Networks: GRNs are networks of genes made up of many DNA segments, which communicate with
other segments of a cell either directly or indirectly. Mathematical models such as Bayesian Networks are used to model such cell behavior in order to
form predictions.
• Biomonitoring: Bayesian Networks play an important role in monitoring the quantity of chemical doses used in pharmaceutical drugs.
Support Vector Machine
“Support Vector Machine” (SVM) is a supervised machine learning algorithm that
can be used for both classification and regression challenges.
However, it is mostly used in classification problems.
• In this algorithm, we plot each data item as a point in n-dimensional space (where n is
number of features you have) with the value of each feature being the value of a
particular coordinate.
• Then, we perform classification by finding the hyperplane that differentiates the two
classes very well.
Support Vector Machine
Generally, Support Vector Machines are considered a classification approach, but they
can be employed in both classification and regression problems.
• It can easily handle multiple continuous and categorical variables.
• SVM constructs a hyperplane in multidimensional space to separate different
classes.
• SVM generates the optimal hyperplane in an iterative manner, which is used to
minimize the classification error.
• The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best
divides the dataset into classes.
Margin
• A margin is the gap between the two boundary lines through the closest points of each class.
• It is calculated as the perpendicular distance from the separating line to the support
vectors, i.e. the closest points.
• A larger margin between the classes is considered a good margin; a smaller margin is a bad one.
How does SVM work?
• The main objective is to segregate the given dataset in the best possible way.
• The distance between the hyperplane and the nearest data points is known as the margin.
• The objective is to select a hyperplane with the maximum possible margin between
support vectors in the given dataset.
• SVM searches for the maximum marginal hyperplane in the following steps:
• Generate hyperplanes which segregate the classes in the best way.
• Select the right hyperplane with the maximum segregation from the nearest
data points.
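A minimal scikit-learn sketch of these steps on a made-up two-class dataset; it fits a linear SVM, then reads off the support vectors and the margin width (2 / ||w||):

```python
# Fit a linear SVM on a tiny toy dataset and inspect the support
# vectors and margin. The data points are made up for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)   # large C ~ hard margin

print("Support vectors:\n", clf.support_vectors_)
w = clf.coef_[0]
# The margin width between the two classes is 2 / ||w||
print("Margin width:", 2 / np.linalg.norm(w))
```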
Non-linear and inseparable planes
• Some problems can't be solved using a linear hyperplane.
• In such situations, SVM uses a kernel trick to transform the input
space into a higher-dimensional space.
• The data points are then plotted against the x-axis and a new z-axis, where z is the
squared sum of x and y: z = x^2 + y^2.
• Now you can easily segregate these points using linear separation.
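A small numeric sketch of this transformation (the point coordinates are made up): points near the origin and points far from it are not linearly separable in (x, y), but become separable along the new z axis:

```python
# Demonstrate the z = x^2 + y^2 mapping on made-up points.
import numpy as np

inner = np.array([[0.5, 0.2], [-0.3, 0.4], [0.1, -0.6]])   # class 0, near origin
outer = np.array([[2.0, 1.5], [-1.8, 2.2], [2.5, -1.0]])   # class 1, far away

z_inner = (inner ** 2).sum(axis=1)   # z = x^2 + y^2
z_outer = (outer ** 2).sum(axis=1)

# Any threshold between the two ranges separates the classes linearly in z
print(z_inner.max(), "<", z_outer.min())
```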
SVM Kernels
• The SVM algorithm is implemented in practice using a kernel.
• A kernel transforms an input data space into the required form.
• SVM uses a technique called the kernel trick. Here, the kernel takes a low-dimensional
input space and transforms it into a higher-dimensional space.
• In other words, it converts non-separable problems to separable problems by
adding more dimensions to the data.
• It is most useful in non-linear separation problems.
• The kernel trick helps you to build a more accurate classifier.
Kernel Types
• Linear Kernel
• Polynomial Kernel
• Radial Basis Function Kernel
• Sigmoid Kernel
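The four kernel functions listed above can be sketched in plain numpy as follows; the parameter values (gamma, degree, coef0) are illustrative defaults, not prescribed by the text:

```python
# Sketches of the four standard kernel functions.
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (np.dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(y)) ** 2)

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    return np.tanh(gamma * np.dot(x, y) + coef0)

x, y = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(x, y), polynomial_kernel(x, y),
      rbf_kernel(x, y), sigmoid_kernel(x, y))
```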
Types of SVM
1. Linear SVM
• Used when data is linearly separable
• Finds a linear hyperplane to separate classes
2. Non-Linear SVM
• Used when data is not linearly separable
• Uses kernel functions (like RBF, polynomial) to map data to higher dimensions
3. Support Vector Regression (SVR)
• SVM used for regression tasks instead of classification
4. Least Squares SVM (LS-SVM)
• Uses least squares cost function to simplify the optimization problem
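A quick usage sketch of the first three variants with scikit-learn (LS-SVM has no scikit-learn implementation, so it is omitted; the tiny datasets are made up):

```python
# Classification with linear and RBF kernels, and regression with SVR.
import numpy as np
from sklearn.svm import SVC, SVR

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y_class = np.array([0, 0, 1, 1])
y_reg   = np.array([0.1, 1.2, 1.9, 3.1])

linear_svm    = SVC(kernel="linear").fit(X, y_class)   # 1. Linear SVM
nonlinear_svm = SVC(kernel="rbf").fit(X, y_class)      # 2. Non-linear SVM
svr           = SVR(kernel="rbf").fit(X, y_reg)        # 3. Support Vector Regression

print(linear_svm.predict([[1.5]]), nonlinear_svm.predict([[1.5]]),
      svr.predict([[1.5]]))
```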
Applications of SVM
• Image classification (e.g., face detection)
• Text classification (e.g., spam filtering)
• Bioinformatics (e.g., gene classification)
• Handwriting recognition
• Financial prediction (e.g., stock market trends)
• Medical diagnosis (e.g., disease detection)
Support Vector Machine
Step 1: Define the Support Vectors
Step 2: Compute Dot Products (Kernel Matrix)
Step 3: Dual Form Equation
Step 4: Solve the Linear System
Step 5: Compute the Weight Vector w
Step 6: Compute Bias b
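The slides list these steps without the worked numbers, so here is a hedged sketch of Steps 1-6 on a small hand-checkable example, using the simplified method that augments each support vector with a bias coordinate of 1 and solves the dual as a linear system:

```python
# Steps 1-6 on three illustrative support vectors (one negative, two
# positive). Augmenting each vector with a constant 1 lets the bias be
# learned together with the weights.
import numpy as np

# Step 1: define the support vectors and their labels
S = np.array([[1, 0], [3, 1], [3, -1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0])
S_aug = np.hstack([S, np.ones((3, 1))])   # append bias coordinate

# Step 2: kernel matrix of pairwise dot products
K = S_aug @ S_aug.T

# Steps 3-4: dual form K @ alpha = y, solved as a linear system
alpha = np.linalg.solve(K, y)

# Step 5: weight vector w_aug = sum_i alpha_i * s_i (augmented)
w_aug = alpha @ S_aug

# Step 6: split off the bias term b
w, b = w_aug[:2], w_aug[2]
print("alpha =", alpha)      # [-3.5, 0.75, 0.75]
print("w =", w, "b =", b)    # w = [1, 0], b = -2  ->  hyperplane x = 2
```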