0% found this document useful (0 votes)

4 views

Deep Learn

The document provides an introduction to deep learning and machine learning, discussing essential mathematical concepts, types of machine learning, and various algorithms such as Support Vector Machines (SVM), Linear Regression, and Logistic Regression. It explains the process of data representation, model tuning, optimization algorithms, and the differences between hard and soft margin SVMs, as well as parametric and non-parametric models. Additionally, it covers K-nearest neighbors as a classification and regression method, highlighting its lazy learning approach.

Uploaded by

Pragyamita Basu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views

Deep Learn

Uploaded by

Pragyamita Basu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 48

Introduction to Deep Learning

The mathematical part of Machine Learning deals with statistics, calculus, linear algebra, and probabilities
Introduction to Deep Learning

Evolution of Artificial Intelligence

Example: Detecting email spam
Example: Detecting email spam
Types of Machine Learning
The components of Machine Learning
• ML is modelling technique that involves data.
• Final product of machine learning is the
compressed form of the data- extracts knowledge
from the data
Representation of the data

Each row represent one sample of the data (or

instance of data).
Example: Recognize apples and oranges
• Each row represent one apple or one orange
• The columns are different features such as color,
texture, shape, smell and so on of the objects.
• This matrix is usually called feature matrix which
represent our data in a way that can feed into the
computer.
• Target array, which stores the output of the values.
Either it will be the label of the different classes,
or the quantities.
Example: Image data: numerical values of pixels are
stored in an array.
Tunable Model
After we have the input and output data ready, we need an algorithm that can learn from the data. The algorithms are usually
have many different parameters that can be tuned in a way that the model can become better for our goal based on the data we
present to it. There are many different tunable models such as artificial neural network support vector machine, random forest,
logistic regression, and so on. Each model has different ideas behind it and be tuned differently.

Optimization algorithm
This is the main working force behind the machine learning algorithms. Usually we need to define some sort of objective
function and use the optimization algorithms to either minimize or maximize it. The objective function could be calculated
by the difference/error between the estimation and the target (true answer), therefore, usually the smaller the error is, the
better. There are different optimization algorithms to tune the model, such as gradient descent, least square minimization,
and so on.

Trained Model
After the tuning of the model, it has the capability of making a classification or prediction and so on. This trained model can
be used on the new data for achieving our goal.
Example:
Classification
Classification of group of Oranges and Apples (binary classification problem since only two classes)

• First calculate features for each apple and orange and save
• Since we have 5 features in the figure, it is not easy to
it into the feature matrix as shown below.
visualize it. If we only plot the first two features, i.e.
• Target label 0 represent orange and 1 refers to apple.
color and texture, we may see something as below:

• The classification problem essentially is a problem to

find a decision boundary, either a straight line or
other curves, to separate them.
• The tuning of the algorithm is basically to move this line or find out the shape of the boundary, but in a higher dimension
Support Vector Machines
• It is a very intuitive algorithm based on how we make the decision. • If we have a new data point (blue dot), then the black dotted line
• In the following figure, “which line boundary is better?” The black model will make the wrong decision. Therefore, the model which
dotted line or the red solid line? Most people will choose the red has a line close to the middle of the gap and far away from both
solid line, because it is in the middle of the gap between the two classes are better ones. This intuition from us need to be
groups. formalized into a way that the computer can do this.

The design of the SVM algorithm, it first forms a buffer from the
boundary line to the points in both classes that are close to the line
(these are the support vectors, where the name comes from). Then the
problem becomes given a set of these support vectors, which line has
the maximum buffer. The black dotted line has a narrow buffer while
the red solid line has a wider buffer.
Mathematics of Support Vector Machines
• Consider a random point X and we want to know
Dot product whether it lies on the right side of the plane or the
left side of the plane (positive or negative).
• we make a vector (w) which is perpendicular to
the hyperplane.
• Let’s say the distance of vector w from origin to
decision boundary is ‘c’. Now we take the
projection of X vector on w. Projection of any
vector or another vector is called dot-product.

In SVM we just need the projection

of A not the magnitude of B
point lies on the right side

point lies on the left side

Mathematics of Support Vector Machines

Equation of a hyper plane is w.x+b=0

• where w is a vector normal to hyper plane and b is an offset
To classify a point as negative or positive we need to define a decision rule as :

Now we need (w, b) such that the margin has a maximum distance. Let’s say this distance is ‘d’.
To calculate ‘d’ we need the equation of L1 and L2.
For this, we will take few assumptions that
the equation of L1 is w.x+b=1 and for L2 it is w.x+b= -1.
Mathematics of Support Vector Machines
• Take 2 support vectors, 1 from the negative class and 2nd from the positive class.
The distance between these two vectors x1 and x2 will be (x2-x1) vector.
• Take a vector ‘w’ perpendicular to the hyper plane and then find the projection of
(x2-x1) vector on ‘w’. To make this ‘w’ a unit vector we divide this with the norm
of ‘w’

Since x2 and x1 are support vectors and they lie on the hyper plane, hence they will follow yi* (w.x+b)=1
so we can write it as:

Hence the equation which we have to maximize is:

Support Vector Machines- Maximal margin classifier

Quadratic programming
problem with inequality
constraint

Solved using Lagrange multipliers

Compute the gradient:

Support Vector Machines- Maximal Margin Classifier
Non-zero alpha indicates a support vector:
When a data point has a non-zero alpha value, it is considered a support
vector, meaning it directly influences the position of the hyper plane in the
SVM model.
Importance in the decision boundary:
The alpha values of support vectors determine how much each data point
contributes to the final decision boundary.
Lagrange multiplier connection:
Alpha is essentially a Lagrange multiplier used in the SVM optimization
problem, which helps to find the optimal hyper plane by maximizing the
margin between classes while respecting the constraints.
Support Vector Machine
Solved example 𝑊 = 0.5(1,2) + 0.5(2,1) − (0,0) = [1.5,1.5]
𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏 = 0

𝐿1 = 𝑤. 𝑥 + 𝑏 = +1

For 𝑥 = (1, 2) ,
1.5 1 + 1.5 2 + 𝑏 = 1
 𝑏 = −3.5
For 𝑥 = (2, 1) ,
1.5 1 + 1.5 2 + 𝑏 = 1
 𝑏 = −3.5
𝐿2 = 𝑤. 𝑥 + 𝑏 = −1

For 𝑥 = (0,0) ,
1.5 0 + 1.5 0 + 𝑏 = −1
 𝑏 = −1

Take the average of b

−3.5−3.5−1
𝑏= 3
= −8/3
Support Vector Machine Solved example
Hard and Soft Margin SVM

• In real-life applications, we rarely encounter datasets that are perfectly linearly separable. Instead, we often come across
datasets that are either nearly linearly separable or entirely non-linearly separable.
Hard and Soft Margin SVM
“Hard Margin” is applicable only when we have “Linearly Separable Data”. In real datasets where data points perfectly
do not belong to the right region, Hard margins do not give good results. Hard Margins are very sensitive to outliers. We can
use Soft Margin classifications to avoid these issues. Hard margins will result in over fitting of a model that allows zero
errors.
Forcing rigid margins can result in a
model that performs perfectly in the
training set, but is possibly over-fit /
less generalizable when applied to a
new dataset.
Hard and Soft Margin SVM
The soft margin SVM follows a somewhat similar optimization procedure with a couple of differences. First, in this
scenario, we allow misclassifications to happen. So we’ll need to minimize the misclassification error, which means that
we’ll have to deal with one more constraint. Second, to minimize the error, we should define a loss function. A common loss
function used for soft margin is the hinge loss.

The loss of a misclassified point is called a slack variable and is added to the primal problem that we had for hard margin
SVM. So the primal problem for the soft margin becomes:

A new regularization parameter controls the trade-off between maximizing the margin and minimizing the loss. As you
can see, the difference between the primal problem and the one for the hard margin is the addition of slack variables.
The new slack variables ( in the figure below) add flexibility for misclassifications of the model:
SVM Kernels In many real-world scenarios, the data is not linearly separable
in the original feature space. Kernels help by implicitly
mapping the original feature space into a higher-dimensional
space where the data might be more easily separable.

Kernels allow SVMs to handle non-linearly separable data by

transforming the feature space.
SVM uses a method called the kernel trick to map the data to a higher-
dimensional feature space.
Linear Regression Model Representation
Simple linear regression when we have single input and output: Y=𝑏0 + 𝑏1 x

In estimating the coefficients of the model Least squares procedure seeks to minimize the sum of the squared residuals.
This means that given a regression line through the data we calculate the distance from each data point to the regression line,
square it, and sum all of the squared errors together. This is the quantity that ordinary least squares seeks to minimize.
Weight = 𝑏0 + 𝑏1 Height

The output for Linear regression should only be the continuous values such as price, age, salary, etc.
Linear Regression
LR allocates weight parameter,  for each of the training features. The predicted output(h(θ)) will be a
-
linear function of features (x) and coefficients (θ).

-Linear regression output equation

During the start of training, each theta is randomly initialized.

But during the training, we correct the theta corresponding to
each feature such that, the loss (metric of the deviation between
expected and predicted output) is minimized. Gradient descend
algorithm will be used to align the θ values in the right
direction. In the given diagram, each red dots represent the
training data and the blue line shows the derived solution.

Loss function :
In LR, we use mean squared error as the metric of loss.
The deviation of expected and actual outputs will be
squared and sum up. Derivative of this loss will be used by
gradient descend algorithm.
Logistic Regression
• Logistic Regression is used for solving the Classification problems.
• The output of Logistic Regression problem can be only between the 0 and 1.
• Logistic regression can be used where the probabilities between two classes is
required. Such as whether it will rain today or not, either 0 or 1, true or false etc.
• In logistic regression, we pass the weighted sum of inputs through an activation
function that can map values in between 0 and 1. Such activation function is
known as sigmoid function and the curve obtained is called as sigmoid curve or
S-curve. Consider the image:
The equation for logistic regression is
Logistic Regression - Probabilistic way of classification

The output of the logistic regression will be a probability (0 ≤ 𝑥 ≤ 1), and can be used to predict the binary 0 or 1
as the output ( if 𝑥 < 0.5, 𝑜𝑢𝑡𝑝𝑢𝑡 = 0, 𝑒𝑙𝑠𝑒 𝑜𝑢𝑡𝑝𝑢𝑡 = 1).

Logistic Regression acts somewhat very similar to linear regression. It also

calculates the linear output, followed by a squashing function over the
regression output. Sigmoid function is the frequently used logistic function.

The h(θ) value here corresponds to 𝑃(𝑦 = 1|𝑥), ie, probability of output to be binary 1, given input x. 𝑃(𝑦 = 0|𝑥) will be
equal to 1 − ℎ(𝜃). When value of z is 0, 𝑔(𝑧) will be 0.5. Whenever z is positive, ℎ(𝜃) will be greater than 0.5 and output
will be binary 1. Likewise, whenever z is negative, value of y will be 0. As we use a linear equation to find the classifier,
the output model also will be a linear one, that means it splits the input dimension into two spaces with all points in one
space corresponds to same label.
Logistic Regression
Sigmoid function

Loss function :
We can’t use mean squared error as loss function (like linear regression), because
we use a non-linear sigmoid function at the end. MSE function may introduce
local minimums and will affect the gradient descend algorithm.
So cross entropy is used as loss function here. Two equations will be used,
corresponding to y=1 and y=0. The basic logic here is that, whenever the
prediction is badly wrong, (eg : 𝑦’ = 1 & 𝑦 = 0), cost will be −log(0) which is
infinity.
Parametric vs Non-parametric models
Statistical learning models can be classified as parametric or nonparametric models. Machine learning models use
statistical learning to predict unseen data.
The purpose of the statistical model is to approximate the relationship between the dependent and independent variables.
The dependent variable is the center of attention of machine learning models which is needed to predict based on
one/some independent variable.
Parametric model
A common example of a parametric algorithm is linear regression, and the linear
function uses parameters like intercept and slopes, so by training the data we calculate
these parameters.
• Some other examples of parametric machine learning algorithms are: Linear
Support Vector Machines, Logistic Regression, Naive Bayes classifier, Perceptron

Non-parametric models do not assume the model based on a function. This model is
flexible, the example of a non-parametric model is the K-Nearest Neighbor that makes
predictions based on the k most similar training patterns for a new data instance. The
method does not assume anything about the form of the mapping function other than
patterns that are close are likely have a similar output variable.
• Some other examples of non-parametric machine learning algorithms are: Decision
Tree, SVM with Gaussian kernel, ANN
K – nearest neighbors
-
KNN
K-nearest neighbors is a non-parametric method used for classification and regression. It is one of the most easy ML
technique used. It is a lazy learning model, with local approximation.
Basic Theory
The basic logic behind KNN is to explore your neighborhood, assume the test datapoint to be similar to them and derive
the output. In KNN, we look for k neighbors and come up with the prediction.
In case of KNN classification, a majority voting is applied over the k nearest data points whereas, in KNN regression,
mean of k nearest data points is calculated as the output. As a rule of thumb, we select odd numbers as k. KNN is a lazy
learning model where the computations happens only runtime.
K – nearest neighbors
-
In the given figure yellow and violet points corresponds to Class A and
Class B in training data. The red star, points to the test data which is to be
classified. when k = 3, we predict Class B as the output and when K=6, we
predict Class A as the output.

Loss function :
There is no training involved in KNN. During testing, k neighbors with minimum distance, will take part in
classification /regression.
K – nearest neighbors
Example – Consider the following table of height, weight and age values of 10 people. Predict the weight value of the
person ID-11

D (11, 1) = sqrt((5.5-5)^2+(38-45)^2)= 7.017

D(11,1) = 0.5+7=7.5
Predicted value of weight for sample 11 is 69.6
K – nearest neighbors
Solution– Consider the following table of height, weight and age values of 10 people. Predict the weight value of the
person ID-11

Since ID11 is closer to points 5 and 1, so it

must have a weight similar to these IDs,
probably between 72-77 kgs (weights of
ID1 and ID5 from the table).

How KNN algorithm predicts this value?

K – nearest neighbors
Solution– KNN algorithm uses feature similarity for prediction.

First the distance between the new point and each

training input is calculated
K – nearest neighbors Solution– KNN algorithm uses feature similarity for prediction.

• First the distance between the new point and each

training input is calculated.
• The closest k data points are selected (based on
the distance). In this example, points 1, 5, 6 will
be selected if value of k is 3.
Methods of calculating the distance

Hamming Distance:
K – nearest neighbors Continued- How to choose the k factor? (k = no. of neighbors)

The KNN Algorithm

1.Load the data
2.Initialize K to your chosen number of neighbors
3. For each example in the data
3.1 Calculate the distance between the query example and the current example from the data.
3.2 Add the distance and the index of the example to an ordered collection
4. Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the
distances
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
Naïve Bayes Classifier
Naive Bayes is a kind of classifier which uses the Bayes Theorem.

Bayes Theorem
Conditional probability - probability that something will happen, given that something else has already
occurred.
Using the conditional probability, we can calculate the probability of an event using its prior knowledge.
Below is the formula for calculating the conditional probability.

P(H) - probability of hypothesis H being true. This is known as the prior probability.
P(E) - probability of the evidence (regardless of the hypothesis).
P(E|H) - probability of the evidence given that hypothesis is true. (probability of Likelihood of evidence)
P(H|E) - probability of the hypothesis given that the evidence is there.
Naïve Bayes Classifier
Example- consider a school with a total population of 100.
what is the conditional probability that a certain member of the
school is a ‘Teacher’ given that he is a ‘Man’?

solution
Naïve Bayes Classifier
Example- Consider 1000 fruits which could be either ‘banana’, ‘orange’ or ‘other’. These are the 3 possible classes of the Y
variable. We have data for the following X variables (binary) - long, sweet and yellow. These are the features used in training
dataset and first few rows of training data set are given below

Fruit Long (x1) Sweet (x2) Yellow (x3)

Orange 0 1 0
Banana 1 0 1
Banana 1 1 1
Other 1 1 0

Aggregated dataset (counts table)

Predict if the given fruit is banana, orange, or other

Naïve Bayes Classifier
Solution

Step 1: Compute the ‘Prior’ probabilities for each of the class of fruits.
P(Y=Banana) = 500 / 1000 = 0.50
P(Y=Orange) = 300 / 1000 = 0.30
P(Y=Other) = 200 / 1000 = 0.20

Step 2: Compute the probability of evidence that goes in the denominator.

P(x1=Long) = 500 / 1000 = 0.50

P(x2=Sweet) = 650 / 1000 = 0.65
P(x3=Yellow) = 800 / 1000 = 0.80
Step 3: Compute the probability of likelihood of evidences that goes in the numerator.
Probability of Likelihood for Banana
P(x1=Long | Y=Banana) = 400 / 500 = 0.80
P(x2=Sweet | Y=Banana) = 350 / 500 = 0.70
P(x3=Yellow | Y=Banana) = 450 / 500 = 0.90

If a fruit is long, sweet and yellow, what fruit it is?

Naïve Bayes Classifier
If a fruit is long, sweet and yellow, what fruit it is?

P(long | banana) P( sweet | banana) P( yellow | banana) P(banana)

P( Banana | (long , sweet , yellow) 
P(long , sweet , yellow)

= (0.8 ) (0.7) (0.9) (0.5)/ P(Evidence)

P(long | Orange) P( sweet | Orange) P( yellow | Orange) P(Orange)

P(Orange | (long , sweet , yellow) 
P(long , sweet , yellow)

= 0 / P(Evidence)

P(long | Other ) P( sweet | Other ) P( yellow | Other ) P(Other )

P(Other | (long , sweet , yellow) 
P(long , sweet , yellow)
= 0.01875 / P(Evidence)

Answer- Banana, since it has highest probability amongst 3 classes

Naïve Bayes Classifier Bayes Theorem
Example: Patient’s probability of finding out a disease.

• D – disease with two test results “positive” & “negative”

• Lab test guarantees 99% accuracy of test result - i.e patient with disease will get test positive 99%
of the time and if the patient doesn’t have the disease, will get test negative 99% of the time.
• If 3% of the people have this disease and test gives positive result, what is the probability
that a person actually has this disease?
Solution: p(D) = 3% = 0.03 - probability of people suffering from disease
p(~D) = 97% = 0.97 - probability of people not suffering from disease
p(pos | D) = 99% = 0.99 – probability of getting positive test for a patient having the disease
p(pos | ~D) = 1% = 0.01 – probability of getting positive test for a patient not suffering from disease
probability that the patient actually has the disease
p( pos | D) p( D)
P( D | pos)  where P( pos)  P( D) P( pos | D)  P(~ D) P( pos |~ D)
p( pos)

P(D|pos) = 0.75 i.e 75% chances are there that the patient is actually suffering from the disease.
Naïve Bayes Classifier
It predicts membership probabilities for each class such as the probability that given record or data point belongs
to a particular class. The class with the highest probability is considered as the most likely class. This is also
known as Maximum A Posteriori (MAP). MAP  max(P( H | E )
max(P( E | H ) P( H ))

P( E )
 max(P( E | H ) P( H )) P(E) is used to normalize the result

Example: 3 classes associated with animal types: i) parrot ii) Dog iii) Fish.
4 features for predicting the animal type: i) Swim ii) Wings iii) Green colour iv) Dangerous teeth

Swim Wings Green colour Dangerous Animal type

teeth
50/500 500/500 400/500 0 Parrot
450/500 0 0 500/500 Dog
500/500 0 150/500 50/500 Fish
Total number of animals: 1500
Naïve Bayes Classifier
Prediction using Naïve Bayes classifier

Swim Wings Green colour Dangerous Animal type

teeth
True False True False ?
True False True True ?

Naive Bayes approach: P( E1 | H ) P( E 2 | H )...P( En | H ) P( H )

P( H | Multiple  evidences) 
P( Multiple  evidences)
i) For the first record of data the evidence is Swim and Green
Hypothesis testing for the animal to be Dog
P( Swim | Dog ) P(Green | Dog ) P( Dog )
P( Dog | ( Swim, Green) 
P( Swim, Green)

0.9  0  0.333
 0
P( Swim, Green)
Naïve Bayes Classifier
Prediction using Naïve Bayes classifier

Swim Wings Green colour Dangerous Animal type

teeth
True False True False ?
True False True True ?

ii) For the second record of data the evidence is Swim, Green and dangerous teeth
Hypothesis testing for the animal to be Dog
P( Swim | Dog ) P(Green | Dog ) P(Teeth | Dog ) P( Dog )
P( Dog | ( Swim, Green, Teeth) 
P( Swim, Green, Teeth)

0.9  0 1 0.333

 0
P( Swim, Green)
Naïve Bayes Classifier
Prediction using Naïve Bayes classifier

Hypothesis testing for animal Hypothesis testing for animal to Hypothesis testing for animal Animal Type
to be Dog be Parrot to be Fish

0.9  0  0.333 0.1 0.8  0.333 1  0.2  0.333

0  0.02664  0.0666
Fish
P( Swim, Green) P( Swim, Green) P( Swim, Green)

0.9  0 1 0.333

0 0.1  0.8  0  0.333 1  0.2  0.1  0.333
 0.00666
Fish
P( Swim, Green, Teeth) 0
P( Swim, Green, Teeth) P( Swim, Green, Teeth)

Using Naive Bayes, we can predict that the class of first record is Fish and the second record is also Fish.

Naive Bayes can learn individual features importance but can’t determine the relationship among features.

Eurofins Product Testing PDF
No ratings yet
Eurofins Product Testing PDF
1 page
SVM Using Python
No ratings yet
SVM Using Python
24 pages
Support Vector Machine
100% (2)
Support Vector Machine
11 pages
EXP-14
No ratings yet
EXP-14
27 pages
SVM notes unit 4.docx
No ratings yet
SVM notes unit 4.docx
8 pages
Unit 2
No ratings yet
Unit 2
47 pages
Chapter 07 SVM
No ratings yet
Chapter 07 SVM
20 pages
Fundamental Knowledge of Machine Learning: Abstract This Chapter Introduces The Basic Concepts and Methods of Machine
No ratings yet
Fundamental Knowledge of Machine Learning: Abstract This Chapter Introduces The Basic Concepts and Methods of Machine
14 pages
27-Module 4 - Support Vector Machine and Naïve Bayes-20-09-2024
No ratings yet
27-Module 4 - Support Vector Machine and Naïve Bayes-20-09-2024
31 pages
SVM - Feb 15
No ratings yet
SVM - Feb 15
34 pages
ML Unit 3
No ratings yet
ML Unit 3
14 pages
16 SVM
No ratings yet
16 SVM
41 pages
IVPML Unit III
No ratings yet
IVPML Unit III
139 pages
Support Vector Machine
No ratings yet
Support Vector Machine
52 pages
Support Vector Machine - Explanation
No ratings yet
Support Vector Machine - Explanation
12 pages
SVM
No ratings yet
SVM
11 pages
Support Vector Machine (SVM)
No ratings yet
Support Vector Machine (SVM)
4 pages
2024-SCU-ML-2-1-SVM
No ratings yet
2024-SCU-ML-2-1-SVM
36 pages
SVM
No ratings yet
SVM
6 pages
UNIT - 2
No ratings yet
UNIT - 2
15 pages
Support Vector Machines: (Vapnik, 1979)
No ratings yet
Support Vector Machines: (Vapnik, 1979)
34 pages
Svm
No ratings yet
Svm
29 pages
W12 SVM
No ratings yet
W12 SVM
52 pages
Support Vector Machine
No ratings yet
Support Vector Machine
45 pages
6. Support Vector Machine for Classification
No ratings yet
6. Support Vector Machine for Classification
38 pages
Unit II 2.2 ML Kernel Machines SVM
No ratings yet
Unit II 2.2 ML Kernel Machines SVM
50 pages
Lecture 18 - SVM
No ratings yet
Lecture 18 - SVM
54 pages
Support Vector Machine
No ratings yet
Support Vector Machine
21 pages
SVM&Decision Tree
No ratings yet
SVM&Decision Tree
10 pages
SVM.pptx
No ratings yet
SVM.pptx
67 pages
Ann Unit III
No ratings yet
Ann Unit III
20 pages
Unit3-SVM
No ratings yet
Unit3-SVM
20 pages
CS-13410 Introduction To Machine Learning
No ratings yet
CS-13410 Introduction To Machine Learning
33 pages
UNIT-III Support Vector Machines
No ratings yet
UNIT-III Support Vector Machines
43 pages
SVM Consolidated
No ratings yet
SVM Consolidated
34 pages
support_vector_machines
No ratings yet
support_vector_machines
12 pages
Tutorial On Support Vector Machine (SVM) : Abstract
No ratings yet
Tutorial On Support Vector Machine (SVM) : Abstract
13 pages
SVM Tutorial
No ratings yet
SVM Tutorial
31 pages
Lec06 SVM
No ratings yet
Lec06 SVM
25 pages
Support Vector Machines
No ratings yet
Support Vector Machines
16 pages
Master Thesis Support Vector Machine
100% (3)
Master Thesis Support Vector Machine
5 pages
2.1 SVM
No ratings yet
2.1 SVM
16 pages
Machine Learning - Open Elective - Part III
No ratings yet
Machine Learning - Open Elective - Part III
90 pages
SVM-1
No ratings yet
SVM-1
36 pages
Unit-1 DL
No ratings yet
Unit-1 DL
29 pages
SVM Scribe Notes
No ratings yet
SVM Scribe Notes
16 pages
Support Vector Machine
No ratings yet
Support Vector Machine
31 pages
Introduction To: Support Vector Machines
No ratings yet
Introduction To: Support Vector Machines
53 pages
A09-Support-Vector-Machines-2up (3)
No ratings yet
A09-Support-Vector-Machines-2up (3)
15 pages
UNIT - 2-1
No ratings yet
UNIT - 2-1
7 pages
Seminar
No ratings yet
Seminar
51 pages
Unit 2 - SVM - 241016 - 104220
No ratings yet
Unit 2 - SVM - 241016 - 104220
47 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
ML Chapter 5 Part 2
No ratings yet
ML Chapter 5 Part 2
52 pages
Unit-4 AI - SVM
No ratings yet
Unit-4 AI - SVM
21 pages
3.unit 3 ML Part-2 Q&A
No ratings yet
3.unit 3 ML Part-2 Q&A
23 pages
Support Vector Machine
No ratings yet
Support Vector Machine
32 pages
Presented By: M. Saqib Iqbal Gull Muhammad Presented To: Mr. Imran Ali Khan Artificial Intelligence National College of Bussiness Administration & Economics Multan
No ratings yet
Presented By: M. Saqib Iqbal Gull Muhammad Presented To: Mr. Imran Ali Khan Artificial Intelligence National College of Bussiness Administration & Economics Multan
11 pages
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
From Everand
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
Fouad Sabry
No ratings yet
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
From Everand
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
Fouad Sabry
No ratings yet
Applied Sciences: Extending Battery Lifetime by Avoiding High SOC
No ratings yet
Applied Sciences: Extending Battery Lifetime by Avoiding High SOC
16 pages
Block-3 Marketing
No ratings yet
Block-3 Marketing
60 pages
BT 01
No ratings yet
BT 01
2 pages
Tagalog Reviewers
No ratings yet
Tagalog Reviewers
132 pages
Implementation of Space Vector Pulse Width Modulation (SVPWM) For Three Phase Voltage Source Inverter Using Matlab Simulink - 24 Pages
100% (1)
Implementation of Space Vector Pulse Width Modulation (SVPWM) For Three Phase Voltage Source Inverter Using Matlab Simulink - 24 Pages
24 pages
Download Sculpture and Archaeology 1st Edition Paul Bonaventura ebook file with all chapters
100% (5)
Download Sculpture and Archaeology 1st Edition Paul Bonaventura ebook file with all chapters
78 pages
Sultan Kudarat Educational Institution Inc
No ratings yet
Sultan Kudarat Educational Institution Inc
7 pages
5 Storey Mixed Office Residential Using Midas Gen STATIC
No ratings yet
5 Storey Mixed Office Residential Using Midas Gen STATIC
132 pages
PST Interim Review and Professional Learning Plan
No ratings yet
PST Interim Review and Professional Learning Plan
2 pages
Project of Green Building
No ratings yet
Project of Green Building
53 pages
Vega R Crankshaft & Piston
No ratings yet
Vega R Crankshaft & Piston
1 page
AP Psychology 12 Memory
No ratings yet
AP Psychology 12 Memory
4 pages
Grayscale Image Coloring by Using Ycbcr and HSV Color Spaces
No ratings yet
Grayscale Image Coloring by Using Ycbcr and HSV Color Spaces
7 pages
Normal Modes More
No ratings yet
Normal Modes More
2 pages
Practice Problems
No ratings yet
Practice Problems
5 pages
Draft Resolution 1.1 SPECPOL LUMUN 2013
100% (1)
Draft Resolution 1.1 SPECPOL LUMUN 2013
8 pages
Jazz, Ragtime, & Bigband
No ratings yet
Jazz, Ragtime, & Bigband
15 pages
Janna Kasmanian Resume 2
No ratings yet
Janna Kasmanian Resume 2
3 pages
Rapid Decent - First Run Down The Shuiluo River - CHINA
100% (1)
Rapid Decent - First Run Down The Shuiluo River - CHINA
5 pages
Algorithms, Part I: Exercises: Analysis of Algorithms
No ratings yet
Algorithms, Part I: Exercises: Analysis of Algorithms
2 pages
Lastexception 63846908709
No ratings yet
Lastexception 63846908709
1 page
Final Reflective Essay Jenayamcgee
No ratings yet
Final Reflective Essay Jenayamcgee
3 pages
Term Paper: University of Lagos
No ratings yet
Term Paper: University of Lagos
8 pages
English 2023
No ratings yet
English 2023
7 pages
Pepsico
No ratings yet
Pepsico
8 pages
Cirugia
No ratings yet
Cirugia
54 pages
Financial Reporting - Module 3 Quiz
No ratings yet
Financial Reporting - Module 3 Quiz
9 pages
TEVO Black Widow - Read This First - New Bie Manual V6
0% (1)
TEVO Black Widow - Read This First - New Bie Manual V6
8 pages
Soal Mid Kelas 9
No ratings yet
Soal Mid Kelas 9
4 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Deep Learn

Uploaded by

Deep Learn

Uploaded by

Introduction to Deep Learning

Evolution of Artificial Intelligence

Each row represent one sample of the data (or

• The classification problem essentially is a problem to

In SVM we just need the projection

point lies on the left side

Equation of a hyper plane is w.x+b=0

Hence the equation which we have to maximize is:

Solved using Lagrange multipliers

Compute the gradient:

Take the average of b

Kernels allow SVMs to handle non-linearly separable data by

-Linear regression output equation

During the start of training, each theta is randomly initialized.

Logistic Regression acts somewhat very similar to linear regression. It also

D (11, 1) = sqrt((5.5-5)^2+(38-45)^2)= 7.017

Since ID11 is closer to points 5 and 1, so it

How KNN algorithm predicts this value?

First the distance between the new point and each

• First the distance between the new point and each

The KNN Algorithm

Fruit Long (x1) Sweet (x2) Yellow (x3)

Aggregated dataset (counts table)

Predict if the given fruit is banana, orange, or other

Step 2: Compute the probability of evidence that goes in the denominator.

P(x1=Long) = 500 / 1000 = 0.50

If a fruit is long, sweet and yellow, what fruit it is?

P(long | banana) P( sweet | banana) P( yellow | banana) P(banana)

= (0.8 ) (0.7) (0.9) (0.5)/ P(Evidence)

P(long | Orange) P( sweet | Orange) P( yellow | Orange) P(Orange)

P(long | Other ) P( sweet | Other ) P( yellow | Other ) P(Other )

Answer- Banana, since it has highest probability amongst 3 classes

• D – disease with two test results “positive” & “negative”

Swim Wings Green colour Dangerous Animal type

Swim Wings Green colour Dangerous Animal type

Naive Bayes approach: P( E1 | H ) P( E 2 | H )...P( En | H ) P( H )

Swim Wings Green colour Dangerous Animal type

0.9  0 1 0.333

0.9  0  0.333 0.1 0.8  0.333 1  0.2  0.333

0.9  0 1 0.333

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.