
MACHINE LEARNING

Ensemble Learning and Random Forests


BTech III Year – II Semester
Computer Science & Engineering

UNIT-III
By

Dr. Satyabrata Dash
Professor
Department of Computer Science & Engineering
Ramachandra College of Engineering, Eluru

SYLLABUS
UNIT-III
MACHINE LEARNING
Ensemble Learning and Random Forests:
• Introduction,
• Voting Classifiers,
• Bagging and Pasting,
• Random Forests,
• Boosting,
• Stacking.
Support Vector Machine:
• Linear SVM Classification,
• Nonlinear SVM Classification
• SVM Regression,
• Naïve Bayes Classifiers.
Support Vector Machines (Linear SVM Classification)

1. The Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression (a minimal classification sketch follows this list).
2. The objective of the SVM algorithm is to find a hyperplane in an N-dimensional space
(N = the number of features) that distinctly classifies the data points.
3. The dimension of the hyperplane depends upon the number of features.
4. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
5. If the number of input features is two, then the hyperplane is just a line. If the number of input
features is three, then the hyperplane becomes a two-dimensional plane. It becomes difficult to
imagine when the number of features exceeds three.
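
A minimal linear SVM classification sketch, assuming scikit-learn is available; the toy dataset and the C value are illustrative choices, not part of the original slides.

```python
# Minimal linear SVM classification sketch (assumes scikit-learn is installed).
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy two-class, two-feature dataset (illustrative only).
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Linear kernel: the learned hyperplane is a straight line in 2-D feature space.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

print("Number of support vectors:", clf.support_vectors_.shape[0])
print("Test accuracy:", clf.score(X_test, y_test))
```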
Support Vector Machines

1. Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in
n-dimensional space, but we need to find the best decision boundary that helps classify the
data points. This best boundary is known as the hyperplane of the SVM.
2. Support Vectors:
The data points or vectors that are closest to the hyperplane and affect its position are termed
support vectors. Because these vectors support the hyperplane, they are called support vectors.
Support Vector Machines: Linear Separators
• Binary classification can be viewed as the task of separating classes in feature space:

f(x) = sign(wᵀx + b)

The separating hyperplane is defined by wᵀx + b = 0; points with wᵀx + b > 0 are assigned to one
class, and points with wᵀx + b < 0 to the other.
Support Vector Machines: Linear Separators

• Which of the linear separators is optimal?

Support Vector Machines: Classification Margin

• The distance from an example xᵢ to the separator is

r = |wᵀxᵢ + b| / ‖w‖
• Examples closest to the hyperplane are support vectors.
• The margin ρ of the separator is the distance between the support vectors of the two classes.

Support Vector Machines: Maximum Margin Classification

• Maximizing the margin is good according to intuition and PAC theory.


• This implies that only the support vectors matter; the other training examples can be ignored.
Support Vector Machines: Linear SVM Mathematically

• Let the training set {(xᵢ, yᵢ)}, i = 1..n, with xᵢ ∈ ℝᵈ and yᵢ ∈ {−1, 1}, be separated by a
hyperplane with margin ρ. Then for each training example (xᵢ, yᵢ):

wᵀxᵢ + b ≤ −ρ/2  if yᵢ = −1
wᵀxᵢ + b ≥  ρ/2  if yᵢ = +1

which can be written compactly as yᵢ(wᵀxᵢ + b) ≥ ρ/2.

• For every support vector xₛ the above inequality is an equality. After rescaling w and b by ρ/2
in the equality, we obtain that the distance between each xₛ and the hyperplane is

r = yₛ(wᵀxₛ + b) / ‖w‖ = 1 / ‖w‖

• Then the margin can be expressed through the (rescaled) w and b as:

ρ = 2r = 2 / ‖w‖
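
A small sketch, assuming scikit-learn and a roughly separable toy dataset, showing how the margin ρ = 2/‖w‖ derived above can be read off a fitted linear SVM; the dataset and the large C value (used to approximate a hard margin) are illustrative choices.

```python
# Margin width of a fitted linear SVM: rho = 2 / ||w|| (illustrative sketch).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=42)
clf = SVC(kernel="linear", C=1000.0).fit(X, y)  # large C approximates a hard margin

w = clf.coef_[0]                  # weight vector of the separating hyperplane
margin = 2.0 / np.linalg.norm(w)  # rho = 2 / ||w|| from the derivation above
print("||w|| =", np.linalg.norm(w), " margin rho =", margin)
```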
Support Vector Machines: Advantages

Advantages of SVM:

1. Effective in high-dimensional cases.
2. Memory efficient, since the decision function uses only a subset of the training points,
called support vectors.
3. Different kernel functions can be specified for the decision function, and it is possible
to specify custom kernels.
Support Vector Machines (Nonlinear SVM Classification)

Non-linear SVM: A non-linear SVM is used for non-linearly separable data. If a dataset cannot be
classified using a straight line, it is termed non-linear data, and the classifier used for it is
called a non-linear SVM classifier.
Support Vector Machines (Nonlinear SVM Classification)

1. Non-linearly separable data can be handled by projecting the dataset into a higher dimension in
which it becomes linearly separable (a minimal kernel-SVM sketch follows this list).
2. In that higher-dimensional space, a linear classifier can then separate the originally
non-linearly separable dataset.
3. When the data is not linearly separable, we use the non-linear SVM classifier to separate
the data points.
4. It hypothetically takes the data points to a higher dimension, so that they become linearly
separable in that dimension, and then the algorithm classifies them.

What is a Kernel Function?
5. In machine learning, a kernel refers to a method that allows us to apply linear classifiers to
non-linear problems by mapping non-linear data into a higher-dimensional space, without the need
to explicitly visit or compute coordinates in that higher-dimensional space.
6. This function transforms the n-dimensional input space into an m-dimensional space, where m > n,
so that the required calculations for the higher dimension can still be done efficiently.
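
A rough illustration of the kernel idea, assuming scikit-learn; the make_moons dataset and the gamma/C values are illustrative choices. An RBF kernel separates data that no straight line can.

```python
# Non-linear SVM sketch: an RBF kernel handles data a straight line cannot separate
# (assumes scikit-learn; dataset and hyperparameters are illustrative).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.15, random_state=42)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)

print("Linear kernel accuracy:", linear_clf.score(X, y))  # limited by the straight-line boundary
print("RBF kernel accuracy:", rbf_clf.score(X, y))        # kernel trick captures the curvature
```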
Support Vector Machines (Nonlinear SVM Classification)
Support Vector Machines (Nonlinear SVM Classification: Types of Kernel Functions)
Support Vector Regression (SVR)
1. Support Vector Regression is a supervised learning algorithm that is used to predict continuous
values.
2. Support Vector Regression (SVR) uses the same principle as SVM, but for regression problems.
3. The basic idea behind SVR is to find the best-fit line.
4. In SVR, the best-fit line is the hyperplane that contains the maximum number of points within
the margin of tolerance.
Support Vector Regression (SVR)
1. Unlike other regression models, which try to minimize the error between the real and predicted
values, SVR tries to fit the best line within a threshold value. The threshold value is the
distance between the hyperplane and the boundary line.

Support Vector Regression (SVR)
In the case of regression, a margin of tolerance (epsilon) is set as an approximation to the SVM
margin. The main idea is always the same: minimize the error by finding the hyperplane that
maximizes the margin, keeping in mind that part of the error is tolerated.
Support Vector Regression (SVR)

Linear SVR

Non-linear SVR
The kernel functions transform the data into a higher-dimensional feature space, making linear
separation possible (a short SVR sketch follows below).
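
A minimal SVR sketch, assuming scikit-learn; the synthetic sine data and the C, epsilon, and gamma values are illustrative only.

```python
# Support Vector Regression sketch with linear and RBF kernels (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((100, 1)), axis=0)          # inputs in [0, 5)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)  # noisy sine targets

# epsilon sets the width of the tolerance tube around the fitted function.
linear_svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
rbf_svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="scale").fit(X, y)

print("Linear SVR R^2:", linear_svr.score(X, y))
print("RBF SVR R^2:", rbf_svr.score(X, y))
```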
Machine Learning Basic Methods: Naive Bayes Methods

1. The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem, used for
solving classification problems.
2. It is mainly used in text classification, which typically involves a high-dimensional training
dataset.
3. It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
4. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a
class is unrelated to the presence of any other feature.
5. Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis,
and classifying articles (a short text-classification sketch follows this list).
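
A rough illustration of the text-classification use case, assuming scikit-learn; the tiny corpus and labels are made up for the sketch.

```python
# Naive Bayes for text classification sketch (assumes scikit-learn; corpus is illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting at 10 am tomorrow",
         "free lottery ticket claim now", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

# Bag-of-words features + multinomial Naive Bayes, the usual pairing for text.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))  # likely ['spam']
```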
Machine Learning Basic Methods: Naive Bayes Methods
Example:

A fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if
these features depend on each other or on the existence of the other features, all of these
properties independently contribute to the probability that this fruit is an apple, and that is
why it is known as 'Naïve'.

The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:

1. Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple; each feature individually contributes to identifying it as an apple, without depending on
the others.
2. Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
Machine Learning Basic Methods: Naive Bayes Methods
1. The Naive Bayes model is easy to build and particularly useful for very large data sets.
2. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated
classification methods.
3. Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x)
and P(x|c):

P(c|x) = P(x|c) · P(c) / P(x)

Where
• P(c|x) is the posterior probability of class c (target) given predictor x (attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.
Machine Learning Basic Methods: Naive Bayes algorithm
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Now, use Bayes' theorem to calculate the posterior probability.
Example:

Consider a training data set of weather conditions and the corresponding target variable 'Play'
(indicating whether play is possible). We need to classify whether players will play or not
based on the weather conditions.
Machine Learning Basic Methods: Naive Bayes Example

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the method of posterior probability discussed above:

P(Yes | Sunny) = P(Sunny | Yes) · P(Yes) / P(Sunny)

Here we have:
P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64

Now, P(Yes | Sunny) = 0.33 · 0.64 / 0.36 = 0.60, which is the higher probability, so players are
likely to play on a sunny day (a small sketch verifying this calculation follows below).
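
A small plain-Python check of this calculation, assuming the standard 14-row weather/play counts used in the example above.

```python
# Plain-Python check of the worked example (counts assumed from the usual
# 14-row weather/play dataset referenced in the slides).
n_total = 14
n_yes = 9                # P(Yes) = 9/14
n_sunny = 5              # P(Sunny) = 5/14
n_sunny_given_yes = 3    # P(Sunny | Yes) = 3/9

p_yes = n_yes / n_total
p_sunny = n_sunny / n_total
p_sunny_given_yes = n_sunny_given_yes / n_yes

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6 -> players are more likely to play
```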
Machine Learning Basic Methods: Naive Bayes Method

Advantages

1. It is easy and fast to predict the class of a test data set. It also performs well in
multi-class prediction.
2. When the assumption of independence holds, a Naive Bayes classifier performs better compared to
other models like logistic regression, and it needs less training data.
3. It performs well with categorical input variables compared to numerical variables. For
numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

Dis-Advantages

1. If a categorical variable has a category in the test data set that was not observed in the
training data set, the model will assign it a zero probability and will be unable to make a
prediction. This is often known as the "zero frequency" problem. To solve it, we can use a
smoothing technique; one of the simplest is Laplace estimation (a minimal sketch follows this list).
2. On the other hand, Naive Bayes is also known to be a poor estimator, so the probability outputs
from predict_proba should not be taken too seriously.
3. Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it
is almost impossible to get a set of predictors that are completely independent.
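
A minimal Laplace-smoothing sketch, assuming scikit-learn's CategoricalNB; the integer encoding and the tiny dataset are illustrative only.

```python
# Laplace (add-one) smoothing sketch: alpha=1 keeps category probabilities away from zero
# (assumes scikit-learn; the tiny encoded dataset is illustrative).
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Feature 0: weather encoded as 0=Sunny, 1=Overcast, 2=Rainy (hypothetical encoding).
X_train = np.array([[0], [0], [1], [2], [2], [1]])
y_train = np.array([0, 0, 1, 1, 0, 1])  # 0 = No, 1 = Yes

clf = CategoricalNB(alpha=1.0)  # alpha=1.0 is Laplace estimation
clf.fit(X_train, y_train)
print(clf.predict_proba([[2]]))  # smoothed probabilities, never exactly zero
```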
Thank You

