2B Naive Bayes

Bayesian Learning

• Bayes Theorem
• Concept Learning
• Bayes Optimal Classifier
• Naïve Bayes Classifier
• Bayesian Belief Network
• EM Algorithm
Concept Learning: the task of acquiring a potential hypothesis (solution) that best fits the given training examples.

In concept learning, Bayes Theorem can be used to calculate the probability of each possible hypothesis and to output the most probable one. For a noise-free concept-learning task, the probability of the data D given a hypothesis h is 1 if D is consistent with h, and 0 otherwise.
Bayes Theorem:
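For a hypothesis h and observed data D, the theorem (stated here in its standard form, since the slide's equation image is not reproduced) is

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

where P(h) is the prior probability of h, P(D | h) is the likelihood of the data under h, and P(h | D) is the posterior probability of h given the data.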
Now the question arises: how do we find the conditional probabilities of the attributes with respect to the class labels?

The Naïve Bayes classifier answers this by assuming that the attributes are independent given the class label. (It is because of this assumption that the word "naïve" appears in the name.)
Naïve Bayes Classifier Example
Bayes Theorem:

This is the general form of Bayes Theorem. To solve real-world problems, where a dataset D has multiple attributes, we rewrite it as follows.
Naïve Bayes Classifier Algorithm

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i} P(a_i \mid v_j)$$

where v_NB stands for the Naïve Bayes classifier's target value, V is the set of possible target values, and a_1, ..., a_n are the attribute values of the new instance.
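As an illustration, here is a minimal Python sketch of this decision rule for categorical attributes. It is a sketch under our own naming (train_naive_bayes, classify), not code from the slides, and it uses Laplace smoothing to avoid zero probabilities.

from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Estimate the counts needed for P(v) and P(a_i | v) from categorical data."""
    priors = {v: c / len(labels) for v, c in Counter(labels).items()}
    label_counts = Counter(labels)
    cond = defaultdict(int)   # cond[(i, a, v)]: times attribute i had value a under label v
    values_per_attr = [set() for _ in examples[0]]
    for x, v in zip(examples, labels):
        for i, a in enumerate(x):
            cond[(i, a, v)] += 1
            values_per_attr[i].add(a)
    return priors, cond, label_counts, values_per_attr

def classify(x, priors, cond, label_counts, values_per_attr):
    """Return v_NB = argmax_v P(v) * prod_i P(a_i | v), with Laplace smoothing."""
    best_v, best_score = None, -1.0
    for v, prior in priors.items():
        score = prior
        for i, a in enumerate(x):
            # add-one (Laplace) smoothing so unseen attribute/label pairs do not zero the product
            score *= (cond[(i, a, v)] + 1) / (label_counts[v] + len(values_per_attr[i]))
        if score > best_score:
            best_v, best_score = v, score
    return best_v

For example, with examples = [('sunny', 'hot'), ('rain', 'mild')] and labels = ['no', 'yes'], classify(('rain', 'hot'), ...) returns the label with the highest smoothed score.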
Naïve Bayesian Classifier
• Advantages
• Easy to implement
• Gives good results in most cases
• Disadvantages
• Assumption: class conditional independence, therefore loss of accuracy
• Practically, dependencies exist among variables
• E.g., in a hospital, a patient record combines a profile (age, family history, etc.), symptoms (fever, cough, etc.), and a disease (lung cancer, diabetes, etc.)
• Dependencies among these attributes cannot be modeled by the Naïve Bayesian classifier
• How to deal with these dependencies?
• Bayesian Belief Networks
Bayesian Belief Networks
• A Bayesian network (or belief network) is a probabilistic graphical model that represents a set of variables and their probabilistic dependencies.
• A Bayesian belief network allows a subset of the variables to be conditionally independent.

• A Bayesian belief network is defined by two components:

a) a directed acyclic graph b) a set of conditional probability tables

• A graphical model of causal relationships


• Represents dependency among the variables
• Gives a specification of joint probability distribution

❑ Nodes: random variables


❑ Links: dependency
❑ X & Y are the parents of Z, & Y is the parent of P and Z
❑ No dependency between Z and P
❑ Has no loops or cycles
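The "specification of the joint probability distribution" above is the standard factorization over the graph (a known property, not shown on the slides): for variables X_1, ..., X_n,

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Parents}(X_i)\big)$$

so each node needs only a conditional probability table over its parents.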
Types of Probabilistic Relationships
Expectation–Maximization (EM) algorithm
The EM algorithm is an iterative method, built on a latent variable model, for finding local maximum likelihood parameters of a statistical model; it was proposed by Arthur Dempster, Nan Laird, and Donald Rubin in 1977.

The EM (Expectation-Maximization) algorithm is one of the most commonly used techniques in machine learning for obtaining maximum likelihood estimates in models with variables that are sometimes observable and sometimes not; such unobserved variables are also called latent. It has various real-world applications in statistics, including obtaining the mode of the posterior marginal distribution of parameters in machine learning and data mining applications.
Expectation–Maximization (EM) algorithm
• In statistics, an expectation–maximization (EM) algorithm is
an iterative method to find (local) maximum likelihood or maximum a
posteriori (MAP) estimates of parameters in statistical models, where
the model depends on unobserved latent variables. The EM iteration
alternates between performing an expectation (E) step, which creates
a function for the expectation of the log-likelihood evaluated using
the current estimate for the parameters, and a maximization (M)
step, which computes parameters maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
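In symbols (the standard formulation, with notation of our choosing): given observed data X, latent variables Z, and parameters θ, iteration t computes

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\big[\log L(\theta; X, Z)\big] \quad \text{(E-step)}$$

$$\theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)}) \quad \text{(M-step)}$$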
Expectation–Maximization (EM) algorithm
• The Expectation-Maximization (EM) algorithm is a building block of various unsupervised machine learning algorithms and is used to determine local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for models with unobservable variables. It is a technique for finding maximum likelihood estimates when latent variables are present, and the class of models it applies to is referred to as latent variable models.
• A latent variable model consists of both observable and unobservable variables: the observable variables can be measured directly, while the unobserved ones are inferred from the observed variables. These unobservable variables are known as latent variables.
Expectation–Maximization (EM) Algorithm
• Expectation step (E-step): estimate (guess) all the missing values in the dataset, given the observed data and the current parameter estimates, so that after completing this step there is no missing value.
• Maximization step (M-step): use the data completed in the E-step to update the parameters.
• Repeat the E-step and M-step until the values converge (a minimal worked sketch follows).
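As a concrete sketch, here is a minimal EM loop in Python for a two-component 1-D Gaussian mixture; the model choice, initialization, and all names (em_gmm_1d) are our own illustrative assumptions, not from the slides.

import numpy as np

def em_gmm_1d(x, iters=100, tol=1e-6):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    mu = np.array([x.min(), x.max()], dtype=float)   # crude initialization from the data
    var = np.array([x.var(), x.var()], dtype=float)
    pi = np.array([0.5, 0.5])                        # mixture weights
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: responsibility r[i, k] = P(component k | x_i, current parameters)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        weighted = pi * dens
        ll = np.log(weighted.sum(axis=1)).sum()      # current log-likelihood
        r = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
        # Convergence: stop when the log-likelihood change is tiny
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return mu, var, pi

Running it on data such as np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 1, 200)]) should recover means near 0 and 5.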
Convergence in the EM algorithm?
• Intuitively, convergence is reached when successive estimates stop changing: e.g., if two random variables differ only very slightly in their probability, they are said to have converged. In other words, whenever the values of the given variables match from one iteration to the next, we call it convergence.
Steps in EM Algorithm
SVM (Support Vector Machine)
• Introduction
• Types of Support Vector Kernels (Linear, Polynomial, Gaussian)
• Hyperplane (Decision Surface)
• Properties of SVM
• Issues in SVM
Support Vector Machine (SVM)
• Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.
However, primarily, it is used for Classification problems in Machine Learning.

• The goal of the SVM algorithm is to create the best line or decision boundary that
can segregate n-dimensional space into classes so that we can easily put the new
data point in the correct category in the future. This best decision boundary is
called a hyperplane.

• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. In the usual diagram (not reproduced here), two different categories are classified using a decision boundary, or hyperplane.
Hyperplane and Support Vectors in the SVM algorithm:
• Hyperplanes are decision boundaries that help classify the data
points. Data points falling on either side of the hyperplane can be
attributed to different classes. It is a subspace whose dimension is
one less than that of its ambient space. If a space is 3-dimensional
then its hyperplanes are the 2-dimensional planes, while if the
space is 2-dimensional, its hyperplanes are the 1-dimensional lines.
There can be multiple lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to find out the best
decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.
Hyperplane and Support Vectors in the SVM algorithm:
The dimension of the hyperplane depends on the number of features in the dataset: with 2 features (as in the usual illustration), the hyperplane is a straight line; with 3 features, it is a 2-dimensional plane, and so on.
We always create the hyperplane that has maximum margin, i.e., the maximum distance to the nearest data points. So the key idea behind SVM is to maximize the margin.

Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors "support" the hyperplane, they are called support vectors (this is formalized below).
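In symbols (the standard textbook formulation, not reproduced from the slides): a linear SVM's hyperplane is the set of points x with

$$w \cdot x + b = 0,$$

and for linearly separable data the maximum-margin (hard-margin) problem is

$$\min_{w, b}\ \tfrac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i.$$

The margin width is 2 / ||w||, so minimizing ||w|| maximizes the margin, and the support vectors are exactly the points for which y_i (w · x_i + b) = 1.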
Issues in SVM
The SVM algorithm is not suitable for very large data sets.
SVM does not perform well when the data set is noisy, i.e., when the target classes overlap.
In cases where the number of features for each data point exceeds the number of training samples, SVM will underperform.

Support Vector Machine for Multi-Class Problems


To apply SVM to multi-class problems, we can create a binary classifier for each class of the data (as sketched below). Each classifier gives one of two results: the data point belongs to that class, OR the data point does not belong to that class.
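A minimal sketch of this one-vs-rest scheme, assuming scikit-learn is available (the dataset and kernel choice are ours):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)           # 3 classes, 4 features
# One binary SVM per class: "this class" vs. "all other classes"
clf = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print(clf.predict(X[:5]))                   # predicted class labels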
Advantages of SVM
Disadvantages of SVM
Areas where SVM can be applied:
Types of SVM
• Linear SVM: used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
SVM for Complex (Non-Linearly Separable) Data
• SVM works very well without any modifications for linearly separable
data. Linearly Separable Data is any data that can be plotted in a
graph and can be separated into classes using a straight line
• We use kernelized SVM for non-linearly separable data. Say we have some non-linearly separable data in one dimension. We can transform this data into two dimensions, where it becomes linearly separable, by mapping each 1-D data point to a corresponding 2-D ordered pair (see the sketch below). So for non-linearly separable data in any dimension, we can map the data to a higher dimension and then make it linearly separable. This is a very powerful and general transformation.
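For instance, here is a minimal Python sketch of the 1-D to 2-D idea; the mapping x → (x, x²) and the threshold are our illustrative choices, not from the slides.

import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])   # class 1 at the extremes: not separable on a line

# Map each 1-D point to the 2-D ordered pair (x, x^2)
X2 = np.column_stack([x, x ** 2])
# In 2-D the classes are separated by the horizontal line x2 = 2.5,
# i.e., points with x^2 > 2.5 belong to class 1
print((X2[:, 1] > 2.5).astype(int))   # matches y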
Kernel Functions in Support Vector Machine (SVM)
• A kernel function is a method that takes data as input and transforms it into the required form for processing.
• The term "kernel" refers to a set of mathematical functions used in the Support Vector Machine that provide a window through which to manipulate the data.
• So a kernel function generally transforms the training data so that a non-linear decision surface corresponds to a linear equation in a higher-dimensional space.
Types of Support Vector Kernels (standard formulas are given after the list)
• Linear Kernel: used when the data is linearly separable.
• Gaussian Kernel: used to perform the transformation when there is no prior knowledge about the data.
• Gaussian Radial Basis Function (RBF) Kernel: the same as the Gaussian kernel, using the radial basis method to improve the transformation.
• Sigmoid Kernel: equivalent to a two-layer perceptron model of a neural network, which is used as an activation function for artificial neurons.
• Polynomial Kernel: represents the similarity of vectors in the training set in a feature space over polynomials of the original variables used in the kernel.
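For reference, the standard forms of these kernels (not given on the slides) are:

$$\text{Linear: } K(x, y) = x^{\top} y$$
$$\text{Polynomial: } K(x, y) = (x^{\top} y + c)^{d}$$
$$\text{Gaussian / RBF: } K(x, y) = \exp\!\big(-\gamma \lVert x - y \rVert^{2}\big)$$
$$\text{Sigmoid: } K(x, y) = \tanh\!\big(\kappa\, x^{\top} y + c\big)$$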
Differentiate between Support Vector Machine
and Logistic Regression
• SVM tries to maximize the margin between the closest support vectors, whereas logistic regression maximizes the posterior class probability.

• SVM is deterministic (though we can use Platt scaling for a probability score; see the sketch after this list), while logistic regression is probabilistic.

• In kernel space, SVM is faster.

• Problems that can be solved using SVM include image classification, handwriting recognition, and cancer detection.

• Problems where the logistic regression algorithm applies include:


1. Cancer detection: predict whether a patient has cancer (1) or not (0).
2. Test score: predict whether a student passed (1) or not (0).
3. Marketing: predict whether a customer will purchase a product (1) or not (0).
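As a minimal sketch of the deterministic-vs-probabilistic contrast above, assuming scikit-learn (the dataset and parameters are our choices):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# probability=True enables Platt scaling on top of the SVM's deterministic scores
svm = SVC(kernel="linear", probability=True).fit(X, y)
lr = LogisticRegression(max_iter=5000).fit(X, y)

print(svm.decision_function(X[:3]))  # signed distances to the hyperplane
print(svm.predict_proba(X[:3]))      # probabilities via Platt scaling
print(lr.predict_proba(X[:3]))       # probabilities are native to logistic regression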
