Module 3: Bayesian Classifier

Bayesian classification is a statistical method that uses Bayes' Theorem to predict class membership probabilities. The naïve Bayesian classifier simplifies the process by assuming conditional independence among attributes, allowing for efficient computation and often yielding comparable performance to other classifiers. However, it may suffer from accuracy loss due to its independence assumption and requires careful handling of zero-probability issues.


Bayesian Classification

Bayes' Theorem: Thomas Bayes
Bayesian Classification
 A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
 Foundation: based on Bayes' Theorem
 Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers
 Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
 Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayes’ Theorem: Basics
 Total Probability Theorem: P(B) = Σ_{i=1..M} P(B|Ai) P(Ai)
 Bayes' Theorem: P(H|X) = P(X|H) P(H) / P(X)
 Let X be a data sample (“evidence”): the class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X) (i.e., the posterior probability): the probability that the hypothesis holds given the observed data sample X
 P(H) (prior probability): the initial probability
 E.g., X will buy a computer, regardless of age, income, …
 P(X) (evidence): the probability that the sample data is observed
 P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
 E.g., given that X will buy a computer, the probability that X is 31…40 with medium income
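To make the theorem concrete, here is a minimal Python sketch; the prior, likelihood, and evidence values below are made-up numbers for illustration, not taken from the slides:

```python
# Hypothetical example: H = "customer buys a computer", X = observed attributes.
p_h = 0.6          # P(H): prior probability of buying a computer
p_x_given_h = 0.3  # P(X|H): likelihood of observing X among buyers
p_x = 0.25         # P(X): evidence, probability of observing X at all

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(H|X) = {p_h_given_x:.2f}")  # -> P(H|X) = 0.72
```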


Prediction Based on Bayes’ Theorem

 Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X)
 Informally, this can be viewed as
posterior = likelihood × prior / evidence
 Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
 Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost
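This decision rule is easy to state in code. A minimal Python sketch, using the class priors and the P(X|Ci) values from the AllElectronics example later in the deck; since P(X) is the same for every class, it can be dropped from the comparison:

```python
# P(Ci): class priors, and P(X|Ci): likelihoods (assumed already estimated).
priors = {"yes": 0.643, "no": 0.357}
likelihoods = {"yes": 0.044, "no": 0.019}

# Score each class by P(X|Ci) * P(Ci) and predict the argmax.
scores = {c: likelihoods[c] * priors[c] for c in priors}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)  # -> predicts "yes"
```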
Naïve Bayes Classification
 Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-D attribute vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm
 Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
Classification Is to Derive the Maximum Posteriori
 Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
 This can be derived from Bayes' theorem:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
 Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized:
P(Ci|X) ∝ P(X|Ci) P(Ci)
Naïve Bayes Classifier
 A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):
P(X|Ci) = Π_{k=1..n} P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
 This greatly reduces the computation cost: only the class distribution needs to be counted
 If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D)
 If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x−μ)² / (2σ²))
and P(xk|Ci) = g(xk, μ_Ci, σ_Ci)
where μ_Ci and σ_Ci are the mean and standard deviation of attribute Ak over the tuples of class Ci
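A minimal Python sketch of these two estimates, assuming a hypothetical class_stats layout that stores per-class value counts for categorical attributes and a class-conditional mean and standard deviation for continuous ones:

```python
import math

def gaussian(x: float, mu: float, sigma: float) -> float:
    """g(x, mu, sigma): Gaussian density, used for continuous attributes."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def naive_likelihood(x: dict, class_stats: dict) -> float:
    """P(X|Ci) as a product of per-attribute estimates (independence assumed)."""
    p = 1.0
    for attr, value in x.items():
        stat = class_stats[attr]  # hypothetical per-class statistics for this attribute
        if stat["type"] == "categorical":
            # count of tuples in Ci with this value, divided by |Ci,D|
            p *= stat["counts"].get(value, 0) / stat["total"]
        else:
            # continuous: Gaussian with the class-conditional mean/std
            p *= gaussian(value, stat["mu"], stat["sigma"])
    return p
```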
How to Predict a Class Label Using Naïve Bayesian Classification?
 Given class-labeled training tuples from the AllElectronics customer database.
 The data tuples are described by the attributes age, income, student, and credit_rating.
Naïve Bayes Classifier: Training Dataset

Class: C1: buys_computer = ‘yes’ and C2: buys_computer = ‘no’

 P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no

 The tuple we wish to classify is
X = (age = youth, income = medium, student = yes, credit_rating = fair)

Naïve Bayes Classifier: An Example

 Compute P(X|Ci) for each class

P(age = “youth” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “youth” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

 X = (age = youth, income = medium, student = yes, credit_rating = fair)

P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X | buys_computer = “yes”) P(buys_computer = “yes”) = 0.044 × 0.643 = 0.028
P(X | buys_computer = “no”) P(buys_computer = “no”) = 0.019 × 0.357 = 0.007

 Since 0.028 > 0.007, the condition P(C1|X) > P(C2|X) is satisfied; therefore the naïve Bayesian classifier predicts buys_computer = yes for tuple X
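The whole computation can be reproduced with a short Python sketch over the 14 training tuples above, using only counting-based estimates (no smoothing, no libraries):

```python
# The AllElectronics training tuples: (age, income, student, credit_rating, class)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
# The tuple to classify: age = youth (i.e., <=30), medium income, student, fair credit.
x = ("<=30", "medium", "yes", "fair")

for label in ("yes", "no"):
    rows = [r for r in data if r[-1] == label]
    prior = len(rows) / len(data)  # P(Ci)
    likelihood = 1.0
    for i, value in enumerate(x):  # P(X|Ci) = product of the P(xk|Ci)
        likelihood *= sum(r[i] == value for r in rows) / len(rows)
    print(label, round(likelihood, 3), round(likelihood * prior, 3))
# yes 0.044 0.028 / no 0.019 0.007 -> predict buys_computer = "yes"
```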
Avoiding the Zero-Probability Problem
 Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero:
P(X|Ci) = Π_{k=1..n} P(xk|Ci)
 Ex. Suppose a dataset with 1000 tuples: income = low (0), income = medium (990), and income = high (10)
 Use the Laplacian correction (or Laplacian estimator)
 Adding 1 to each case:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
 The “corrected” probability estimates are close to their “uncorrected” counterparts
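A minimal sketch of this correction for the income example above; adding 1 to each count also adds the number of distinct values (here 3) to the denominator, which is where 1003 comes from:

```python
# Raw counts of income values within some class Ci (1000 tuples total).
counts = {"low": 0, "medium": 990, "high": 10}

k = len(counts)               # number of distinct attribute values (3)
total = sum(counts.values())  # 1000
smoothed = {v: (c + 1) / (total + k) for v, c in counts.items()}
print(smoothed)  # low: 1/1003, medium: 991/1003, high: 11/1003
```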
Naïve Bayes Classifier: Comments
 Advantages
 Easy to implement
 Good results obtained in most of the cases
 Disadvantages
 Assumption of class-conditional independence, hence a loss of accuracy
 Practically, dependencies exist among variables
 E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
 Dependencies among these cannot be modeled by a naïve Bayes classifier
Thank you….
