Bayesian Classification: Cse 634 Data Mining - Prof. Anita Wasilewska
● It uses the given training values to build a model and then uses this model to classify new data
Source : https://www.tutorialspoint.com/data_mining/dm_bayesian_classification.htm
Where is it used?
Source : https://mlcorner.wordpress.com/2013/04/28/bayesian-classifier/
Trying to find the answer
There are only two possible events for the given question:
A: It is going to rain tomorrow
B: It will not rain tomorrow.
Source : https://mlcorner.wordpress.com/2013/04/28/bayesian-classifier/
That's too Naive even for Bayes!
● This means that if you recall the previous weather conditions for the last week, and you
remember that it has actually rained every single day, your answer will no longer be 50%
● The Bayesian approach provides a way of explaining how you should change your
existing beliefs in the light of new evidence.
● Bayes' rule's emphasis on prior probability makes it well suited to a wide range of scenarios
Source : https://mlcorner.wordpress.com/2013/04/28/bayesian-classifier/
What is Bayes Theorem?
● Bayes' theorem, named after 18th-century British mathematician
Thomas Bayes, is a mathematical formula for determining conditional
probability
Source : https://www.investopedia.com/terms/b/bayes-theorem.asp
Bayes Formula
● The formula for Bayes' theorem is: P(A|B) = P(B|A) * P(A) / P(B)
Source : https://www.investopedia.com/terms/b/bayes-theorem.asp
Small Example
Bayes theorem to the rescue!
P(H|X) = P(X|H) * P(H) / P(X)
H : Hypothesis that Bill will buy the computer
X : Bill is 35 years old, has a fair credit rating, and earns $40,000/year
P(H|X) : The probability that Bill will buy the computer GIVEN that we know his age, income, and credit rating [Posterior]
P(H) : Probability that Bill will buy the computer (REGARDLESS of knowing his age, income, and credit rating) [Prior]
P(X|H) : Probability that someone is 35 years old, has a fair credit rating, and earns $40,000/yr GIVEN that they bought the computer [Likelihood]
P(X) : Probability that Bill is 35 years old, has a fair credit rating, and earns $40,000/yr [Evidence]
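A minimal Python sketch of this calculation; the probability values are made-up placeholders (not numbers from the slides), chosen only to show how the three terms combine:

# Bayes' rule for the "Bill buys a computer" example.
# All probability values are illustrative placeholders.
prior = 0.40        # P(H): fraction of all customers who buy the computer
likelihood = 0.20   # P(X|H): fraction of buyers who match Bill's profile
evidence = 0.15     # P(X): fraction of all customers who match Bill's profile

posterior = likelihood * prior / evidence   # P(H|X) = P(X|H) * P(H) / P(X)
print(f"P(H|X) = {posterior:.3f}")          # ~0.533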
Big Example
Bill now wants to play football!
(Because he is tired of using his computer)
Source : http://qr.ae/TUTR3L
The Naive Bayes nerd is here!
Source : http://qr.ae/TUTR3L
Let's identify all the factors!
Source : http://qr.ae/TUTR3L
Draw frequency tables for each factor
Source : http://qr.ae/TUTR3L
Find the probability
Source : http://qr.ae/TUTR3L
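Since the frequency tables themselves are in figures that are not reproduced here, the Python sketch below builds a frequency table from a hypothetical outlook/play dataset (classic weather-style toy data, not the slides' actual numbers) and applies Bayes' rule to one query:

from collections import Counter

# Hypothetical (outlook, play) records; the slides' real frequency tables are in figures.
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

freq = Counter(data)                              # frequency table: (outlook, play) -> count
class_total = Counter(play for _, play in data)   # class totals
n = len(data)

# P(Play=Yes | Outlook=Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes = class_total["Yes"] / n
p_sunny_given_yes = freq[("Sunny", "Yes")] / class_total["Yes"]
p_sunny = (freq[("Sunny", "Yes")] + freq[("Sunny", "No")]) / n
print(p_sunny_given_yes * p_yes / p_sunny)        # 0.4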
How to know if results are correct?
The Accuracy of Classification can be found out using a Confusion Matrix
Confusion Matrix
● True Positives (TP): positive examples correctly labeled as positive.
● True Negatives (TN): negative examples correctly labeled as negative.
● False Positives (FP): negative examples incorrectly labeled as positive.
● False Negatives (FN): positive examples incorrectly labeled as negative.
Source :https://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/
Finding accuracy of classification
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Accuracy gives the fraction of correct predictions out of all predictions made
Source :https://tryolabs.com/blog/2013/03/25/why-accuracy-alone-bad-measure-classification-tasks-and-what-we-can-do-about-it/
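A small Python sketch of these counts and the accuracy formula, using made-up prediction lists purely for illustration:

# Confusion-matrix counts and accuracy for a binary classifier (illustrative data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} Accuracy={accuracy:.2f}")  # Accuracy=0.80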
● The Homo apriorius establishes the probability of a hypothesis, no matter what the data tell.
● The Homo pragmaticus establishes that it is interested only in the data.
● The Homo frequentistus measures the probability of the data given the hypothesis.
● The Homo sapiens measures the probability of the data and of the hypothesis.
● The Homo bayesianis measures the probability of the hypothesis, given the data.
Source : http://www.brera.mi.astro.it/~andreon/inference/Inference.html
Just because...
http://www3.cs.stonybrook.edu/~cse634/L3ch6classification.pdf
What are Bayesian Classifiers?
● Statistical classifiers: they predict class membership probabilities, i.e. the probability that a given sample belongs to a particular class.
Reference : https://medium.com/@gp_pulipaka/applying-gaussian-naïve-bayes-classifier-in-python-part-one-9f82aa8d9ec4
Naive Bayes Classification
A Naive Bayes Classifier is a program which predicts a class value given a
set of attribute values.
For each known class value,
● Calculate probabilities for each attribute, conditional on the class value.
● Use the product rule to obtain a joint conditional probability for the
attributes.
● Use Bayes rule to derive conditional probabilities for the class variable.
Once this has been done for all class values, output the class with the
highest probability.
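A minimal Python sketch of these steps for categorical attributes (a simplified illustration with no smoothing, which is discussed under the "Zero" Problem below):

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(class) and per-attribute value counts, conditional on the class."""
    priors = {c: cnt / len(labels) for c, cnt in Counter(labels).items()}
    cond = defaultdict(lambda: defaultdict(Counter))   # cond[class][attr][value] = count
    for row, c in zip(rows, labels):
        for attr, value in enumerate(row):
            cond[c][attr][value] += 1
    return priors, cond, Counter(labels)

def predict(x, priors, cond, class_counts):
    """Return the class maximising P(class) * product of P(x_i | class)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for attr, value in enumerate(x):
            score *= cond[c][attr][value] / class_counts[c]   # product rule
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny made-up example: (outlook, temperature) -> play
rows = [("Sunny", "Hot"), ("Rainy", "Mild"), ("Overcast", "Hot"), ("Rainy", "Cool")]
labels = ["No", "Yes", "Yes", "Yes"]
model = train_naive_bayes(rows, labels)
print(predict(("Rainy", "Hot"), *model))   # "Yes"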
Model Parameters
For the Bayes classifier, we need to “learn” two functions, the
likelihood and the prior.
Reference :http://blog.datumbox.com/machine-learning-tutorial-the-naive-bayes-text-classifier/
Model Parameters
● Instance Attributes :
Instances are represented as a vector of attribute values.
Naive Bayes assumes that each feature is conditionally independent of every other feature, given the class label.
This assumption greatly reduces the number of model parameters, as illustrated below.
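As a rough illustration of the saving (the numbers are assumed, not from the slides): with d binary attributes and k classes, modelling the full joint class-conditional distribution needs on the order of k*(2^d - 1) parameters, while the naive assumption needs only about k*d.

# Illustrative parameter counts for d binary attributes and k classes.
d, k = 20, 2
full_joint = k * (2 ** d - 1)   # P(x1,...,xd | class) modelled directly
naive = k * d                   # one P(xi | class) per attribute per class
print(full_joint, naive)        # 2097150 vs 40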
Bayes Classification
Naive Bayes also works well for multi-valued attributes.
“Zero” Problem
What if there is a class Ci, and X has an attribute Xk whose value never appears among the
training samples of Ci? The estimated P(Xk | Ci) is then zero, which wipes out the entire product.
Reference: Data Mining Concepts and Techniques 2nd Edition by Jiawei Han and Micheline Kamber, Page 316
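The reference cited above addresses this with the Laplacian (add-one) correction: add one to every count so that no estimated conditional probability is exactly zero. A small Python sketch (the counts below are made-up):

def conditional_prob(value_count_in_class, class_count, n_values, alpha=1):
    """P(attribute = value | class) with Laplacian (add-alpha) correction."""
    return (value_count_in_class + alpha) / (class_count + alpha * n_values)

# An attribute value never seen in class Ci (0 of 1000 samples, 3 possible values):
print(conditional_prob(0, 1000, 3))   # ~0.000997 instead of 0, so the product is not wiped out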
Bayesian Belief Networks
● Has Conditional Probability Table (CPT) for each variable
● CPT for a variable Y specifies the conditional distribution
P(Y | Parents(Y)), where Parents(Y) are the parents of Y
● From previous example:
https://www.slideshare.net/ashrafmath/naive-bayes-15644818
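Since the slide's CPT figure is not reproduced here, the Python sketch below uses a hypothetical two-variable network (Rain -> WetGrass, not the slides' example) to show how a CPT is looked up and how the joint probability factorises over P(Y | Parents(Y)):

# Hypothetical belief network: Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}                 # P(Rain)
cpt_wet = {True:  {True: 0.9,  False: 0.1},      # P(WetGrass | Rain=True)
           False: {True: 0.05, False: 0.95}}     # P(WetGrass | Rain=False)

def joint(rain, wet):
    """P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)."""
    return p_rain[rain] * cpt_wet[rain][wet]

print(joint(True, True))   # 0.18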
Examples of Text classification:
https://www.slideshare.net/ashrafmath/naive-bayes-15644818
Naive Bayes Approach
● Build the vocabulary as the list of all distinct words that appear in all the
documents in the training set.
● Remove stop words and markings
● The words in the vocabulary become the attributes; classification is
independent of the position of the words
● Train the classifier based on the training data set
● Evaluate the results on Test data.
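A minimal Python sketch of this pipeline with a tiny made-up training set (the stop-word list, documents, and class names are all illustrative assumptions):

import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "is", "and", "in", "was"}   # tiny illustrative stop-word list

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]

# Made-up training documents.
train = [("the game was a great win", "sports"),
         ("the election and the vote", "politics"),
         ("a great goal in the game", "sports"),
         ("the vote is close", "politics")]

vocab = {w for text, _ in train for w in tokenize(text)}      # words become the attributes
doc_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)                            # word_counts[class][word]
for text, label in train:
    word_counts[label].update(tokenize(text))

def classify(text):
    scores = {}
    for c in doc_counts:
        log_prob = math.log(doc_counts[c] / len(train))       # log prior
        total = sum(word_counts[c].values())
        for w in tokenize(text):
            if w in vocab:                                     # position-independent bag of words
                log_prob += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = log_prob
    return max(scores, key=scores.get)

print(classify("a great game"))   # expected: "sports"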
Simple text classifier.
ssh root@dhcp230.fsl.cs.sunysb.edu
Advantages:
● Requires a small amount of training data to estimate the parameters.
● Good results obtained in most cases
● Easy to implement
Disadvantages:
● In practice, dependencies exist among variables.
For example, hospital patient data: profile (age, family history, etc.), symptoms (fever, cough,
etc.), and disease (lung cancer, diabetes, etc.).
● Dependencies among these cannot be modelled by a Naive Bayesian classifier.
Weather prediction:
Recommendation System:
https://www.quora.com/In-what-real-world-applications-is-Naive-Bayes-classifier-used
http://suruchifialoke.com/2017-06-10-sentiment-analysis-movie
https://www.quora.com/In-what-real-world-applications-is-Naive-Bayes-classifier-used
PART 2 Research Paper
Research Paper
Title : Improved Study of Heart Disease Prediction
System using Data Mining Classification Techniques
Authors : Chaitrali S. Dangare & Sulabha S. Apte, PhD
Journal : International Journal of Computer Applications
(0975 – 888), Volume 47, No. 10
Publishing Period : June 2012
Abstract
● The Bayes network has been around for a very long time. For
years, it has proved to be a simple yet powerful classifier that can
be used for prediction.
● The computational power required to run a Bayesian classifier is
considerably lower than that of most modern-day classification
algorithms.
● This paper discusses the use of the Bayesian classifier along with
decision trees (IDT) and neural networks (NN) in a Heart Disease
Prediction System in the medical field.
So, What Does it Say?
"We have used a cutting-edge classifier to look into your medical report and analyse you. It is our new innovation."
Source: bigstock-healthcare-technology-and-med-83203256.jpg
Are You Serious! That You are Pregnant!
Confusion Matrix
Figure: confusion matrices for the classifiers built with 15 attributes (class a = heart disease, class b = no heart disease); the off-diagonal cells give the FP and FN counts.
Source : https://i.ytimg.com/vi/E7myDAKBgRs/maxresdefault.jpg
Want Medical Records Accessed? YES / NO
Source: bigstock-healthcare-technology-and-med-83203256.jpg