
Multimedia Application
By Minhaz Uddin Ahmed, PhD
Department of Computer Engineering
Inha University Tashkent
Email: minhaz.ahmed@gmail.com
Content
 Naive Bayes Classifiers
 Training the Naive Bayes Classifier
 Worked example
 Optimizing for Sentiment Analysis
 Naive Bayes for other text classification tasks
 Naive Bayes as a Language Model
Bayes theorem

 Bayes' theorem (also known as Bayes' Rule or Bayes' law) is used to determine the probability of a hypothesis given prior knowledge. It is based on conditional probability:

P(A|B) = P(B|A) * P(A) / P(B)

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
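As a small, hedged sketch (not from the slides), Bayes' rule can be written as a helper function in Python; the numbers are invented purely for illustration.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical numbers for illustration only:
# P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5  ->  P(A|B) = 0.48
print(bayes_posterior(0.8, 0.3, 0.5))
```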
Types of Naive Bayes:

 There are three types of Naive Bayes model in the scikit-learn library:
Gaussian: used for classification when the features are continuous and assumed to follow a normal distribution.
Multinomial: used for discrete counts. For example, in a text classification problem we can go one step further than Bernoulli trials: instead of "word occurs in the document" we use "how often the word occurs in the document", i.e. "the number of times outcome x_i is observed over the n trials".
Bernoulli: the binomial model is useful if your feature vectors are binary. One application is text classification with a 'bag of words' model, where the 1s and 0s mean "word occurs in the document" and "word does not occur in the document" respectively.
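The three variants correspond to GaussianNB, MultinomialNB, and BernoulliNB in scikit-learn. A minimal sketch, with toy data invented for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Continuous features -> GaussianNB (toy data, two classes)
X_cont = np.array([[1.2, 3.4], [0.9, 2.8], [5.1, 7.0], [4.8, 6.5]])
y = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X_cont, y).predict([[1.0, 3.0]]))

# Word-count features -> MultinomialNB
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [0, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))

# Binary (word present / absent) features -> BernoulliNB
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 0]]))
```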
Example

(The original slide shows a table of fruit counts. From the solution below one can infer: 1200 fruits in total, of which 650 are mangoes and 800 are yellow, and 350 mangoes are yellow.)

Example solution

 Solution:
 P(A|B) = (P(B|A) * P(A)) / P(B)
1. Mango:
 P(X | Mango) = P(Yellow | Mango) * P(Sweet | Mango) * P(Long | Mango)
a) P(Yellow | Mango) = (P(Mango | Yellow) * P(Yellow)) / P(Mango)
 = ((350/800) * (800/1200)) / (650/1200)
 P(Yellow | Mango) = 0.53 → (1)
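The arithmetic in step (a) can be checked directly; this is just the slide's numbers plugged into Bayes' rule:

```python
# P(Yellow | Mango) = P(Mango | Yellow) * P(Yellow) / P(Mango)
p_mango_given_yellow = 350 / 800
p_yellow = 800 / 1200
p_mango = 650 / 1200

p_yellow_given_mango = (p_mango_given_yellow * p_yellow) / p_mango
print(round(p_yellow_given_mango, 2))  # 0.54 (the slide truncates this to 0.53)
```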
Text Classification

 Assigning subject categories, topics, or genres


 Spam detection
 Authorship identification
 Age/gender identification
 Language Identification
 Sentiment analysis
Who wrote which Federalist papers?

 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton
 Authorship of 12 of the letters in dispute
 1963: solved by Mosteller and Wallace using Bayesian methods

[Portraits: James Madison, Alexander Hamilton]
Male or female author from a given text

"By 1925 present-day Vietnam was divided into three parts under French colonial rule. The southern region embracing Saigon and the Mekong delta was the colony of Cochin-China; the central area with its imperial capital at Hue was the protectorate of Annam …"

"Clara never failed to be astonished by the extraordinary felicity of her own name. She found it hard to trust herself to the mercy of fate, which had managed over the years to convert her greatest shame into one of her greatest assets…"
Text Classification: definition

 Input:
 a document d
 a fixed set of classes C = {c1, c2,…, cJ}

 Output: a predicted class c ∈ C

Classification Methods: Hand-coded rules
 Rules based on combinations of words or other features
 spam: black-list-address OR ("dollars" AND "have been selected")
 Accuracy can be high
 If rules are carefully refined by an expert
 But building and maintaining these rules is expensive
Classification Methods: Supervised Machine Learning
 Input:
 a document d
 a fixed set of classes C = {c1, c2,…, cJ}
 a training set of m hand-labeled documents (d1,c1), ..., (dm,cm)

 Output:
 a learned classifier γ: d → c
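As a hedged end-to-end sketch of this setup in scikit-learn (the tiny training set and labels below are invented for illustration, not from the slides):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled training documents (d_i, c_i)
docs = ["win money now", "meeting at noon", "cheap pills win", "lunch with the team"]
labels = ["spam", "ham", "spam", "ham"]

# gamma: d -> c, learned from the training set
gamma = make_pipeline(CountVectorizer(), MultinomialNB())
gamma.fit(docs, labels)

print(gamma.predict(["win cheap money"]))     # expected: ['spam']
print(gamma.predict(["team meeting today"]))  # expected: ['ham']
```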
Classification Methods:
Supervised Machine Learning
 Any kind of classifier

Naïve Bayes
Logistic regression
Support-vector machines
k-Nearest Neighbors
Naive Bayes Intuition

 Simple ("naive") classification method based on Bayes rule


 Relies on very simple representation of document
 Bag of words
The Bag of Words Representation

 We preprocess the dataset by converting each email into a bag-of-words representation, where each word is a feature and its frequency in the email is its value. We also assign a label (spam or not spam) to each email.
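A bag of words is simply a word-to-count mapping; a minimal sketch with Python's Counter (the example email text is made up):

```python
from collections import Counter

email = "limited offer click now click here now"
bag_of_words = Counter(email.lower().split())
print(bag_of_words)
# Counter({'click': 2, 'now': 2, 'limited': 1, 'offer': 1, 'here': 1})

label = "spam"  # hand-assigned label for this training example
```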
The Bag of Words Representation

[Figure: a document is reduced to a bag-of-words vector of word counts (e.g. seen: 2, sweet: 1, whimsical: 1, recommend: 1, happy: 1, ...), and the classifier γ maps that vector to a class c.]
Training

 We train the Naïve Bayes classifier on the labeled dataset. During training, the classifier calculates the probabilities of each word occurring in spam and not-spam emails, as well as the prior probabilities of spam and not-spam emails in the dataset.
Prediction

Step 1: Given a new email, we convert it into a bag-of-words representation.
Step 2: For each word in the email, we calculate its conditional probability of occurring in spam and not-spam emails, based on the probabilities learned during training.
Step 3: We multiply the conditional probabilities of all words in the email, and multiply the result by the prior probabilities of spam and not spam.
Step 4: We compare the calculated scores for spam and not spam, and classify the email according to the higher one.
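The four steps can be written out directly. The sketch below assumes the priors and word likelihoods have already been estimated (the numbers are invented placeholders) and works in log space to avoid underflow:

```python
import math

# Hypothetical learned parameters (from training)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"win": 0.05, "money": 0.04, "meeting": 0.001},
    "ham":  {"win": 0.002, "money": 0.005, "meeting": 0.03},
}

def predict(email, priors, likelihoods):
    words = email.lower().split()          # Step 1: bag of words
    scores = {}
    for c in priors:
        log_score = math.log(priors[c])    # Step 3: start from the prior
        for w in words:
            if w in likelihoods[c]:        # Step 2: per-word likelihood (unknown words skipped)
                log_score += math.log(likelihoods[c][w])
        scores[c] = log_score
    return max(scores, key=scores.get)     # Step 4: pick the higher score

print(predict("win money", priors, likelihoods))  # -> spam
```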
Bayes' Rule Applied to Documents and Classes

 For a document d and a class c:

P(c|d) = P(d|c) P(c) / P(d)

Naive Bayes Classifier (I)

MAP is "maximum a posteriori" = most likely class:

c_MAP = argmax_c P(c|d)

Bayes Rule:

c_MAP = argmax_c P(d|c) P(c) / P(d)

Dropping the denominator (P(d) is the same for every class):

c_MAP = argmax_c P(d|c) P(c)
Learning the Multinomial Naive Bayes Model

 First attempt: maximum likelihood estimates
 simply use the frequencies in the data

P̂(cj) = N_cj / N_total
Parameter estimation

P̂(wi | cj) = count(wi, cj) / Σ_w∈V count(w, cj)
  = fraction of times word wi appears among all words in documents of topic cj

 Create a mega-document for topic j by concatenating all docs in this topic
 Use the frequency of wi in the mega-document
Problem with Maximum Likelihood

 What if we have seen no training documents with the word fantastic classified in the topic positive (thumbs-up)? Then P̂(fantastic | positive) = 0.

 Zero probabilities cannot be conditioned away, no matter the other evidence!
Laplace (add-1) smoothing for Naïve Bayes

P̂(wi | c) = (count(wi, c) + 1) / (Σ_w∈V count(w, c) + |V|)
Multinomial Naïve Bayes: Learning

 From training corpus, extract Vocabulary
 Calculate P(cj) terms:
   For each cj in C do
     docsj ← all docs with class = cj
 Calculate P(wk | cj) terms:
   Textj ← single doc containing all docsj
   For each word wk in Vocabulary:
     nk ← # of occurrences of wk in Textj
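A minimal from-scratch sketch of this training procedure with add-1 smoothing (function and variable names are ours, not a reference implementation):

```python
from collections import Counter, defaultdict
import math

def train_nb(documents, labels):
    """documents: list of token lists, labels: list of class names."""
    vocab = {w for doc in documents for w in doc}
    log_prior, log_likelihood = {}, defaultdict(dict)
    for c in set(labels):
        docs_c = [d for d, y in zip(documents, labels) if y == c]
        log_prior[c] = math.log(len(docs_c) / len(documents))
        # "mega-document": concatenate all docs of class c and count words
        counts = Counter(w for d in docs_c for w in d)
        total = sum(counts.values())
        for w in vocab:
            # add-1 (Laplace) smoothed likelihood
            log_likelihood[c][w] = math.log((counts[w] + 1) / (total + len(vocab)))
    return vocab, log_prior, log_likelihood
```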
Unknown words
 What about unknown words
 that appear in our test data
 but not in our training data or vocabulary?
 We ignore them
 Remove them from the test document!
 Pretend they weren't there!
 Don't include any probability for them at all!
 Why don't we build an unknown word model?
 It doesn't help: knowing which class has more unknown
words is not generally helpful!
Stop words

 Some systems ignore stop words


 Stop words: very frequent words like the and a.
 Sort the vocabulary by word frequency in training set
 Call the top 10 or 50 words the stopword list.
 Remove all stop words from both training and test sets
 As if they were never there!
 But removing stop words doesn't usually help
• So in practice most NB algorithms use all words and don't use
stopword lists
Naive Bayes: Learning

Sentiment Example: A worked sentiment example with add-1 smoothing

1. Priors from training: P̂(cj) = N_cj / N_total, so P(-) = 3/5 and P(+) = 2/5
2. Drop "with" (it does not appear in the training vocabulary)
3. Likelihoods from training: P(wi | c) = (count(wi, c) + 1) / (Σ_w∈V count(w, c) + |V|)
4. Scoring the test set
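As a hedged illustration of these steps in code: the five training documents below are invented so that the priors come out to 3/5 negative and 2/5 positive (they are not necessarily the slide's actual example), and "with" is dropped because it does not occur in the training vocabulary.

```python
from collections import Counter
import math

# Hypothetical training data: 3 negative and 2 positive reviews (invented)
train = [
    (["boring", "plot", "dull", "acting"], "-"),
    (["dull", "and", "predictable"], "-"),
    (["no", "fun", "at", "all"], "-"),
    (["great", "fun", "film"], "+"),
    (["powerful", "and", "great", "acting"], "+"),
]
vocab = {w for doc, _ in train for w in doc}

def log_score(doc, c):
    docs_c = [d for d, y in train if y == c]
    counts = Counter(w for d in docs_c for w in d)
    total = sum(counts.values())
    score = math.log(len(docs_c) / len(train))       # prior: 3/5 or 2/5
    for w in doc:
        if w in vocab:                               # unknown words are dropped
            score += math.log((counts[w] + 1) / (total + len(vocab)))  # add-1 smoothing
    return score

test = ["predictable", "with", "no", "fun"]          # "with" is unknown -> dropped
print(max(["-", "+"], key=lambda c: log_score(test, c)))  # -> "-"
```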
Optimizing for sentiment analysis

 For tasks like sentiment, word occurrence seems to be more important than word frequency.
 The occurrence of the word fantastic tells us a lot
 The fact that it occurs 5 times may not tell us much more
 Binary multinomial naive Bayes, or binary NB:
 Clip our word counts at 1 (see the short sketch below)
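Clipping word counts at 1 is a one-liner; a minimal sketch:

```python
from collections import Counter

doc = "great great great fun fun film".split()
counts = Counter(doc)                              # {'great': 3, 'fun': 2, 'film': 1}
binary_counts = {w: min(c, 1) for w, c in counts.items()}
print(binary_counts)                               # {'great': 1, 'fun': 1, 'film': 1}
```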
Binary Multinomial Naïve Bayes: Learning

 From training corpus, extract Vocabulary
 Remove duplicates in each doc:
   For each word type w in docj
     Retain only a single instance of w
 Calculate P(cj) terms:
   For each cj in C do
     docsj ← all docs with class = cj
 Calculate P(wk | cj) terms:
   Textj ← single doc containing all docsj
   For each word wk in Vocabulary:
     nk ← # of occurrences of wk in Textj
Binary Multinomial Naive Bayes on a test document d

First remove all duplicate words from d
Then compute NB using the same equation

Binary Multinomial Naive Bayes

 Counts can still be 2! Binarization is within-doc!
More on Sentiment Classification

 I really like this movie

I really don't like this movie

Negation changes the meaning of "like" to negative.


Negation can also change negative to positive-ish
◦ Don't dismiss this film
◦ Doesn't let us get bored
Sentiment Classification: Lexicons

Sometimes we don't have enough labeled training data


In that case, we can make use of pre-built word lists
Called lexicons
There are various publicly available lexicons
MPQA Subjectivity Cues Lexicon

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in
Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

 Home page: https://mpqa.cs.pitt.edu/lexicons/subj_lexicon/


 6885 words from 8221 lemmas, annotated for intensity (strong/weak)
 2718 positive
 4912 negative
 + : admirable, beautiful, confident, dazzling, ecstatic, favor, glee, great
 − : awful, bad, bias, catastrophe, cheat, deny, envious, foul, harsh,
hate
Using Lexicons in Sentiment
Classification
Add a feature that gets a count whenever a word from the lexicon occurs (see the sketch after this slide)
 E.g., a feature called "this word occurs in the positive lexicon" or "this word occurs in the negative lexicon"
Now all positive words (good, great, beautiful,
wonderful) or negative words count for that feature.
Using 1-2 features isn't as good as using all the words.
• But when training data is sparse or not representative of
the test set, dense lexicon features can help
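A minimal sketch of such lexicon-count features (the tiny word lists are placeholders, not the MPQA lexicon):

```python
# Placeholder lexicons for illustration only
positive_lexicon = {"great", "beautiful", "wonderful", "admirable"}
negative_lexicon = {"awful", "bad", "harsh", "hate"}

def lexicon_features(doc_tokens):
    """Two dense features: counts of positive- and negative-lexicon words."""
    return {
        "pos_lexicon_count": sum(w in positive_lexicon for w in doc_tokens),
        "neg_lexicon_count": sum(w in negative_lexicon for w in doc_tokens),
    }

print(lexicon_features("a great film with beautiful but harsh scenes".split()))
# {'pos_lexicon_count': 2, 'neg_lexicon_count': 1}
```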
Naive Bayes in Other tasks: Spam
Filtering
 Spam Assassin Features:
 Mentions millions of dollars ($NN,NNN,NNN.NN)
 From: starts with many numbers
 Subject is all capitals
 HTML has a low ratio of text to image area
 "One hundred percent guaranteed"
 Claims you can be removed from the list
Naive Bayes in Language ID

 Determining what language a piece of text is


written in.
Features based on character n-grams do very well
 Important to train on lots of varieties of each
language
(e.g., American English varieties like African-American English,
or English varieties around the world like Indian English)
Summary: Naive Bayes is Not So Naive
 Very fast, low storage requirements
 Works well with very small amounts of training data
 Robust to irrelevant features
 Irrelevant features cancel each other out without affecting results
 Very good in domains with many equally important features
 Decision Trees suffer from fragmentation in such cases, especially if there is little data
 Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
 A good dependable baseline for text classification
Naïve Bayes: Relationship to Language
Modeling

 Generative Model for Multinomial Naïve Bayes

[Figure: the class node c = China generates the word nodes X1 = Shanghai, X2 = and, X3 = Shenzhen, X4 = issue, X5 = bonds]
Naïve Bayes and Language Modeling
 Naïve Bayes classifiers can use any sort of feature
 URL, email address, dictionaries, network features
 But if, as in the previous slides,
 we use only word features
 and we use all of the words in the text (not a subset)
 Then
 Naïve Bayes has an important similarity to language modeling.
Each class = a unigram language model

 Assigning each word: P(word | c)
 Assigning each sentence: P(s|c) = Π P(word|c)

Class pos unigram model: P(I) = 0.1, P(love) = 0.1, P(this) = 0.01, P(fun) = 0.05, P(film) = 0.1

Sentence: "I love this fun film"
P(s | pos) = 0.1 × 0.1 × 0.01 × 0.05 × 0.1 = 0.0000005
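The same computation as a tiny sketch, using the per-word probabilities from the slide:

```python
import math

# Unigram model for class "pos" (probabilities from the slide)
p_pos = {"i": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}

sentence = "I love this fun film".lower().split()
prob = math.prod(p_pos[w] for w in sentence)
print(prob)  # ≈ 5e-07, i.e. 0.0000005
```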

Naïve Bayes: Relationship to Language Modeling

Precision, Recall, and F1

Evaluating Classifiers: How well does our classifier work?
Evaluating Classifiers: How well
does our classifier work?
Let's first address binary classifiers:
• Is this email spam?
spam (+) or not spam (-)
• Is this post about Delicious Pie Company?
about Del. Pie Co (+) or not about Del. Pie Co(-)

We'll need to know


1. What did our classifier say about each email or post?
2. What should our classifier have said, i.e., the correct
answer, usually as defined by humans ("gold label")
Evaluating Classifiers: How well
does our classifier work?
 Let's first consider binary classifiers, e.g. spam or not-spam. Or imagine that we are the proprietors of the Delicious Pie Company and we want to find out what people are saying about our pies on social media. We want to know whether a particular social media post talks about our pies positively or negatively.

 To evaluate such a binary classifier, we'll need to know two things: what our classifier said about each email or post, and what it should have said, i.e. the correct answer, usually as defined by human labelers.
First step in evaluation: The confusion matrix

The first step is the confusion matrix, a table for visualizing how an algorithm performs with respect to the human gold labels. Here we use two dimensions (system output and gold labels), and each cell labels a set of possible outcomes.

In the pie detection case, for example, true positives are posts that are indeed about Delicious Pie (indicated by human-created gold labels) and that our system correctly said were about pie. False negatives are posts that are indeed about pie but that our system incorrectly labeled as not about pie. False positives are posts that aren't about pie but that our system incorrectly said were. And true negatives are non-pie posts that our system correctly said were not about pie.
Accuracy on the confusion matrix

 Here is the equation for accuracy, i.e. what percentage of all the observations (for the spam or pie examples that means all emails or tweets) our system labeled correctly:

accuracy = (TP + TN) / (TP + FP + TN + FN)

 Although accuracy might seem a natural metric, we generally don't use it for text classification tasks.
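A quick sketch with made-up confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
tp, fp, fn, tn = 70, 10, 30, 890

accuracy = (tp + tn) / (tp + fp + fn + tn)
print(accuracy)  # 0.96
```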
Why don't we use accuracy?

Accuracy doesn't work well when we're dealing


with uncommon or imbalanced classes
Suppose we look at 1,000,000 social media posts
to find Delicious Pie-lovers (or haters)
• 100 of them talk about our pie
• 999,900 are posts about something unrelated
Imagine the following simple classifier: every post is "not about pie"
Accuracy re: pie posts: 100 posts are about pie; 999,900 aren't
Why don't we use accuracy?

Accuracy of our "nothing is pie" classifier


999,900 true negatives and 100 false negatives
Accuracy is 999,900/1,000,000 = 99.99%!
But useless at finding pie-lovers (or haters)!!
Which was our goal!
Accuracy doesn't work well for unbalanced classes
Most tweets are not about pie!
Why don't we use accuracy?

 But this fabulous ‘no pie’ classifier would be completely useless, since
it wouldn’t find a single one of the customer comments we are
looking for. In other words, accuracy is not a good metric when the
goal is to discover something that is rare, or at least not completely
balanced in frequency, which is a very common situation in the world.
Instead of accuracy we use
precision and recall

Precision: % of selected items that are correct


Recall: % of correct items that are selected
Precision and Recall

 Precision: out of the things the system selected (the set of emails or tweets the system claimed were positive, i.e. spam or pie-related), how many did it get right? That is, how many were true positives out of everything selected: precision = TP / (TP + FP).

 Recall: out of all the items that should have been positive (the gold positives), what percentage did the system select? That is, how many did the system find as true positives: recall = TP / (TP + FN).

 So precision is about how much garbage we included in our findings; recall is more about making sure we didn't miss any treasure.
Precision and Recall

• 100 tweets talk about pie, 999,900 tweets don't


• Accuracy = 999,900/1,000,000 = 99.99%
But the Recall and Precision for this classifier are terrible:
Precision and Recall

 Recall and Precision will correctly evaluate our stupid "just say no" classifier as a bad classifier. The recall will be 0, since we returned no true positives out of the 100 true pie tweets (0 + 100). Precision is similarly 0, or in fact undefined, since both the numerator and denominator are 0. The metrics correctly assign bad scores to our useless classifier.
 [To get high precision, a system should be very reluctant to guess – but then it may miss some things and have poor recall]
 [To get high recall, a system should be very willing to guess – but then it may return some junk and have poor precision]
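The degenerate case in code, guarding the undefined 0/0 precision:

```python
tp, fp, fn = 0, 0, 100   # the "just say no" pie classifier

recall = tp / (tp + fn)                                # 0.0
precision = tp / (tp + fp) if (tp + fp) > 0 else None  # undefined (0/0)
print(precision, recall)  # None 0.0
```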
A combined measure: F1

 F1 is a combination of precision and recall.

F1 turns out to be the harmonic mean between precision and recall


F1 is a special case of the general "F-measure"

 F-measure is the (weighted) harmonic mean of precision and recall:

F = 1 / (α/P + (1−α)/R) = (β² + 1) P R / (β² P + R)

 F1 is the special case with β = 1 (α = ½):

F1 = 2 P R / (P + R)
F1 is a special case of the general
"F-measure"
 F1 is a special case of the F-measure: weighted harmonic mean of precision and recall.
 The harmonic mean of a set of numbers is the reciprocal of the arithmetic mean of
reciprocals.

 You can see here that the F score is a harmonic mean: if we replace α with ½ we get F1 = 2 / (1/P + 1/R) = 2PR / (P + R).
 The harmonic mean of two values is closer to the minimum of the two numbers than the arithmetic or geometric mean, so it weighs the lower of the two numbers more heavily.

That is, if P and R are far apart, F will be nearer the lower value, which makes it a kind of conservative mean in this situation. Thus to do well on F1, you have to do well on BOTH P and R.

Why the weights? In some applications you may care more about P or R. In practice we mainly use the balanced measure with β = 1 and α = ½.
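A short sketch computing precision, recall, and F1 from made-up counts; note how F1 sits closer to the lower of the two:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (beta = 1)."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration
tp, fp, fn = 40, 10, 60
p = tp / (tp + fp)     # 0.8
r = tp / (tp + fn)     # 0.4
print(p, r, f1(p, r))  # 0.8 0.4 0.533... (pulled toward the lower value, recall)
```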
Suppose we have more than 2
classes
 Lots of text classification tasks have more than two classes.
 Sentiment analysis (positive, negative, neutral) , named entities
(person, location, organization)
 We can define precision and recall for multiple classes like this 3-way
email task:
Suppose we have more than 2 classes

 Lots of classification tasks have more than two classes; sentiment, for example, could be 3-way. Consider the confusion matrix for a hypothetical 3-way email categorization decision (urgent, normal, spam). Notice that the system mistakenly labeled one spam document as urgent. We can compute distinct precision and recall values for each class. For example, the precision of the urgent category is 8 (the true positives for urgent) over the true positives plus false positives (the 10 normal and the 1 spam documents also labeled urgent), i.e. 8 / (8 + 10 + 1). The result, however, is 3 separate precision values and 3 separate recall values!
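A sketch of per-class precision and recall from a confusion matrix. Only the "urgent" system row (8, 10, 1) is taken from the text; the other two rows are invented placeholders:

```python
# rows = system output, columns = gold label; order: urgent, normal, spam
classes = ["urgent", "normal", "spam"]
confusion = [
    [8, 10, 1],    # system said urgent (from the text)
    [5, 60, 50],   # system said normal (hypothetical)
    [3, 30, 200],  # system said spam (hypothetical)
]

for i, c in enumerate(classes):
    tp = confusion[i][i]
    precision = tp / sum(confusion[i])              # across the system row
    recall = tp / sum(row[i] for row in confusion)  # down the gold column
    print(f"{c}: precision={precision:.2f} recall={recall:.2f}")
# urgent: precision = 8/19 ≈ 0.42
```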
Reference

Chapter 4
Question
Thank you
