Naive Bayes and Sentiment Classification
Text Classification and Naïve Bayes
Is this spam?
Dan Jurafsky
Text Classification
• Assigning subject categories, topics, or genres
• Spam detection
• Authorship identification
• Age/gender identification
• Language identification
• Sentiment analysis
• …
Hand-coded rules
• Rules based on combinations of words or other features
• spam: black-list-address OR ("dollars" AND "have been selected")
• Accuracy can be high
• If rules carefully refined by expert
• But building and maintaining these rules is expensive
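A rule set like the spam example above can be sketched in a few lines of Python (the black-list address and the messages are made up for illustration):

```python
# Minimal sketch of a hand-coded spam rule (illustrative only):
# flag as spam if the sender is black-listed, OR if the message
# mentions "dollars" AND "have been selected".

BLACKLIST = {"offers@example-spam.net"}  # hypothetical black-list

def is_spam(sender: str, body: str) -> bool:
    text = body.lower()
    if sender in BLACKLIST:
        return True
    return "dollars" in text and "have been selected" in text

print(is_spam("friend@example.com", "You have been selected to win dollars!"))  # True
print(is_spam("friend@example.com", "See you at lunch"))                        # False
```

Refining such rules by hand is exactly the expensive expert labor that motivates the supervised approach on the next slides.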
Classification Methods:
Supervised Machine Learning
• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}
• A training set of m hand-labeled documents (d1,c1),....,(dm,cm)
• Output:
• a learned classifier γ: d → c
Classification Methods:
Supervised Machine Learning
• Any kind of classifier
• Naïve Bayes
• Logistic regression
• Support-vector machines
• k-Nearest Neighbors
• …
The Naive Bayes Classifier
Naive Bayes Intuition
The bag of words representation
A document is represented by its word counts, ignoring word order. A movie review, for example, reduces to counts like: seen 2, sweet 1, whimsical 1, recommend 1, happy 1, … The classifier γ maps this count vector to a class c.
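The bag of words representation above can be built with a one-line word count (whitespace tokenization is a simplifying assumption here):

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Position is ignored: a document is just a multiset of its words.
    return Counter(text.lower().split())

bow = bag_of_words("I loved it I recommend it")
print(bow["i"], bow["it"], bow["recommend"])  # 2 2 1
```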
Bayes' Rule Applied to Documents and Classes
MAP is "maximum a posteriori" = the most likely class:
c_MAP = argmax_c P(c|d)
Applying Bayes' rule:
c_MAP = argmax_c P(d|c) P(c) / P(d)
Dropping the denominator (P(d) is the same for every class):
c_MAP = argmax_c P(d|c) P(c)
Naive Bayes Classifier (II)
With the document d represented as features x1..xn:
c_MAP = argmax_c P(x1, …, xn | c) P(c)
where P(x1, …, xn | c) is the "likelihood" and P(c) is the "prior".
Multinomial Naive Bayes Independence Assumptions
The bag-of-words assumption: word position doesn't matter. The conditional independence ("naive") assumption:
P(x1, …, xn | c) = P(x1|c) · P(x2|c) · … · P(xn|c)
This gives:
c_NB = argmax_c P(c) ∏_i P(x_i|c), or in log space: c_NB = argmax_c [log P(c) + Σ_i log P(x_i|c)]
Notes:
1) Taking the log doesn't change the ranking of classes:
the class with the highest probability also has the highest log probability.
2) It's a linear model:
just a max over a sum of weights, i.e., a linear function of the inputs.
So Naive Bayes is a linear classifier.
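The linearity is easy to see in code: scoring a class is just a sum of log weights. A tiny sketch, with made-up class-conditional probabilities:

```python
import math

# Toy probabilities, invented for illustration only.
log_prior = {"pos": math.log(0.5), "neg": math.log(0.5)}
log_lik = {
    "pos": {"great": math.log(0.4), "boring": math.log(0.1)},
    "neg": {"great": math.log(0.1), "boring": math.log(0.4)},
}

def nb_score(words, c):
    # A linear function of the input word counts:
    # log P(c) + sum_i log P(w_i | c)
    return log_prior[c] + sum(log_lik[c][w] for w in words)

doc = ["great", "great", "boring"]
best = max(log_prior, key=lambda c: nb_score(doc, c))
print(best)  # pos
```

Working in log space also avoids floating-point underflow from multiplying many small probabilities.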
Naive Bayes: Learning
Sec.13.3
Parameter estimation
The prior is estimated from class frequencies in the training set:
P̂(c_j) = N_{c_j} / N_{total}
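Both kinds of parameters, the priors and the add-1 smoothed likelihoods, can be estimated in a few lines (the two-document training set is a toy example):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """docs: list of (word_list, class) pairs. Returns log priors and
    add-1 (Laplace) smoothed log likelihoods for multinomial NB."""
    n_total = len(docs)
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, c in docs:
        word_counts[c].update(words)
        vocab.update(words)
    # Prior: P(c) = N_c / N_total
    log_prior = {c: math.log(n / n_total) for c, n in class_counts.items()}
    # Likelihood: P(w|c) = (count(w,c) + 1) / (count(c) + |V|)
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / denom)
                      for w in vocab}
    return log_prior, log_lik

docs = [("good great good".split(), "pos"), ("bad boring".split(), "neg")]
log_prior, log_lik = train_nb(docs)
print(round(math.exp(log_prior["pos"]), 2))  # 0.5
```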
Binary multinomial Naive Bayes
A variant for sentiment: word occurrence matters more than word frequency, so clip each word's count at 1 per document (remove duplicate words within a document) before training and testing.
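The clipping step is a one-liner (assuming documents arrive as token lists):

```python
from collections import Counter

def binary_bow(words):
    # Binary multinomial NB: clip each word's count at 1 per document
    # by deduplicating before counting.
    return Counter(set(words))

print(binary_bow("great great fun".split())["great"])  # 1
```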
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
SpamAssassin Features:
◦ Mentions millions of dollars ($NN,NNN,NNN.NN)
◦ From: starts with many numbers
◦ Subject is all capitals
◦ HTML has a low ratio of text to image area
◦ "One hundred percent guaranteed"
◦ Claims you can be removed from the list
Naive Bayes in Language ID
Determining what language a piece of text is written in.
Features based on character n-grams do very well
Important to train on lots of varieties of each language
(e.g., American English varieties like African-American English,
or English varieties around the world like Indian English)
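A character n-gram featurizer of the kind that works well for language ID can be sketched as follows (n=3 and the space-padding convention are illustrative choices):

```python
def char_ngrams(text, n=3):
    # Character n-gram features (here trigrams); padding with spaces
    # captures word-initial and word-final character patterns.
    padded = f" {text} "
    return [padded[i:i+n] for i in range(len(padded) - n + 1)]

print(char_ngrams("the cat")[:3])  # [' th', 'the', 'he ']
```

These n-grams can then be fed to a multinomial Naive Bayes model exactly like word features.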
Summary: Naive Bayes is Not So Naive
Very fast, with low storage requirements
Works well with very small amounts of training data
Robust to irrelevant features
Irrelevant features cancel each other out without affecting results
Naïve Bayes: Relationship to Language Modeling
Each class can be viewed as a unigram language model: assigning P(word|c) to every word amounts to assigning a probability to every sentence, so the Naive Bayes likelihood of a document is the probability a class-specific unigram language model assigns to it.
Precision, Recall, and F1
Evaluating Classifiers: How well does our classifier work?
Let's first address binary classifiers:
• Is this email spam?
spam (+) or not spam (-)
• Is this post about Delicious Pie Company?
about Del. Pie Co. (+) or not about Del. Pie Co. (-)
Precision: of the items the classifier labeled positive, what fraction are truly positive? Recall: of the truly positive items, what fraction did the classifier label positive? F1 is their harmonic mean: F1 = 2PR / (P + R).
We can define precision and recall for multiple classes like this 3-
way email task:
How to combine P/R values for the different classes:
Microaveraging (pool the counts across classes, then compute P/R once) vs. Macroaveraging (compute P/R per class, then average)
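The micro/macro distinction can be made concrete by computing both from per-class true-positive / false-positive / false-negative counts (the class names and counts below are made up for illustration):

```python
def prf(tp, fp, fn):
    # Precision, recall, and F1 from raw counts.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical (tp, fp, fn) counts for a 3-way email task.
counts = {"urgent": (8, 10, 3), "normal": (60, 55, 40), "spam": (200, 33, 57)}

# Macroaveraging: compute precision per class, then average.
macro_p = sum(prf(*c)[0] for c in counts.values()) / len(counts)

# Microaveraging: pool all counts, then compute precision once.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
micro_p = tp / (tp + fp)

print(round(macro_p, 2), round(micro_p, 2))  # 0.61 0.73
```

Microaveraging is dominated by the large classes, while macroaveraging weights every class equally, which is why the two numbers differ.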
Avoiding Harms in Classification
Harms of classification
Classifiers, like any NLP algorithm, can cause harms
This is true for any classifier, whether Naive Bayes or
other algorithms
Representational Harms
• Harms caused by a system that demeans a social group
• Such as by perpetuating negative stereotypes about them.
• Kiritchenko and Mohammad 2018 study
• Examined 200 sentiment analysis systems on pairs of sentences
• Identical except for names:
• common African American (Shaniqua) or European American (Stephanie).
• Like "I talked to Shaniqua yesterday" vs "I talked to Stephanie yesterday"
• Result: systems assigned lower sentiment and more negative
emotion to sentences with African American names
• Downstream harm:
• Perpetuates stereotypes about African Americans
• African Americans treated differently by widely used NLP tools like sentiment analysis
Harms of Censorship
• Toxicity detection is the text classification task of detecting hate speech,
abuse, harassment, or other kinds of toxic language.
• Widely used in online content moderation
• Toxicity classifiers incorrectly flag non-toxic sentences that simply
mention minority identities (like the words "blind" or "gay")
• women (Park et al., 2018),
• disabled people (Hutchinson et al., 2020)
• gay people (Dixon et al., 2018; Oliva et al., 2021)
• Downstream harms:
• Censorship of speech by disabled people and other groups
• Speech by these groups becomes less visible online
• Writers might be nudged by these algorithms to avoid these words
Performance Disparities
1. Text classifiers perform worse on many languages
of the world due to lack of data or labels
2. Text classifiers perform worse on varieties of
even high-resource languages like English
• Example task: language identification, a first step in the NLP pipeline ("Is this post in English or not?")
• English language detection performance worse for
writers who are African American (Blodgett and
O'Connor 2017) or from India (Jurgens et al., 2017)
Harms in text classification
• Causes:
• Issues in the data; NLP systems amplify biases in training data
• Problems in the labels
• Problems in the algorithms (like what the model is trained to
optimize)
• Prevalence: The same problems occur throughout NLP
(including large language models)
• Solutions: There are no general mitigations or solutions
• But harm mitigation is an active area of research
• And there are standard benchmarks and tools that we can use for
measuring some of the harms