05 Naive Bayes - Relationship To Language Modeling 4-35

It turns out that Naive Bayes has a very close relationship to language modeling. Let's see how that is.

We'll start by looking at the generative model for multinomial Naive Bayes. Imagine we have a class, let's say it's China, and imagine that we're randomly generating a document about China. We might start by saying the first word is "Shanghai", the second word is "and", the third word is "Shenzhen", the fourth word is "issue", the fifth word is "bonds", and so on. We've just generated a little random document about China. What this generative model shows is that each word is generated independently from the class, each with a certain probability; we keep a little set of word probabilities for each class. Let's think about that.
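As a rough illustration of that generative story, here is a minimal sketch in Python. The class-conditional word probabilities below are invented for this example (the lecture doesn't give numbers for the China class); a real model would estimate them from word counts in training documents of that class.

    import random

    # Toy unigram probabilities P(word | China); invented numbers for illustration.
    p_word_given_china = {
        "Shanghai": 0.3, "and": 0.2, "Shenzhen": 0.2, "issue": 0.15, "bonds": 0.15,
    }

    def generate_document(word_probs, length=5):
        """Draw each word independently from the class's unigram distribution."""
        words = list(word_probs)
        weights = list(word_probs.values())
        return " ".join(random.choices(words, weights=weights, k=length))

    print(generate_document(p_word_given_china))  # e.g. "Shanghai and issue bonds Shenzhen"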
Now, in general, Naive Bayes classifiers can use all sorts of features, such as URLs and email addresses; we'll talk about that for spam detection. But if, as in the previous slide, we use just the word features, and we use all the words in the text, then this generative model gives Naive Bayes an important similarity to a language model. Naive Bayes turns out to be a kind of language model. In particular, each class in a Naive Bayes classifier is a unigram language model. The way to think about that is: for each word, the likelihood term in a Naive Bayes classifier assigns the probability of that word given the class. And for a sentence, or even a whole document, since we multiply together the probabilities of all the words, we compute the probability of the sentence given the class: we're just multiplying together the likelihoods of all the words in the class.
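In other words, the Naive Bayes likelihood of a document is the product of one class's unigram probabilities of its words: P(d | c) = P(w1 | c) × P(w2 | c) × ... × P(wn | c). Here is a minimal sketch of that scoring step, assuming a probability table for the class is already given:

    def sentence_prob(words, word_probs):
        """P(sentence | class): multiply the class's unigram probability of each word.

        word_probs maps each word to P(word | class). A real system would work in
        log space, and would smooth probabilities for unseen words; both are
        glossed over here.
        """
        prob = 1.0
        for w in words:
            prob *= word_probs[w]
        return prob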
So let's see how that works. Imagine that we have the class positive, and we have our likelihoods: the likelihood of "I" given positive, that's P(I | positive), and we have P(love | positive), and P(this | positive), and so on. And P(I | positive) is 0.1, P(love | positive) is 0.1, and so on. So here's our Naive Bayes classifier, and we can think of it exactly as a language model. We have a sequence of words, say I, love, this, fun, film, and Naive Bayes assigns that sequence a set of probabilities, one for each word from the class: 0.1, 0.1, 0.05, and so on. If we multiply all of these together, we get a probability for the whole sentence. So in Naive Bayes, each class is just a unigram language model conditioned on the class.
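Written out for this example, with the probabilities the lecture doesn't state left as blanks:

    P(I love this fun film | +)
        = P(I | +) × P(love | +) × P(this | +) × P(fun | +) × P(film | +)
        = 0.1 × 0.1 × 0.05 × ... × ...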
So when we ask which class assigns a higher probability to a document, it's as if we're running two separate language models. Here we have two separate language models, one for the positive class and one for the negative class, and each has its own probabilities: here's the probability of "I" given negative, and here's the probability of "I" given positive. (I guess people are more likely to use the word "I" when they don't like something.) Now, if we take a particular sentence, "I love this fun film", and ask what the probability of this sequence is according to our first model, and what it is according to our second model, each model assigns a probability to each word; those are the Naive Bayes likelihoods. We multiply them all together, and you can see by inspection that the positive probabilities multiplied together are going to be much higher than the negative probabilities multiplied together. So you can think of Naive Bayes this way: each class is a separate class-conditioned language model. We run each language model to compute the likelihood of a test sentence, and we pick whichever language model assigns the higher probability; that's the more likely class. So that's the close relationship between Naive Bayes and language modeling.
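Here is a minimal sketch of that decision rule in Python. The probability tables are illustrative: only P(I | +) = 0.1, P(love | +) = 0.1, and P(this | +) = 0.05 come from the lecture, the rest are invented, and class priors are left out because the lecture only compares likelihoods here.

    import math

    # Class-conditional unigram probabilities (mostly invented; see note above).
    likelihoods = {
        "positive": {"I": 0.1, "love": 0.1, "this": 0.05, "fun": 0.05, "film": 0.1},
        "negative": {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1},
    }

    def classify(words):
        """Run each class's unigram language model; pick the higher-scoring class."""
        scores = {
            c: sum(math.log(p[w]) for w in words)  # sum of logs = log of the product
            for c, p in likelihoods.items()
        }
        return max(scores, key=scores.get)

    print(classify("I love this fun film".split()))  # -> positive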
