It turns out that Naive Bayes has a very close relationship to language modeling.
Let's see how that is. We'll start by looking at the generative model for multinomial Naive Bayes. Imagine I have a class, let's say it's China, and imagine we're randomly generating a document about China. We might start by saying, well, the first word we generate is Shanghai, the second word is and, the third word is Shenzhen, the fourth word is issue, and the fifth word is bonds, and so on. We've just generated a little random document about China. What this generative model shows you is that each word is generated independently from the class, each with a certain probability; we keep a little set of probabilities, one for each word.

Now, in general, Naive Bayes classifiers can use all sorts of features: URLs, email addresses, and so on. We'll talk about that for spam detection. But if, as in the previous slide, we use only word features, and we use all the words in the text, then it turns out that this generative model gives Naive Bayes an important similarity to language models. Naive Bayes turns out to be a kind of language model. In particular, each class in a Naive Bayes classifier is a unigram language model. The way to think about that is: for each word, the likelihood term in Naive Bayes assigns the probability of the word given the class. And for a sentence, or even a whole document, since we're multiplying together the probabilities of all the words, we compute the probability of the sentence given the class; we just multiply together the likelihoods of all the words in the class.

So let's see how that works. Imagine we have the class positive, and we have our likelihoods: P(I | positive), P(love | positive), P(this | positive), and so on. Say P(I | positive) is .1, P(love | positive) is .1, and so on. So here's our Naive Bayes classifier, and we can think of it exactly as a language model. We have a sequence of words, I love this fun film, and Naive Bayes assigns that sequence a set of probabilities, one for each word from the class: .1, .1, .05, and so on. If we multiply all of these together, we get a probability for the sentence. So in Naive Bayes, each class is just a unigram language model conditioned on the class.

So when we ask which class assigns a higher probability to a document, it's like we're running two separate language models. Here I've shown you two separate language models, one for the positive class and one for the negative class, and each one has separate probabilities: here's P(I | negative), and here's P(I | positive). I guess people are more likely to use the word I when they don't like something. Now, if we take a particular sentence, I love this fun film, and we ask what probability this sequence gets according to our first model, and what probability it gets according to our second model, each model assigns a probability to each word. Those are the Naive Bayes likelihoods.
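To make the per-class likelihoods concrete, here is a minimal Python sketch. Only P(I | positive) = .1, P(love | positive) = .1, and P(this | positive) = .05 come from the lecture; the remaining numbers, and the function name, are illustrative assumptions for this sketch, not values from the slides.

```python
# Toy class-conditional unigram models: P(word | class).
# Only the first three positive-class numbers come from the lecture;
# the rest are illustrative values chosen for this sketch.
positive = {"I": 0.1, "love": 0.1, "this": 0.05, "fun": 0.01, "film": 0.1}
negative = {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}

def word_likelihoods(words, model):
    """Return the Naive Bayes likelihood P(w | class) for each word in the sequence."""
    return [model[w] for w in words]

sentence = ["I", "love", "this", "fun", "film"]
print(word_likelihoods(sentence, positive))  # [0.1, 0.1, 0.05, 0.01, 0.1]
print(word_likelihoods(sentence, negative))  # [0.2, 0.001, 0.01, 0.005, 0.1]
```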
We can multiply them all together, and you can sort of see by inspection, mostly because of this one and this one, that the positive probabilities multiplied together are going to be much higher than the negative probabilities. So you can think of Naive Bayes this way: each class is a separate class-conditional language model. We run each language model to compute the likelihood of a test sentence, and we just pick whichever language model assigns the higher probability; that's the more likely class. So that's the close relationship between Naive Bayes and language modeling.
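Putting the comparison together, here is a minimal sketch of that classification step, reusing the same illustrative toy probabilities as above. The class whose unigram language model assigns the higher product of word likelihoods wins; class priors are left out, matching the lecture's simple comparison of the two language models.

```python
import math

# Illustrative class-conditional unigram models (same toy numbers as the sketch above).
class_models = {
    "positive": {"I": 0.1, "love": 0.1, "this": 0.05, "fun": 0.01, "film": 0.1},
    "negative": {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1},
}

def sentence_likelihood(words, model):
    """P(sentence | class): the product of the per-word unigram probabilities."""
    return math.prod(model[w] for w in words)

def classify(words, class_models):
    """Run each class's language model and pick the class with the higher likelihood."""
    return max(class_models, key=lambda c: sentence_likelihood(words, class_models[c]))

print(classify(["I", "love", "this", "fun", "film"], class_models))  # -> "positive"
```

In practice one would sum log probabilities rather than multiply raw probabilities, to avoid underflow on long documents, but the plain product shown here matches the lecture's description.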