NLP CH 2


CHAPTER – 2

Language Modeling and Part of Speech Tagging

Subject: NLP (Code: 3170723), CE, NIT
Prepared By: Asst. Prof. Chaitali Bhoi
Language Model
• Predicting is difficult—especially about the future.

• But how about predicting something that seems much easier, like the
next few words someone is going to say?

• Ex: "Hey, hi!! How are ____?" Most speakers can guess that the next
word is "you".


Language Model
• In the following sections we will formalize this
intuition by introducing models that assign a
probability to each possible next word.

• The same models will also serve to assign a


probability to an entire sentence.
Language Model
• For example, a language model could predict that the first of the two
sequences below has a much higher probability of appearing in a text:

1) all of a sudden I notice three guys standing on the sidewalk
- it makes sense
2) on guys all I of notice sidewalk three a sudden standing the
- it is just the same words in a jumbled order
Application
• speech recognition
• spelling correction
• grammatical error correction
• machine translation
Language Model
• Models that assign probabilities to sequences of words are called
language models or LMs.

• Here we introduce the simplest model that assigns probabilities to
sentences and sequences of words: the n-gram.
N-gram
• An n-gram is a sequence of n words.

• A 2-gram (bigram) is a two-word sequence of words.
• A 3-gram (trigram) is a three-word sequence of words.

• We use n-gram models to estimate the probability of the last word of
an n-gram given the previous words, and also to assign probabilities
to entire sequences (a small sketch of extracting n-grams follows
below).
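
The following is a minimal Python sketch (not from the slides) of extracting
bigrams and trigrams from a token sequence; the ngrams helper and the sample
sentence are my own illustration.

def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "all of a sudden I notice three guys".split()
print(ngrams(words, 2))  # bigrams:  ('all', 'of'), ('of', 'a'), ...
print(ngrams(words, 3))  # trigrams: ('all', 'of', 'a'), ...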
Conditional Probability
• We will use the P(A|B)notation to represent
the conditional probability of A given that the
event B has occurred. B is the “conditioning
event.”
Conditional Probability Example
• Suppose that of all individuals buying a certain digital camera, 60%
include an optional memory card in their purchase, 40% include an
extra battery, and 30% include both a card and a battery. Given that
the selected individual purchased an extra battery, what is the
probability that an optional card was also purchased?
Conditional Probability solution
• A = {memory card purchased}
• B = {battery purchased}
• P(A) = 0.60
• P(B) = 0.40
• P(A ∩ B) = 0.30
• P(A|B) = P(A ∩ B) / P(B) = 0.30 / 0.40 = 0.75
• That is, of all those purchasing an extra battery, 75% purchased an
optional memory card.
Joint Probability
• The joint probability P(A ∩ B) is the probability that both events
occur together.
• By the multiplication rule: P(A ∩ B) = P(B) × P(A|B).
Joint Probability Example
• From the camera example above: P(card and battery) = P(battery) ×
P(card | battery) = 0.40 × 0.75 = 0.30, which matches the given joint
probability.
Bigram
• <s> I am Jack </s>
• <s> Jack am I </s>
• <s> Jack I like </s>
• <s> Jack I do like </s>
• <s> Do I like Jack </s>

• Assume we use a bigram model trained on the five sentences above (a
small sketch that builds these bigram counts follows below).

• Find the most probable next word after each context:

• 1) Jack ...
• 2) Jack I do ...
• 3) Jack I am Jack ...
• 4) do I like Jack ...

• Find the sentence probability of:

• 1) I like Jack
• 2) Jack like nothing
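
Below is a minimal Python sketch (my own, not from the slides) of an
unsmoothed bigram model built from the five training sentences; tokens are
lowercased for simplicity, and the <s>/</s> markers are kept as tokens.

from collections import Counter, defaultdict

corpus = [
    "<s> i am jack </s>",
    "<s> jack am i </s>",
    "<s> jack i like </s>",
    "<s> jack i do like </s>",
    "<s> do i like jack </s>",
]

bigram_counts = defaultdict(Counter)   # bigram_counts[prev][cur] = C(prev cur)
for sent in corpus:
    tokens = sent.split()
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[prev][cur] += 1

def bigram_prob(prev, cur):
    total = sum(bigram_counts[prev].values())    # C(prev)
    return bigram_counts[prev][cur] / total if total else 0.0

# Most probable word after "jack" ('i' and '</s>' tie with count 2):
print(bigram_counts["jack"].most_common(1))
# Unsmoothed probability of "<s> I like Jack </s>":
tokens = "<s> i like jack </s>".split()
p = 1.0
for prev, cur in zip(tokens, tokens[1:]):
    p *= bigram_prob(prev, cur)
print(p)   # (1/5) * (2/5) * (1/3) * (2/5) ≈ 0.011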
Evaluating Language Models
• extrinsic evaluation
• intrinsic evaluation
extrinsic evaluation
• The best way to evaluate the performance of a
language model is to embed it in an
application and measure how much the
application improves. Such end-to-end
evaluation is called extrinsic evaluation
intrinsic evaluation
• An intrinsic evaluation metric is one that measures the quality of a
model independent of any application.
Smoothing Techniques
• Add-1 / Laplace
• Add-K
• Backoff and Interpolation

• Advanced:
• Good-Turing
• Kneser-Ney
Add-1 / Laplace Smoothing
• Add-1 (Laplace) smoothing adds one to every count before normalizing.
• Equations (N = total word tokens, V = vocabulary size):
• Unigram: PAdd-1(wi) = (C(wi) + 1) / (N + V)
• Bigram:  PAdd-1(wn|wn−1) = (C(wn−1 wn) + 1) / (C(wn−1) + V)
Add-K Smoothing
• Add-k smoothing adds a fractional count k instead of 1 (e.g., k = 0.5).
• Equations:
• Unigram: PAdd-k(wi) = (C(wi) + k) / (N + kV)
• Bigram:  PAdd-k(wn|wn−1) = (C(wn−1 wn) + k) / (C(wn−1) + kV)
• A small sketch of these formulas follows below.
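
Here is a minimal Python sketch (not from the slides) of the two add-k
formulas above; setting k = 1 gives the Add-1 / Laplace case from the
previous slide. The function names and argument layout are my own.

def addk_unigram(word, unigram_counts, k, V):
    """PAdd-k(w) = (C(w) + k) / (N + kV)"""
    N = sum(unigram_counts.values())
    return (unigram_counts.get(word, 0) + k) / (N + k * V)

def addk_bigram(prev, word, bigram_counts, unigram_counts, k, V):
    """PAdd-k(word | prev) = (C(prev word) + k) / (C(prev) + kV)"""
    return ((bigram_counts.get((prev, word), 0) + k)
            / (unigram_counts.get(prev, 0) + k * V))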
Example
• Training corpus:
• I live in India
• I live in
• I live
• in India
• India I live
• I in

• Test sentence: I love to live in India
Unigram (add-0.5; N = 16, V = 4)
• I     == (5 + 0.5) / (16 + 0.5 × 4) = 5.5 / 18 ≈ 0.31
• love  == (0 + 0.5) / 18 ≈ 0.03
• to    == (0 + 0.5) / 18 ≈ 0.03
• live  == (4 + 0.5) / 18 = 0.25
• in    == (4 + 0.5) / 18 = 0.25
• India == (3 + 0.5) / 18 ≈ 0.19
Bigram (add-0.5; V = 4)
• I love   == (0 + 0.5) / (5 + 0.5 × 4) = 0.5 / 7 ≈ 0.07
• love to  == (0 + 0.5) / (0 + 2) = 0.25
• to live  == (0 + 0.5) / (0 + 2) = 0.25
• live in  == (2 + 0.5) / (4 + 2) ≈ 0.42
• in India == (2 + 0.5) / (4 + 2) ≈ 0.42
• A short sketch that reproduces these numbers follows below.
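
The short Python sketch below (my own, not from the slides) rebuilds the
counts from the six training sentences and reproduces a few of the add-0.5
values above.

from collections import Counter

corpus = ["I live in India", "I live in", "I live",
          "in India", "India I live", "I in"]
uni, bi = Counter(), Counter()
for s in corpus:
    w = s.split()
    uni.update(w)                 # I=5, live=4, in=4, India=3 (N = 16)
    bi.update(zip(w, w[1:]))      # e.g. ('live','in')=2, ('in','India')=2

k, V, N = 0.5, 4, sum(uni.values())
print((uni["I"] + k) / (N + k * V))                        # ≈ 0.31
print((bi[("live", "in")] + k) / (uni["live"] + k * V))    # ≈ 0.42
print((bi[("in", "India")] + k) / (uni["in"] + k * V))     # ≈ 0.42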
Backoff and Interpolation
• Backoff

• In backoff, if the n-gram we need has a zero count, we "back off" and
use the (n−1)-gram probability instead; we only fall back to the
lower-order model when we have no evidence for the higher-order one
(a small sketch follows below).
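
A minimal Python sketch of the idea (my own, not from the slides). Note that
this naive version simply falls through to shorter contexts; a proper Katz
backoff would also discount the higher-order probabilities so that the
distribution still sums to one.

def backoff_prob(context, word, counts, unigram_counts):
    """counts maps a context tuple -> {next word: count}."""
    context = tuple(context)
    while context:                            # try trigram, then bigram, ...
        ctx_counts = counts.get(context, {})
        total = sum(ctx_counts.values())
        if total and ctx_counts.get(word, 0):
            return ctx_counts[word] / total   # evidence found at this order
        context = context[1:]                 # drop the left-most context word
    N = sum(unigram_counts.values())          # finally back off to the unigram
    return unigram_counts.get(word, 0) / N if N else 0.0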
Backoff and Interpolation
• Interpolation

• In interpolation we mix the estimates from all the n-gram orders,
weighting each one by a λ:

• P̂(wn|wn−2 wn−1) = λ1 P(wn|wn−2 wn−1) + λ2 P(wn|wn−1) + λ3 P(wn)

• such that the λs sum to 1: λ1 + λ2 + λ3 = 1
• Example with λ1 = 0.2, λ2 = 0.4, λ3 = 0.4, using unsmoothed
probabilities from the "I live in India ..." corpus above (note that
every term must be a probability, e.g. P(India) = 3/16, not the raw
count 3):

• P̂(to | I love)     = 0.2 × 0 + 0.4 × 0 + 0.4 × 0 = 0
• P̂(live | love to)  = 0.2 × 0 + 0.4 × 0 + 0.4 × (4/16) = 0.1
• P̂(in | to live)    = 0.2 × 0 + 0.4 × (2/4) + 0.4 × (4/16) = 0.3
• P̂(India | live in) = 0.2 × (1/2) + 0.4 × (2/4) + 0.4 × (3/16) ≈ 0.38

• Interpolating P̂(India | to live in) over four orders:

• with λ = 0.2, 0.2, 0.3, 0.3:
  0.2 × P(India | to live in) + 0.2 × P(India | live in)
  + 0.3 × P(India | in) + 0.3 × P(India)
  = 0.2 × 0 + 0.2 × (1/2) + 0.3 × (2/4) + 0.3 × (3/16) ≈ 0.31

• with λ = 0.1, 0.1, 0.4, 0.4:
  = 0.1 × 0 + 0.1 × (1/2) + 0.4 × (2/4) + 0.4 × (3/16) ≈ 0.33
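
The following Python sketch (my own, not from the slides) computes the
trigram interpolation above with fixed λs, using unsmoothed counts from the
same corpus.

from collections import Counter

corpus = ["I live in India", "I live in", "I live",
          "in India", "India I live", "I in"]
uni, bi, tri = Counter(), Counter(), Counter()
for s in corpus:
    w = s.split()
    uni.update(w)
    bi.update(zip(w, w[1:]))
    tri.update(zip(w, w[1:], w[2:]))

def mle(num, den):                 # unsmoothed ratio, 0 if the context is unseen
    return num / den if den else 0.0

def interp(w1, w2, w3, lambdas=(0.2, 0.4, 0.4)):
    """P^(w3 | w1 w2) = l1*P(w3|w1 w2) + l2*P(w3|w2) + l3*P(w3)"""
    l1, l2, l3 = lambdas
    return (l1 * mle(tri[(w1, w2, w3)], bi[(w1, w2)])
            + l2 * mle(bi[(w2, w3)], uni[w2])
            + l3 * mle(uni[w3], sum(uni.values())))

print(interp("live", "in", "India"))   # ≈ 0.38, matching the worked example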
Kneser-Ney Smoothing
POS Tagging
• Part of Speech Tagging - defined as the process
of assigning one of the parts of speech to the
given word.

• POS tagging is a task of labelling each word in a


sentence with its appropriate part of speech.

• Parts of speech include nouns, verbs, adverbs, adjectives, pronouns,
conjunctions, and their sub-categories.
POS Tagging

• noun --> name of a place / person / organization
• modal verb --> will, can, could, should, might, must
• verb --> action // run, eat, speak, listen, do
• adjective --> describes a quality of a noun
Rule-based
• Rule-based taggers use dictionary or lexicon
for getting possible tags for tagging each
word. If the word has more than one possible
tag, then rule-based taggers use hand-written
rules to identify the correct tag.
Stochastic
• Stochastic taggers choose tags using probabilities learned from a
tagged corpus, e.g. the most frequent tag for each word (word
frequency) and the likelihood of tag sequences across the sentence.
Transformation-based
• Combination of rule-based and stochastic tagging (e.g., Brill tagging).
POS Tagging Example
• Collect data – a labeled (tagged) corpus.
• Create a lookup table – tag each word with its most common POS tag.
• Tag each word of our sentence using the table (see the sketch below).
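
A minimal Python sketch (not from the slides) of this lookup-table approach;
the tiny labeled data set and the NOUN default for unknown words are my own
assumptions.

from collections import Counter, defaultdict

# Hypothetical labeled data: (word, tag) pairs.
labeled = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
           ("the", "DET"), ("runs", "VERB"), ("can", "MODAL"),
           ("can", "VERB"), ("dog", "NOUN")]

tag_counts = defaultdict(Counter)
for word, tag in labeled:
    tag_counts[word][tag] += 1

# Lookup table: each word mapped to its most common tag.
lookup = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

sentence = "the dog runs".split()
print([(w, lookup.get(w, "NOUN")) for w in sentence])
# [('the', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB')]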
Hidden Markov Model (HMM)
• An HMM tagger needs two things:
• Emission probabilities:
• How likely is it that a given word, e.g. "Jane", is a Noun, a Modal,
or a Verb?

• Transition probabilities:
• How likely is a Noun to be followed by a Modal, which in turn is
followed by a Verb? (A small sketch of estimating both follows below.)
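
Below is a minimal Python sketch (my own, not from the slides) of estimating
emission and transition probabilities by counting over a tiny hand-tagged
corpus; the two tagged sentences are assumed example data.

from collections import Counter, defaultdict

tagged_sentences = [
    [("jane", "NOUN"), ("will", "MODAL"), ("spot", "VERB"), ("will", "NOUN")],
    [("will", "MODAL"), ("jane", "NOUN"), ("spot", "VERB"), ("mary", "NOUN")],
]

emission = defaultdict(Counter)     # emission[tag][word] = C(tag, word)
transition = defaultdict(Counter)   # transition[t1][t2]  = C(t1 -> t2)
for sent in tagged_sentences:
    tags = ["<s>"] + [t for _, t in sent]
    for prev, cur in zip(tags, tags[1:]):
        transition[prev][cur] += 1
    for word, tag in sent:
        emission[tag][word] += 1

def prob(counter, key):
    total = sum(counter.values())
    return counter[key] / total if total else 0.0

print(prob(emission["NOUN"], "jane"))     # P(jane | NOUN)  = 2/4 = 0.5
print(prob(transition["NOUN"], "MODAL"))  # P(MODAL | NOUN) = 1/2 = 0.5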
Morphology
• Morphology is the study of how words are formed.

Different types of words:

- Words with an exact meaning of their own: pen, board, phone
- Combinations of different meaningful parts: showcase (show + case),
  useless (use + less)
- Parts with no meaning on their own: ing, s, es
Morphology parsing
• Collect morphemes (the smallest meaningful units, which cannot be
divided further) from a word.

• Morpheme
– Stem word
– Affix (suffix: loved, prefix: reform, infix: passersby)
Morphology parser
• Lexicon – stores information such as which words are stems and which
affixes they can combine with.

• Morphotactics – rules about which morpheme may appear before, after,
or between other morphemes.
– It is a set of rules.
– Ex: three morphemes: use, able, ness
– From these, the meaningful word is: usableness
Morphology parser
• Orthographic rules – spelling rules used to change the form of words
when morphemes combine (a small sketch follows below).

• Ex: lady + s = ladys (not a proper word)

• lady + s = ladies (proper word)
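
As an illustration, here is a minimal Python sketch (not from the slides) of
one such spelling rule: a noun ending in a consonant + "y" takes "-ies" in
the plural, as in lady -> ladies.

import re

def pluralize(noun):
    if re.search(r"[^aeiou]y$", noun):   # consonant + y -> drop "y", add "ies"
        return noun[:-1] + "ies"
    return noun + "s"                    # default rule: just add "s"

print(pluralize("lady"))   # ladies
print(pluralize("boy"))    # boys
print(pluralize("cat"))    # cats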
Types of Morpheme
• 1) Free morpheme: a word that can stand on its own.

• There are two types of free morpheme:

A) Lexical: content words that carry meaning on their own, e.g. pen,
book, yellow, eyes

B) Grammatical: function words such as AND, OR, NOT
Types of Morpheme
• 2) Bound morpheme – must be combined with a free morpheme to make a
meaningful word.
– Ex: love + ing = loving

• There are two types of bound morpheme:

• A) Inflectional – an affix is added to a free morpheme and the POS
tag does not change.
– Ex: cat (Noun) + s = cats (Noun)
Types of Morpheme
B) Derivational
Class-changing – the POS tag changes
Ex: danger (Noun) + ous = dangerous (Adjective)
Class-maintaining – the word changes but the POS tag does not
Ex: law (Noun) + yers = lawyers (Noun)
