NLP CH 2
• Advanced smoothing techniques:
• Good-Turing
• Kneser-Ney
Add-1 / Laplace Smoothing
• Add-1 smoothing: add 1 to every count (seen or unseen) before normalizing, so no N-gram gets zero probability
• Equation:
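• Unigram: $P_{\text{Add-1}}(w_i) = \frac{c_i + 1}{N + V}$, where $N$ is the number of tokens and $V$ is the vocabulary size
• Bigram:
• $P_{\text{Add-1}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1}w_n) + 1}{C(w_{n-1}) + V}$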
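A minimal Python sketch of the two Add-1 formulas above, assuming whitespace tokenization and approximating $C(w_{n-1})$ by the unigram count of $w_{n-1}$. The function names are illustrative only, not from any library.

```python
from collections import Counter

def add_one_unigram(tokens):
    """Return P(w) = (c(w) + 1) / (N + V) for every word type in `tokens`."""
    counts = Counter(tokens)
    n = len(tokens)   # N: total number of tokens
    v = len(counts)   # V: vocabulary size (number of word types)
    return {w: (c + 1) / (n + v) for w, c in counts.items()}

def add_one_bigram(tokens, prev, word):
    """Return P(word | prev) = (C(prev word) + 1) / (C(prev) + V)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    v = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)

tokens = "I live in India".split()
print(add_one_bigram(tokens, "I", "live"))   # seen bigram: (1 + 1) / (1 + 4) = 0.4
print(add_one_bigram(tokens, "I", "India"))  # unseen bigram: (0 + 1) / (1 + 4) = 0.2
```

The unseen bigram "I India" now gets a small nonzero probability instead of 0.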
Add-K Smoothing
• Add-k smoothing: add a fractional count k (e.g., k = 0.5) instead of 1
• Equation:
• Unigram: $P_{\text{Add-k}}(w_i) = \frac{c_i + k}{N + kV}$
• Bigram:
• $P_{\text{Add-k}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1}w_n) + k}{C(w_{n-1}) + kV}$
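The same sketch generalized to a tunable k (default 0.5, as above); setting k = 1 recovers Add-1. As before, the function names are hypothetical and $C(w_{n-1})$ is approximated by the unigram count.

```python
from collections import Counter

def add_k_unigram(tokens, k=0.5):
    """Return P(w) = (c(w) + k) / (N + kV) for every word type in `tokens`."""
    counts = Counter(tokens)
    n, v = len(tokens), len(counts)  # N tokens, V word types
    return {w: (c + k) / (n + k * v) for w, c in counts.items()}

def add_k_bigram(tokens, prev, word, k=0.5):
    """Return P(word | prev) = (C(prev word) + k) / (C(prev) + kV)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    v = len(unigrams)
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * v)

tokens = "I live in India".split()
print(add_k_bigram(tokens, "I", "live"))       # (1 + 0.5) / (1 + 0.5 * 4) = 0.5
print(add_k_bigram(tokens, "I", "India"))      # (0 + 0.5) / (1 + 2), about 0.1667
print(add_k_bigram(tokens, "I", "live", k=1))  # k = 1 gives the Add-1 value, 0.4
```

A smaller k moves less probability mass from seen to unseen N-grams than Add-1 does.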
Example
• I live in India
• I live in
• I live
• in India
• India I live
• I in
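• Linear interpolation (the weights satisfy $\lambda_1 + \lambda_2 + \lambda_3 = 1$):
• $\hat{P}(w_n \mid w_{n-2}w_{n-1}) = \lambda_1 P(w_n \mid w_{n-2}w_{n-1}) + \lambda_2 P(w_n \mid w_{n-1}) + \lambda_3 P(w_n)$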
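A sketch of linear interpolation on the example sentences above, assuming each line is a separate sentence (no N-grams cross sentence boundaries and no <s> padding) and MLE estimates such as $P(w_n \mid w_{n-1}) = C(w_{n-1}w_n) / C(w_{n-1})$. The weights (0.5, 0.3, 0.2) are hypothetical; in practice they are tuned on held-out data.

```python
from collections import Counter

# The example sentences, one token list per sentence.
corpus = [s.split() for s in [
    "I live in India", "I live in", "I live",
    "in India", "India I live", "I in",
]]

uni, bi, tri = Counter(), Counter(), Counter()
for sent in corpus:
    uni.update(sent)
    bi.update(zip(sent, sent[1:]))
    tri.update(zip(sent, sent[1:], sent[2:]))
total = sum(uni.values())  # N = 16 tokens

def ratio(num, den):
    """MLE ratio that returns 0 when the context was never seen."""
    return num / den if den else 0.0

def interpolated(w2, w1, w, lambdas=(0.5, 0.3, 0.2)):
    """P-hat(w | w2 w1) = l1*P(w | w2 w1) + l2*P(w | w1) + l3*P(w)."""
    l1, l2, l3 = lambdas  # hypothetical weights summing to 1
    p_tri = ratio(tri[(w2, w1, w)], bi[(w2, w1)])
    p_bi = ratio(bi[(w1, w)], uni[w1])
    p_uni = uni[w] / total
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

print(round(interpolated("I", "live", "in"), 4))     # 0.5*2/4 + 0.3*2/4 + 0.2*4/16 = 0.45
print(round(interpolated("I", "live", "India"), 4))  # only the unigram term survives: 0.2*3/16 = 0.0375
```

The trigram "I live India" never occurs, yet interpolation still assigns it probability through the unigram term.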
Kneser-Ney Smoothing
POS Tagging
• Part-of-speech (POS) tagging: the process of assigning a part of speech (noun, verb, modal, etc.) to each word in a sentence.
• Transition probabilities: $P(t_i \mid t_{i-1})$, the probability of a tag given the previous tag
• E.g., how likely a noun is to be followed by a modal, which is in turn followed by a verb
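A sketch of estimating transition probabilities by counting tag bigrams, assuming a small hypothetical set of training tag sequences (N = noun, M = modal, V = verb) with <s> and </s> boundary tags added. It then scores the noun → modal → verb chain from the example.

```python
from collections import Counter

# Hypothetical toy training data: tag sequences only.
tag_sequences = [
    ["N", "N", "M", "V", "N"],
    ["N", "M", "V", "N"],
    ["M", "N", "V", "N"],
    ["N", "M", "V", "N"],
]

pair_counts, prev_counts = Counter(), Counter()
for tags in tag_sequences:
    padded = ["<s>"] + tags + ["</s>"]  # add sentence-boundary tags
    for prev, nxt in zip(padded, padded[1:]):
        pair_counts[(prev, nxt)] += 1
        prev_counts[prev] += 1

def transition(prev, nxt):
    """MLE transition probability P(nxt | prev) = C(prev, nxt) / C(prev)."""
    return pair_counts[(prev, nxt)] / prev_counts[prev] if prev_counts[prev] else 0.0

p = transition("N", "M") * transition("M", "V")
print(round(transition("N", "M"), 4))  # 3/9 -> 0.3333
print(round(transition("M", "V"), 4))  # 3/4 -> 0.75
print(round(p, 4))                     # 0.25
```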
Morphology
• The study of how words are formed from smaller meaningful units (morphemes).
• Morpheme: the smallest meaningful unit of a word
– Stem: the main morpheme, carrying the word's core meaning
– Affix (suffix: loved; prefix: reform; infix: passersby)
Morphological Parser
• Lexicon: stores information about words, such as which forms are stems and which affixes they can combine with
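A toy sketch of how a lexicon supports parsing: a word is split into prefix, stem, and suffix only when each piece is listed in the lexicon. The lexicon contents and the single e-restoration rule (lov + ed is read as love + ed) are hypothetical simplifications; real parsers encode spelling rules more systematically, e.g., with finite-state transducers.

```python
# Hypothetical toy lexicon: which strings are stems and which are affixes.
STEMS = {"love", "form", "walk", "play"}
PREFIXES = ("re", "un")
SUFFIXES = ("ed", "ing", "s")

def parse(word):
    """Return every (prefix, stem, suffix) split whose parts are all in the lexicon."""
    analyses = []
    for prefix in ("",) + tuple(p for p in PREFIXES if word.startswith(p)):
        rest = word[len(prefix):]
        for suffix in ("",) + tuple(s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)):
            stem = rest[:len(rest) - len(suffix)]
            # Assumed spelling rule: a stem-final "e" may be dropped before a suffix (love + ed -> loved).
            candidates = (stem, stem + "e") if suffix else (stem,)
            for candidate in candidates:
                if candidate in STEMS:
                    analyses.append((prefix, candidate, suffix))
    return analyses

print(parse("loved"))     # [('', 'love', 'ed')]
print(parse("reform"))    # [('re', 'form', '')]
print(parse("replayed"))  # [('re', 'play', 'ed')]
```

Returning a list rather than a single split leaves room for ambiguous words that have more than one valid analysis.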