Week 3: Language Model
❏ Introduction to n-gram
❏ Estimating N-gram Probabilities
❏ N-gram model evaluation
❏ Smoothing techniques
INTRODUCTION TO N-GRAM
Introduction to n-gram
❏ Probabilistic Language Model: assign a probability to a
sentence
❏ Machine Translation:
❏ P(ngôn ngữ tự nhiên) > P(ngôn ngữ nhiên tự) (“ngôn ngữ tự nhiên” = “natural language”; correct vs. scrambled word order)
❏ Spelling error detection and correction:
❏ P(ngôn ngữ tự nhiên) > P(gôn ngữ tu nhiên) (correct vs. misspelled)
❏ Text summarization
❏ Question-Answering
❏ ...
Introduction to n-gram (cont)
❏ Probability of a sentence (or a sequence of words):
❏ P(W) = P(w1, w2, w3, ..., wn)
❏ Chain rule: P(w1, ..., wn) = P(w1) x P(w2|w1) x P(w3|w1 w2) x ... x P(wn|w1 ... wn-1)
❏ Example:
❏ P(ngôn ngữ tự nhiên) = P(ngôn) x P(ngữ|ngôn) x P(tự|ngôn ngữ)
x P(nhiên|ngôn ngữ tự)
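In code, the chain rule is just a product of conditional probabilities, one factor per word. A minimal Python sketch, assuming hypothetical probability values (the numbers below are invented purely for illustration):

# Chain rule: P(W) = P(w1) x P(w2|w1) x P(w3|w1 w2) x P(w4|w1 w2 w3)
# Hypothetical conditional probabilities for "ngôn ngữ tự nhiên"
factors = [
    0.010,  # P(ngôn)
    0.500,  # P(ngữ | ngôn)
    0.300,  # P(tự | ngôn ngữ)
    0.800,  # P(nhiên | ngôn ngữ tự)
]

p_sentence = 1.0
for p in factors:
    p_sentence *= p

print(p_sentence)  # ≈ 0.0012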
Introduction to n-gram (cont)
❏ Estimating the probability:
❏ P(ngôn) = count(ngôn)/N (N: total number of words in the corpus)
❏ P(ngữ|ngôn) = count(ngôn ngữ)/count(ngôn)
❏ P(tự|ngôn ngữ) = count(ngôn ngữ tự)/count(ngôn ngữ)
❏ P(nhiên|ngôn ngữ tự) = count(ngôn ngữ tự nhiên)/count(ngôn ngữ tự)
❏ Comment:
❏ There are far too many possible word histories
❏ There is never enough data to estimate all of these long conditional probabilities reliably
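The estimates above come straight from counts over a corpus. A minimal sketch of this count-based estimation in Python, using a tiny made-up Vietnamese corpus (a real model would of course need far more data, which is exactly the problem noted above):

from collections import Counter

# Toy corpus; real models are estimated from millions of sentences
corpus = [
    "ngôn ngữ tự nhiên",
    "xử lý ngôn ngữ tự nhiên",
    "ngôn ngữ lập trình",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))

N = sum(unigram_counts.values())  # total number of tokens

def p_unigram(w):
    # P(w) = count(w) / N
    return unigram_counts[w] / N

def p_bigram(w, prev):
    # P(w | prev) = count(prev w) / count(prev)
    return bigram_counts[(prev, w)] / unigram_counts[prev]

print(p_unigram("ngôn"))        # count(ngôn) / N
print(p_bigram("ngữ", "ngôn"))  # count(ngôn ngữ) / count(ngôn)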
Introduction to n-gram (cont)
❏ Markov Assumption:
❏ P(nhiên|ngôn ngữ tự) ≈ P(nhiên|tự) (condition on one previous word)
or
❏ P(nhiên|ngôn ngữ tự) ≈ P(nhiên|ngữ tự) (condition on two previous words)
Introduction to n-gram (cont)
❏ Markov Assumption (general form): each word depends only on the previous k words:
❏ P(wi|w1 w2 ... wi-1) ≈ P(wi|wi-k ... wi-1)
Introduction to n-gram (cont)
❏ Unigram model (1-gram): the simplest case, every word is independent of its context:
❏ P(w1 w2 ... wn) ≈ P(w1) x P(w2) x ... x P(wn)
❏ Result: text generated from a unigram model has no word-order structure, since each word is drawn independently (see the sketch below)
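A rough Python sketch of a unigram model, assuming a tiny toy corpus as above; sampling each word independently shows why unigram "sentences" come out as an unordered jumble of frequent words:

import random
from collections import Counter

tokens = "ngôn ngữ tự nhiên xử lý ngôn ngữ tự nhiên".split()
counts = Counter(tokens)
total = sum(counts.values())

# Unigram model: P(w1 w2 ... wn) ≈ P(w1) x P(w2) x ... x P(wn)
def unigram_prob(words):
    p = 1.0
    for w in words:
        p *= counts[w] / total
    return p

print(unigram_prob("ngôn ngữ".split()))

# Generation: every word is drawn independently of its neighbours,
# so the output has no word-order structure at all
vocab, weights = zip(*counts.items())
print(" ".join(random.choices(vocab, weights=weights, k=6)))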
ESTIMATING N-GRAM PROBABILITIES
Estimating n-gram probabilities (cont)
❏ Example: bigram probabilities estimated from a corpus of restaurant queries
Estimating n-gram probabilities (cont)
❏ What these probabilities tell us:
❏ P(english|want) = .0011
❏ P(chinese|want) = .0065
❏ P(to|want) = .66
❏ P(eat|to) = .28
❏ P(food|to) = 0
❏ P(want|spend) = 0
❏ P(i|<s>) = .25
Estimating n-gram probabilities (cont)
❏ Problem with multiplying many small probabilities:
❏ Underflow (the product quickly becomes too small to represent in floating point)
❏ Slow
❏ Transform multiplication into addition by working in log space (sketch below):
❏ log(p1 x p2 x p3 x p4) = log p1 + log p2 + log p3 + log p4
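A minimal Python sketch of the log-space trick; the probability values are hypothetical (they echo the example probabilities above):

import math

# Hypothetical per-word probabilities of a sentence under some n-gram model
probs = [0.25, 0.66, 0.28, 0.0011]

# Naive multiplication: for long sentences this underflows towards 0.0
direct = 1.0
for p in probs:
    direct *= p

# Log space: log(p1 x p2 x ... x pn) = log p1 + log p2 + ... + log pn
log_prob = sum(math.log(p) for p in probs)

print(direct)              # tiny number; underflows for long inputs
print(math.exp(log_prob))  # same value, but the sum itself never underflows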
Estimating n-gram probabilities (cont)
❏ Language Modeling Toolkits:
❏ SRILM
❏ IRSTLM
❏ KenLM
❏ ...
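As one illustration, KenLM ships Python bindings for querying a trained model. A sketch assuming the kenlm package is installed and a model has already been estimated with KenLM's tools (the model.arpa path below is just a placeholder):

import kenlm

# Load a model previously trained with KenLM's lmplz tool (placeholder path)
model = kenlm.Model("model.arpa")

# Total log10 probability of the sentence, with begin/end-of-sentence markers
print(model.score("ngôn ngữ tự nhiên", bos=True, eos=True))

# Perplexity of the same sentence under the model
print(model.perplexity("ngôn ngữ tự nhiên"))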
MODEL EVALUATION
Model Evaluation
❏ Language Model (comparing “good” and “not good” sentences):
❏ Assign higher probability to “real” or “frequently seen” sentences than to “ungrammatical” or “rarely seen” sentences
❏ The model’s parameters are trained on a training set
❏ The model’s performance is tested on unseen data
❏ A test set is an unseen dataset, separate from the training set
❏ An evaluation metric shows how well our model does on the test set
Model Evaluation (cont)
❏ Extrinsic Evaluation: to compare models A and B
❏ Give each model a task:
❏ Spelling correction, Machine Translation…
❏ Run the task and get an accuracy for A and B
❏ How many misspelled words are corrected properly
❏ How many words are translated correctly
❏ Compare accuracy for A and B
Model Evaluation (cont)
❏ Extrinsic Evaluation:
❏ Time consuming (days or even weeks to complete…)
❏ Therefore, an intrinsic evaluation metric is sometimes used instead: perplexity
❏ Perplexity is a bad approximation:
❏ Unless the test data looks like the training data
❏ So it is generally only useful in pilot experiments
Model Evaluation (cont)
❏ Perplexity:
❏ How well can we predict the next word: