Arabian Egl

Example 2.9

Training set:
The Arabian Knights
These are the fairy tales of the east
The stories of the Arabian knights are translated in many languages

Bi-gram model:
P(the/<s>) = 0.67
P(Arabian/the) = 0.4
P(knights/Arabian) = 1.0
P(are/these) = 1.0
P(the/are) = 0.5
P(fairy/the) = 0.2
P(tales/fairy) = 1.0
P(of/tales) = 1.0
P(the/of) = 1.0
P(east/the) = 0.2
P(stories/the) = 0.2
P(of/stories) = 1.0
P(are/knights) = 1.0
P(translated/are) = 0.5
P(in/translated) = 1.0
P(many/in) = 1.0
P(languages/many) = 1.0

Test sentence(s): The Arabian knights are the fairy tales of the east.

P(The/<s>) × P(Arabian/the) × P(knights/Arabian) × P(are/knights) × P(the/are) × P(fairy/the) × P(tales/fairy) × P(of/tales) × P(the/of) × P(east/the)
= 0.67 × 0.4 × 1.0 × 1.0 × 0.5 × 0.2 × 1.0 × 1.0 × 1.0 × 0.2
≈ 0.0054

As each probability is necessarily less than 1, multiplying the probabilities might cause a numerical underflow, particularly in long sentences. To avoid this, calculations are made in log space, where a calculation corresponds to adding the logs of the individual probabilities and taking the antilog of the sum.

The n-gram model suffers from the data sparseness problem. An n-gram that does not occur in the training data is assigned zero probability, so that even a large corpus has several zero entries in its bi-gram matrix. This is because of the assumption that the probability of occurrence of a word depends only on the preceding word (or preceding words), which is not true in general. There are several long distance dependencies in natural language sentences, which this model fails to capture. […] (2003) pointed out that 'there is rarely enough data to accurately estimate the parameters of a language model.' A number of smoothing techniques have been developed to handle the data sparseness problem, the simplest of these being add-one smoothing.
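The log-space calculation described above can be sketched in Python as follows. This is a minimal illustration, not the book's own code: the table BIGRAM_P and the function sentence_log_prob are hypothetical names, the probabilities are copied from the bi-gram model listed in Example 2.9 (including P(Arabian/the) = 0.4) rather than re-estimated from counts, and "<s>" is assumed as the sentence-start marker.

```python
import math

# Bi-gram probabilities copied from the model listed in Example 2.9.
# Keys are (previous word, current word); "<s>" is an assumed start marker.
BIGRAM_P = {
    ("<s>", "the"): 0.67,
    ("the", "arabian"): 0.4,
    ("arabian", "knights"): 1.0,
    ("knights", "are"): 1.0,
    ("are", "the"): 0.5,
    ("the", "fairy"): 0.2,
    ("fairy", "tales"): 1.0,
    ("tales", "of"): 1.0,
    ("of", "the"): 1.0,
    ("the", "east"): 0.2,
}


def sentence_log_prob(sentence, table):
    """Sum the logs of the bi-gram probabilities instead of multiplying them."""
    words = ["<s>"] + sentence.lower().split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        p = table.get((prev, cur), 0.0)
        if p == 0.0:
            # An unseen bi-gram has zero probability, i.e. log probability -inf.
            return float("-inf")
        total += math.log(p)
    return total


log_p = sentence_log_prob(
    "The Arabian knights are the fairy tales of the east", BIGRAM_P
)
prob = math.exp(log_p)  # antilog of the sum recovers the sentence probability
print(f"log P = {log_p:.4f}, P = {prob:.5f}")

# A bi-gram missing from the table shows the data sparseness problem.
unseen = sentence_log_prob("The tales", BIGRAM_P)
```

Working with the sum of logs keeps the running value in a comfortable floating-point range even for long sentences, and the antilog is only needed if the probability itself has to be reported.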
In the words of Jurafsky and Martin (2000), smoothing in general refers to the task of re-evaluating zero-probability or low-probability n-grams and assigning them non-zero values. The word 'smoothing' is used to denote these techniques because they tend to make distributions more uniform by moving the extreme probabilities towards the average.

2.3.2 Add-one Smoothing
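Add-one smoothing, named above as the simplest technique, can be sketched as follows. The sketch assumes the standard Laplace form, in which 1 is added to every bi-gram count and the vocabulary size V is added to the count of the preceding word; the excerpt does not give the formula, so this is an assumption. The function names train_bigram_counts and add_one_prob are hypothetical, and V is taken here as the number of distinct words in the training set, excluding the assumed start marker "<s>".

```python
from collections import Counter

TRAINING_SET = [
    "The Arabian Knights",
    "These are the fairy tales of the east",
    "The stories of the Arabian knights are translated in many languages",
]


def train_bigram_counts(sentences):
    """Count single words and adjacent word pairs, with "<s>" at each sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def add_one_prob(prev, cur, unigrams, bigrams, vocab_size):
    """Assumed Laplace estimate: (C(prev cur) + 1) / (C(prev) + V)."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)


unigrams, bigrams = train_bigram_counts(TRAINING_SET)
vocab_size = len(unigrams) - 1  # distinct words, not counting "<s>"

seen = add_one_prob("the", "arabian", unigrams, bigrams, vocab_size)
unseen = add_one_prob("the", "tales", unigrams, bigrams, vocab_size)
print(f"P(arabian/the) = {seen:.4f}, P(tales/the) = {unseen:.4f}")
```

With this estimate the unseen bi-gram "the tales" receives a small non-zero probability, at the cost of lowering the probability of the bi-grams that were actually observed.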
