Mid-Term Project Report On Spell Checker
• Given a word, we are trying to choose the most likely spelling correction for
that word (the "correction" may be the original word itself).
We will say that we are trying to find the correction c, out of all possible
corrections, that maximizes the probability of c given the original word w:
argmaxc P(c|w)
By Bayes' theorem, this is equivalent to:
argmaxc P(w|c) P(c) / P(w)
Since P(w) is the same for every possible c, we can ignore it, giving:
argmaxc P(w|c) P(c)
Parts of the Spell Checker
There are three parts of this expression. From right to left, we have:
1) P(c), the probability that a proposed correction c stands on its own. This is
called the language model: think of it as answering the question "how
likely is c to appear in an English text?" So P("the") would have a relatively
high probability, while P("zxzxzxzyyy") would be near zero.
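2) P(w|c), the probability that w would be typed when the author meant c.
This is called the error model: think of it as answering the question "how
likely is it that the author would type w by mistake when c was intended?"
3) argmaxc, the control mechanism, which says to enumerate all feasible values
of c, and then choose the one that gives the best combined probability.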
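The control mechanism can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name best_correction and the probability values are invented for the example, and the typed word is assumed to be "thew".

```python
def best_correction(candidates, p_c, p_w_given_c):
    """Return the candidate c that maximizes P(w|c) * P(c)."""
    return max(candidates, key=lambda c: p_w_given_c[c] * p_c[c])

# Toy numbers, invented purely for illustration (typed word w = "thew").
p_c = {"the": 0.05, "thaw": 0.0001}          # language model P(c)
p_w_given_c = {"the": 0.001, "thaw": 0.01}   # error model P(w|c)

print(best_correction(["the", "thaw"], p_c, p_w_given_c))
```

Although "thaw" has the higher error-model score here, "the" wins because it is so much more common, which is why the two probabilities are multiplied before choosing.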
Language Model
• We will read a file named dictionary.txt, which consists of about
a million words.
• We then extract the individual words from the file (using a function
which converts everything to lowercase, so that "the" and "The" will
be the same). Next we train a probability model, which is a fancy
way of saying we count how many times each word occurs, using
another function.
• Now, we define a list which holds a count of how many times each
word has been seen.
• Now we will find all the possible corrections c of a given word w by
applying a single edit to w.
• An edit can be a deletion (remove one letter), a transposition (swap
adjacent letters), an alteration (change one letter to another) or an
insertion (add a letter).
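The word-counting step of the language model described above can be sketched as follows. The report does not name its functions, so words, train and probability are hypothetical names, and a Counter (a dictionary of counts) is assumed as the storage. A short sample string stands in for the contents of dictionary.txt so that the sketch runs on its own.

```python
import re
from collections import Counter

def words(text):
    """Lowercase the text and return every run of letters as a word."""
    return re.findall(r"[a-z]+", text.lower())

def train(word_list):
    """Count how many times each word occurs."""
    return Counter(word_list)

# Stand-in for the file contents; with the real file one would use
# something like open("dictionary.txt").read() instead.
sample = "The cat saw the other cat."
counts = train(words(sample))
total = sum(counts.values())

def probability(word):
    """Estimate P(c) as the relative frequency of the word."""
    return counts[word] / total

print(counts["the"], probability("the"))
```

A Counter returns 0 for unseen words, so probability gives 0 for anything that never appeared in the training text.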
Let us see how to do it.
First, we split the word and extract all of its letters.
We then apply the edits:
1) Delete: Remove one character at a time from the word to make
all the possible words.
2) Transpose: Swap each pair of adjacent letters to make
all the possible words.
3) Replace: Replace each character with every other letter of the
alphabet to make all the possible words.
4) Insert: At each position in the word, insert each letter of the
alphabet in turn to make all the possible words.
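The four edits above can be sketched in Python as shown below. The function name edits1 and the use of the 26 lowercase ASCII letters as the alphabet are assumptions made for this illustration.

```python
import string

def edits1(word):
    """Return the set of strings that are one edit away from word."""
    letters = string.ascii_lowercase
    # Every way of cutting the word into a left part and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Delete: drop the first letter of the right part.
    deletes = [a + b[1:] for a, b in splits if b]
    # Transpose: swap the first two letters of the right part.
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    # Replace: change the first letter of the right part to a different letter.
    replaces = [a + c + b[1:] for a, b in splits if b
                for c in letters if c != b[0]]
    # Insert: put a letter between the left and right parts.
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

print(sorted(edits1("cat"))[:5])
```

Returning a set removes duplicates, since different edits can produce the same string (for example, inserting "t" before or after the final "t" of "cat").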
Error Model
• If the entered word is found in the dictionary, then the
word itself is taken as the correction and we need not perform
any further operation.
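A minimal sketch of this check is given below. The name correct is hypothetical, and the behaviour for words that are not in the dictionary (picking the most frequent known candidate, or returning the word unchanged if none is known) is an assumption added only to make the example complete.

```python
def correct(word, counts, candidates):
    """Return word itself if it is known, else the most frequent known candidate."""
    if word in counts:
        return word  # found in the dictionary: nothing more to do
    known = [c for c in candidates if c in counts]
    if not known:
        return word  # assumption: leave unrecognized words unchanged
    return max(known, key=lambda c: counts[c])

# Toy counts, invented for illustration.
counts = {"spelling": 12, "spewing": 1}

print(correct("spelling", counts, []))
print(correct("speling", counts, ["spelling", "spewing", "sperling"]))
```

In practice the candidates argument would be the set of one-edit variants of the word, and counts would come from the trained language model.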