
Mid-Term Project Report On Spell Checker

This document provides a mid-term project report on a spell checker. It defines the problem of spelling mistakes in writing and the need for spell checking software. It describes the main components of a spell checker, including the Peter Norvig algorithm which uses Bayes' theorem to find the most likely correction. It outlines the language model, error model, and control mechanism used. Diagrams of the data flow and use cases are included. The modules implemented so far are the web browser, spell check program, and dictionary.

Uploaded by hansrajpatidar
Copyright
© Attribution Non-Commercial (BY-NC)

Mid-Term Project Report On Spell Checker

Nikhil Tulsyan 070911202
Hansraj Patidar 070911170
Jeevan Kumar 070911146
Table Of Contents
1. Problem Definition
2. Design Documents
3. Modules Implemented So Far
4. Total Modules in the Project
5. References
Problem Definition
English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many other languages, however, it is common to combine words in new ways. People often make grammatical and spelling mistakes in writing, so we need software that detects these mistakes and suggests corrections to the user.
SPELL CHECKER
• A spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone programs capable of operating on a block of text, or part of a larger application, such as a word processor, email client, electronic dictionary, or search engine.

Incorrect Word → Spell Checker → Corrected Word

For example, "anique" is corrected to "unique".


How It Works
• Peter Norvig's Algorithm:
• It is based on Bayes' theorem, which states that $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$.

• Given a word, we are trying to choose the most likely spelling correction for that word (the "correction" may be the original word itself). That is, we are trying to find the correction c, out of all possible corrections, that maximizes the probability of c given the original word w:
$\arg\max_c P(c \mid w)$

By Bayes' theorem this is equivalent to:
$\arg\max_c P(w \mid c)\,P(c) / P(w)$

Since $P(w)$ is the same for every possible c, we can ignore it, giving:
$\arg\max_c P(w \mid c)\,P(c)$
Parts of Spell Check
There are three parts of this expression. From right to left, we have:
1) $P(c)$, the probability that a proposed correction c stands on its own. This is
called the language model: think of it as answering the question "how
likely is c to appear in an English text?" So P("the") would have a relatively
high probability, while P("zxzxzxzyyy") would be near zero.

2) $P(w \mid c)$, the probability that w would be typed in a text when the author
meant c. This is the error model: think of it as answering "how likely is it that
the author would type w by mistake when c was intended?"

3) $\arg\max_c$, the control mechanism, which says to enumerate all feasible values
of c, and then choose the one that gives the best combined probability score.
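To make the three parts concrete, here is a toy sketch in Python (the language of the project's references). Everything in it is hypothetical: the candidate words for the typed word "anique" and all the values in `LANGUAGE_MODEL` and `ERROR_MODEL` are invented for illustration, not estimated from the project's data. It also shows that dividing every score by $P(w)$ leaves the winner unchanged.

```python
# Hypothetical language model P(c): invented values for illustration only.
LANGUAGE_MODEL = {"unique": 0.0004, "antique": 0.0001, "anique": 0.0}

# Hypothetical error model P(w | c) for the typed word w = "anique".
ERROR_MODEL = {"unique": 0.01, "antique": 0.02, "anique": 0.5}


def score(c: str) -> float:
    """Combined score P(w|c) * P(c) for candidate c."""
    return ERROR_MODEL[c] * LANGUAGE_MODEL[c]


def best_correction(candidates, p_w=None):
    """Control mechanism: enumerate the candidates and return the argmax.

    If p_w is given, every score is divided by P(w); because P(w) is the
    same for every candidate, the winner does not change.
    """
    if p_w is None:
        return max(candidates, key=score)
    return max(candidates, key=lambda c: score(c) / p_w)


candidates = ["unique", "antique", "anique"]
print(best_correction(candidates))            # unique
print(best_correction(candidates, p_w=1e-6))  # unique
```

Here "unique" wins because its combined score (0.01 × 0.0004) is larger than that of "antique" (0.02 × 0.0001), even though the hypothetical error model rates "antique" as the more likely source of the typo.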
Language Model
• We read a file named dictionary.txt, which contains about a million words.
• We then extract the individual words from the file, using a function that converts everything to lowercase so that "the" and "The" are treated as the same word. Next we train a probability model, which is a fancy way of saying that we use a function to count how many times each word occurs.
• The counts are stored in a table (in Python, a dictionary) that maps each word to the number of times it has been seen.
• Now we will find all the possible corrections c of a given word w.
• An edit can be a deletion (remove one letter), a transposition (swap
adjacent letters), an alteration (change one letter to another) or an
insertion (add a letter).
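The counting steps in the first bullets can be sketched as follows. This is a minimal illustration assuming Python's standard `re` module and `collections.Counter`; the helper names `words`, `train` and `probability` are hypothetical, and a short inline string stands in for dictionary.txt so the example runs on its own. $P(c)$ is estimated as the word's count divided by the total number of words.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Extract the words of a text, lowercased so "The" and "the" match."""
    return re.findall(r"[a-z]+", text.lower())


def train(text: str) -> Counter:
    """Count how many times each word occurs (the 'probability model')."""
    return Counter(words(text))


def probability(word: str, counts: Counter) -> float:
    """Estimate P(word) as its relative frequency in the training text."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# In the project the text would come from dictionary.txt, e.g.
#   text = open("dictionary.txt", encoding="utf-8").read()
# A short inline sample keeps this sketch self-contained.
sample = "The cat sat on the mat. The mat was unique."
counts = train(sample)
print(counts["the"])               # 3
print(probability("mat", counts))  # 0.2
```

`Counter` returns 0 for words it has never seen, so an unknown word such as "zxzxzxzyyy" simply gets probability 0.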
Let's see how to do it.
First, we split the word at every position into a left part and a right part, and take the 26 letters of the alphabet as the possible replacement and insertion characters.
We now generate the edits:
1) Delete: Remove each character of the word, one at a time, to make all the possible words.
2) Transpose: Swap each pair of adjacent letters to make all the possible words.
3) Replace: Replace each character with every other letter of the alphabet to make all the possible words.
4) Insert: At each position in the word, including before the first letter and after the last, insert each letter of the alphabet one by one to make all the possible words.
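The four edit operations can be sketched as a single function. This is a minimal sketch in the spirit of Norvig's essay; the name `edits1` and the use of `string.ascii_lowercase` as the alphabet are assumptions. Every split of the word into a left and a right part drives all four operations.

```python
import string

LETTERS = string.ascii_lowercase  # the 26 letters used for replace/insert


def edits1(word: str) -> set[str]:
    """Return all strings exactly one edit away from word."""
    # Every way of splitting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # 1) Delete: drop the first letter of the right part.
    deletes = [left + right[1:] for left, right in splits if right]
    # 2) Transpose: swap the first two letters of the right part.
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    # 3) Replace: substitute the first letter of the right part.
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    # 4) Insert: put a letter between the left and right parts.
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


print("unique" in edits1("anique"))  # True
```

For example, `edits1("anique")` contains "unique" (a replacement), "antique" (an insertion), "naique" (a transposition) and "nique" (a deletion). Because a letter may be "replaced" by itself, the set also contains the original word; this is harmless, since the set is only used to look up candidates.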
Error Model
• If the entered word is found in the dictionary, the word is already correct and no further operation is needed.

• If the word is not found in the dictionary, we compute the probabilities of all candidate corrections and return the word with the maximum probability, based on the table of word counts.
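Putting the pieces together, a minimal corrector might look like this. It is a sketch under stated assumptions: the function name `correct` and the sample training text are hypothetical, only candidates one edit away are considered (matching the edits described above), the word count stands in for $P(c)$, and the error model is reduced to "a dictionary word is kept as-is; otherwise the most frequent known word one edit away wins".

```python
import re
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most probable correction of word (hypothetical helper)."""
    word = word.lower()
    # Error model, case 1: a dictionary word is already correct.
    if word in counts:
        return word
    # Otherwise keep only candidates that are real dictionary words.
    candidates = sorted(w for w in edits1(word) if w in counts)
    if not candidates:
        return word  # nothing better found; leave the word unchanged
    # Control mechanism: pick the candidate with the highest count,
    # i.e. the highest P(c); sorting makes ties resolve alphabetically.
    return max(candidates, key=lambda c: counts[c])


# Hypothetical training text standing in for dictionary.txt.
text = "The unique design of the unique antique unique chair"
counts = Counter(re.findall(r"[a-z]+", text.lower()))
print(correct("anique", counts))  # unique
print(correct("chiar", counts))   # chair
```

Norvig's essay additionally falls back to words two edits away before giving up; that step is left out here to match the description in this report.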
DIAGRAMS
• Data Flow Diagram
• Use Case Diagram
[Figure: Data Flow Diagram for Spell Checker]
[Figure: Use Case Diagram]
Modules
Web browser
Spell check program
Dictionary of words
References
1) Wikipedia
2) Peter Norvig, "How to Write a Spelling Corrector", http://norvig.com/spell-correct.html
3) The Python Tutorial, http://docs.python.org/tutorial/
THANK YOU
