NLP Module 1
➢ Introduction
➢ Human languages
➢ Models
➢ Regular Expressions
➢ Patterns
➢ Finite State Automata
➢ Inflectional Morphology
➢ Derivational Morphology
➢ Finite-State Morphological Parsing
➢ Porter Stemmer
Natural Language Processing
• What is NLP?
• Natural language processing (NLP) is a machine learning technology that gives
computers the ability to interpret, manipulate, and comprehend human language.
• Examples include machine translation, summarization, ticket classification, and spell
check.
Introduction-Human Languages
The primary objectives of NLP are
• Enable human-machine communication.
• Improve human-human communication.
• Perform useful processing of text or speech.
• Some example tasks are
• Conversational Agents
• Definition: Programs that converse with humans via natural language.
• Famous Example: HAL 9000 from "2001: A Space Odyssey".
• Components: Language input (automatic speech recognition, natural language
understanding), language output (natural language generation, speech synthesis).
• Machine Translation
• Purpose: Automatically translate documents from one language to another.
• Challenges: Machine translation is not fully solved; it involves complex algorithms and
mathematical tools.
• Question Answering
• Definition: Goes beyond simple web search to answer complete questions.
• Examples of Questions: "What does 'divergent' mean?", "What year was Abraham Lincoln
born?"
• Capabilities: Answers range from simple definitions to complex inferences and synthesis of
information.
Introduction-Human Languages
Knowledge in Speech and Language Processing
Distinctive Feature: Language processing applications use knowledge of language, setting them
apart from other data processing systems.
Example: The Unix wc program counts bytes and lines without language knowledge, but counting
words requires understanding what constitutes a word.
Simple vs. Sophisticated: Basic systems like wc contrast with complex systems such as
conversational agents and machine translation, which require extensive language knowledge.
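The wc contrast above can be made concrete. A minimal sketch (the sample sentence and the word-regex are illustrative assumptions, not part of wc itself): counting bytes or whitespace chunks needs no language knowledge, but deciding that "doesn't" is one word while "--" is none requires a notion of what a word is.

```python
import re

text = "Dr. Smith doesn't live in New York -- he's moved."

# wc-style counts: no language knowledge needed
n_bytes = len(text.encode("utf-8"))
n_whitespace_tokens = len(text.split())  # counts "--" and "moved." as words

# Counting words needs a (here, very simple) definition of a word:
# alphabetic runs, optionally with one internal apostrophe ("doesn't", "he's"),
# punctuation and the "--" dash dropped.
words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
print(n_whitespace_tokens, len(words))
```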
Resolving Ambiguity
• Part-of-Speech Tagging: Deciding whether "duck" is a verb or a noun.
• Word Sense Disambiguation: Deciding whether "make" means "create" or "cook".
• Syntactic Disambiguation: Determining if "her" and "duck" are part of the same entity (1.5, 1.8)
or different entities (1.6). Method: Probabilistic Parsing.
• Text-to-Speech Synthesis Example: Pronouncing "lead" as in "lead pipe" vs. "lead me on".
• Other Types of Ambiguity: Speech Act Interpretation, e.g. determining whether a sentence is a
statement or a question.
Models
Models and Algorithms
• Various types of linguistic knowledge can be captured using a small number of formal models
and algorithms.
• Main Models are State machines, rule systems, logic, probabilistic models, vector-space models.
• Formal models and algorithms are essential in speech and language processing.
• These tools help capture and resolve linguistic ambiguities.
Models
Models and Algorithms
• Vector-Space Models
• Definition: Based on linear algebra, used for information retrieval and word meanings.
• Applications: Underlie many treatments of word meanings and information retrieval systems.
• Search Algorithms
• Description: State space search, dynamic programming, heuristic search (best-first, A* search).
• Applications: Speech recognition, parsing, machine translation.
• Machine Learning Tools
• Classifiers: Decision trees, support vector machines, Gaussian mixture models, logistic
regression.
• Sequence Models: Hidden Markov models, maximum entropy Markov models, conditional
random fields.
• Applications: Spelling correction, part-of-speech tagging, named entity recognition.
• Methodological Tools
• Training and Test Sets: Use of distinct sets for training and evaluation.
• Statistical Techniques: Cross-validation for model evaluation.
• Evaluation Metrics: Careful evaluation of system performance.
• Example Applications: Spelling correction, speech recognition, machine translation.
Regular Expressions
Introduction
• Regular Expression is a language for specifying text search strings.
• Widely used in UNIX, Microsoft Word, and web search engines.
• An important theoretical tool throughout computer science and linguistics.
• First developed by Kleene in 1956
• RE is a formula in a special language that is used for specifying simple classes of strings.
• A string is a sequence of symbols; for the purpose of most text-based search techniques,
a string is any sequence of alphanumeric characters (letters, numbers, spaces, tabs, and
punctuation).
• Facilitates text search and manipulation in numerous applications
• Regular Expressions as Algebraic Notation for characterizing a set of strings
• Used to specify search strings and define a language formally
• REs require a pattern to search for and a corpus of texts to search through.
• Standardized syntax across various platforms.
Regular Expressions
Basic Regular Expression Patterns
• A simple regular expression is a sequence of simple characters, e.g. /word/.
• REs are case sensitive:
• lowercase /s/ is distinct from uppercase /S/ (/s/ matches a lowercase s but not an uppercase S).
• We can solve this problem with the square brackets [ and ].
• The string of characters inside the brackets specifies a disjunction of characters to match.
• The regular expression /[1234567890]/ specifies any single digit.
• /[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/ any capital letter
Regular Expressions
Basic Regular Expression Patterns
• The brackets can be used with the dash (-) to specify any one character in a range.
• To specify what a single character cannot be, use the caret ^.
• If the caret ^ is the first symbol inside the brackets, the pattern is negated; otherwise the caret
is treated as a normal character.
Finite-State Automata
Comparing Depth-First and Breadth-First Search:
Depth-First Search (DFS):
Pros: Efficient use of memory; suitable for deep, narrow spaces.
Cons: Risk of infinite loops; may explore unfruitful paths deeply.
Breadth-First Search (BFS):
Pros: Guaranteed to find the shortest path; explores all paths uniformly.
Cons: High memory usage; may not be practical for large state-spaces.
Choosing a Strategy:
Small Problems: Either DFS or BFS may be adequate.
Large Problems: Consider advanced techniques like dynamic programming or A*.
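The two strategies can be sketched side by side; the only structural difference is the frontier (a FIFO queue for BFS, a LIFO stack for DFS). The tiny example graph is a hypothetical state-space for illustration.

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search: explores paths uniformly, finds a shortest path."""
    frontier = deque([[start]])        # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

def dfs(graph, start, goal):
    """Depth-first search: memory-light, but may return a non-shortest path."""
    stack = [[start]]                  # LIFO stack of paths
    visited = {start}
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                stack.append(path + [nxt])
    return None

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a", "d"))
print(dfs(graph, "a", "d"))
```

The visited set guards against the DFS infinite-loop risk mentioned above; without it, a cyclic state-space would loop forever.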
Finite-State Automata
Relating Deterministic and Non-Deterministic Automata:
• For any NFSA, there exists an equivalent DFSA.
• Conversion Algorithm which Converts an NFSA to a DFSA, potentially increasing the number
of states exponentially.
• NFSAs can follow multiple paths simultaneously.
• If states q_a and q_b are reached by the same input, they form a new state q_ab.
• The new DFSA can have up to 2^N states, where N is the number of states in the original NFSA.
• Conversion Process from NFSA to DFSA
• Initial State: Start with the initial state of the NFSA.
• State Grouping: For each input, group reachable states into a new state.
• State Transition: Define transitions for these new grouped states.
• Repeat: Continue for every possible input and state group until no new states are formed.
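The four conversion steps above can be sketched as the standard subset construction. This is a minimal version (no epsilon transitions, no accept-state bookkeeping); the sample NFSA is hypothetical.

```python
from collections import deque

def nfsa_to_dfsa(nfsa, start, alphabet):
    """Subset construction: each DFSA state is a frozenset of NFSA states."""
    start_set = frozenset([start])
    dfsa = {}                      # DFSA state -> {symbol: DFSA state}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfsa:        # already expanded
            continue
        dfsa[current] = {}
        for sym in alphabet:
            # group all NFSA states reachable on this input into one new state
            target = frozenset(q2 for q in current
                               for q2 in nfsa.get((q, sym), []))
            dfsa[current][sym] = target
            if target and target not in dfsa:
                queue.append(target)   # repeat until no new states form
    return dfsa

# hypothetical NFSA: on 'a', state q0 may stay in q0 OR move to q1
nfsa = {("q0", "a"): ["q0", "q1"],
        ("q0", "b"): ["q0"],
        ("q1", "b"): ["q2"]}
dfsa = nfsa_to_dfsa(nfsa, "q0", "ab")
print(len(dfsa))   # number of DFSA states actually constructed
```

Note that the worst-case 2^N blow-up rarely occurs in practice; here only 3 of the 2^3 possible subsets are ever reached.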
Morphology
• Regular Expressions are used for finding both "woodchuck" and "woodchucks" using a single
search string.
• The main challenges occur with plural forms like "fox" to "foxes", "peccary" to "peccaries",
"goose" to "geese", and "fish" remaining unchanged.
• It takes two kinds of knowledge to correctly search for singulars and plurals of these forms.
• Orthographic rules tell us that English words ending in -y are pluralized by changing the -y to -
i- and adding an -es.
• Morphological rules tell us that fish has a null plural, and that the plural of goose is formed by
changing the vowel.
• Morphological Parsing: Recognizing and breaking down words into morphemes.
• Example: Parsing "foxes" into "fox" and "-es", or "going" into VERB-go + GERUND-ing.
• Applications:
• Web Search: Handling morphologically complex languages like Russian.
• Part-of-Speech Tagging: Crucial for accurate tagging in languages with rich morphology.
• Spell-Checking: Essential for creating large dictionaries.
• Machine Translation: Translating inflected forms accurately.
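The orthographic and morphological rules above can be sketched as a toy pluralizer; the irregular and null-plural word lists are illustrative only, not exhaustive.

```python
# Morphological rules: irregulars and null plurals are lexically listed.
IRREGULAR = {"goose": "geese", "mouse": "mice", "ox": "oxen"}
NULL_PLURAL = {"fish", "sheep"}

def pluralize(noun):
    if noun in NULL_PLURAL:
        return noun                        # fish -> fish (null plural)
    if noun in IRREGULAR:
        return IRREGULAR[noun]             # goose -> geese (vowel change)
    # Orthographic rules:
    if noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"           # peccary -> peccaries (y -> i + es)
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                 # fox -> foxes, thrush -> thrushes
    return noun + "s"                      # cat -> cats (default -s)

for w in ["fox", "peccary", "goose", "fish", "cat"]:
    print(w, "->", pluralize(w))
```

A single RE search string cannot encode this mix of lexical exceptions and spelling changes, which is why morphological parsing needs more machinery (FSTs) than plain regular expressions.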
Morphology
• To solve the morphological parsing problem, why couldn’t we just store all the plural forms of
English nouns and -ing forms of English verbs in a dictionary and do parsing by lookup?
• Dictionary lookup alone is not enough; the key algorithms for morphological parsing are:
• Finite-State Transducers (FSTs) are used for morphological parsing throughout speech and
language processing.
• Stemming: Stripping off word endings.
• Example: "foxes" to "fox".
• Lemmatization: Mapping words to their root forms.
• Example: "sang", "sung", "sings" to "sing".
• Tokenization: Separating words from running text.
• Challenges: Handling multi-word expressions like "New York" and contractions like "I'm".
• Minimum Edit Distance: Measuring orthographic similarity between words.
• Applications: Important for spell-checking and comparing word forms
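Minimum edit distance can be computed with the standard dynamic-programming table; this sketch uses unit costs for insertion, deletion, and substitution (some textbook variants charge 2 for substitution).

```python
def min_edit_distance(source, target, ins=1, dele=1, sub=1):
    """DP table d[i][j] = cost of editing source[:i] into target[:j]."""
    n, m = len(source), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele       # delete everything
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins        # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # deletion
                          d[i][j - 1] + ins,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution or copy
    return d[n][m]

print(min_edit_distance("intention", "execution"))
```

For spell-checking, candidate corrections can be ranked by their distance to the misspelled form, e.g. "graffe" is one insertion away from "giraffe".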
Morphology
• Understanding Morphemes
• Morphology is the study of the way words are built up from smaller meaning-bearing
units, morphemes.
• A morpheme is often defined as the minimal meaning-bearing unit in a language
• Examples:
• Fox: Single morpheme.
• Cats: Two morphemes (cat + -s).
• Types of Morphemes
• Stems: Main morphemes supplying the primary meaning.
• Affixes: Add additional meanings; further divided into prefixes, suffixes, infixes, and
circumfixes.
• Examples:
• Prefix: un- in "unbuckle".
• Suffix: -s in "eats".
• Circumfix: ge-...-t in German "gesagt".
• Infix: the plural -s inside "editors-in-chief" (editor-in-chief → editors-in-chief).
Morphology
• Understanding Morphemes
• Multiple Affixes: Words with Multiple Affixes
• Examples:
• Rewrites: Prefix (re-) + Stem (write) + Suffix (-s).
• Unbelievably: Stem (believe) + Prefix (un-) + Suffixes (-able, -ly).
• Methods of Combining Morphemes
• Inflection: Combining stem with grammatical morphemes, resulting in the same word class
• Examples:
• Plural: cat + -s = cats.
• Past tense: walk + -ed = walked.
• Function: Often fills syntactic functions like agreement.
• Derivation: Combination of a word stem with a grammatical morpheme resulting in a different
word class
• Examples:
• Computerize + -ation = Computerization.
• Characteristics: Often has a meaning that is harder to predict exactly.
Morphology
• Methods of Combining Morphemes
• Compounding: Combination of multiple word stems.
• Examples:
• Doghouse: Dog + House.
• Bookshelf: Book + shelf
• Cliticization: Combination of a word stem with a clitic.
• Examples:
• I’ve: I + ’ve.
• L’opera: Le + Opera (French).
Inflectional Morphology
• Inflection: Modifying words to express different grammatical categories
• English has a relatively simple inflectional system.
• English nouns have only two kinds of inflection
• Plural Inflection:
• Regular plural suffix: -s or –es
• Examples:
• Regular Nouns: cat → cats, thrush → thrushes
• Irregular Nouns: mouse → mice, ox → oxen
• Possessive Inflection:
• Apostrophe + -s for singular and irregular plural nouns.
• Examples:
• Singular: llama → llama's
• Irregular plural: children → children's
• Lone apostrophe for regular plural nouns.
• Example: llamas → llamas’
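The possessive rules above fit in a few lines; the regular_plural flag is an assumption standing in for the lexical knowledge of whether a plural is regular.

```python
def possessive(noun, regular_plural=False):
    """Apostrophe + s for singulars and irregular plurals; lone apostrophe
    for regular plurals that already end in -s."""
    if regular_plural and noun.endswith("s"):
        return noun + "'"      # llamas -> llamas'
    return noun + "'s"         # llama -> llama's, children -> children's

print(possessive("llama"))
print(possessive("children"))
print(possessive("llamas", regular_plural=True))
```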
Inflectional Morphology
• English verbal inflection is more complicated than nominal inflection.
• First, English has three kinds of verbs.
• main verbs, (eat, sleep, impeach),
• modal verbs (can, will, should),
• primary verbs (be, have, do)
• we will mostly be concerned with the main and primary verbs, because it is these that have
inflectional endings.
• Of these verbs, a large class are regular; that is to say, all verbs of this class have the
same endings marking the same functions. These regular verbs (e.g., walk or inspect) have four
morphological forms, as follows:
Inflectional Morphology
• The irregular verbs are those that have some more or less idiosyncratic forms of inflection.
• Irregular verbs in English often have five different forms, but can have as many as eight (e.g., the
verb be) or as few as three (e.g. cut or hit).
• The table below shows some sample irregular forms
• Adjectives can also be derived from nouns and verbs. Here are examples of a few suffixes
deriving adjectives from nouns or verbs
Derivational Morphology
• Derivation in English is more complex than inflection for a number of reasons.
• Less productive compared to inflection.
• Not all suffixes can be added to every base word.
• Subtle and complex meaning differences among suffixes.
• Examples:
• -ation: can be added to many verbs but not all (e.g., *eatation, *spellation).
• Sincerity vs. sincereness: subtle meaning differences.
• Comparison with Inflection
• Inflection:
• Regular and predictable changes.
• Limited number of affixes.
• Derivation:
• Irregular and less predictable.
• Complex meaning changes.
Finite-State Morphological Parsing
• Our goal will be to take input forms like those in the first and third columns of the figure and
produce output forms like those in the second and fourth columns.
• The second column contains the stem of each word as well as assorted morphological features.
These features specify additional information about the stem. For example, the feature
+N means that the word is a noun; +Sg means it is singular, +Pl that it is plural.
Finite-State Morphological Parsing
• Note that some of the input forms (like caught, goose, canto, or vino) will be ambiguous
between different morphological parses.
• For now, we will consider the goal of morphological parsing merely to list all possible parses.
• In order to build a morphological parser, we’ll need at least the following
• lexicon: the list of stems and affixes, together with basic information about them (whether a
stem is a Noun stem or a Verb stem, etc.)
• morphotactics: the model of morpheme ordering that explains which classes of morphemes can
follow other classes of morphemes inside a word. For example, the fact that the English plural
morpheme follows the noun rather than preceding it is a morphotactic fact.
• orthographic rules: these spelling rules are used to model the changes that occur in a word,
usually when two morphemes combine (e.g., the y to ie spelling rule discussed above that
changes city + -s to cities rather than citys).
Porter Stemmer
• Rule: AL → "" (the suffix is deleted)
• Rule: ANCE → ""
• Rule: ENCE → ""
• And more...
• "run" does not match any rule here.
• Step 5a: Remove -e if the measure (m) > 1, or (m=1 and not *o):
• Rule: (m>1) E → ""
• Rule: (m=1 and not *o) E → ""
• "run" does not end in "e", so no change.
• Step 5b: Remove the terminal l if m > 1 and the stem ends in a double l (*d and *L):
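Step 5a above can be sketched as code. This is a simplified fragment of the Porter algorithm, not the full stemmer: the measure m counts vowel-consonant (VC) sequences in the stem (ignoring Porter's special treatment of y), and *o is the condition that the stem ends consonant-vowel-consonant with the final consonant not w, x, or y.

```python
import re

def measure(stem):
    """Porter's m: number of VC sequences (simplified: y treated as consonant)."""
    # collapse consonant runs to 'C' first, then vowel runs to 'V'
    forms = re.sub(r"[aeiou]+", "V", re.sub(r"[^aeiou]+", "C", stem))
    return forms.count("VC")

def step5a(word):
    """(m>1) E -> ''   and   (m=1 and not *o) E -> ''"""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        # *o: stem ends cvc, where the second c is not w, x, or y
        star_o = bool(re.search(r"[^aeiou][aeiou][^aeiouwxy]$", stem))
        if m > 1 or (m == 1 and not star_o):
            return stem
    return word

print(step5a("probate"))  # m > 1, so the -e is removed -> "probat"
print(step5a("rate"))     # m = 1 but *o holds, so "rate" is kept
print(step5a("run"))      # does not end in e, no change
```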