21CSE356T NLP Unit 2
Definition:
A context-free grammar (CFG) is a list of rules that define the set of all well-formed
sentences in a language. Each rule has a left-hand side, which identifies a syntactic
category, and a right-hand side, which defines its alternative component parts, reading
from left to right.
Definition:
A context-free grammar consists of a set of rules or productions, each of which expresses
the ways that symbols of the language can be grouped and ordered together, and a lexicon
of words and symbols.
Context Free Grammar (CFG) - Formal Definition
• A context-free grammar G is a 4-tuple:
G = (V, T, S, P)
• These parameters are as follows:
V – Set of variables (also called non-terminal symbols)
T – Set of terminal symbols (lexicon)
• The symbols that refer to words in a language are called terminal symbols.
• The lexicon is the set of rules that introduce these symbols.
S – Designated start symbol (one of the non-terminals, S ∈ V)
P – Set of productions (also called rules).
• Each rule in P is of the form A → s, where
• A is a non-terminal (variable) symbol; each rule has exactly one non-terminal on its left-hand side.
• s is a sequence of terminals and non-terminals, drawn from (T ∪ V)*, the infinite set of strings over the grammar's symbols.
A grammar G generates a language L.
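The formal definition above can be sketched as plain Python data. This is an illustrative layout, not a standard library API; the rules mirror a fragment of Example 1 below:

```python
# The 4-tuple G = (V, T, S, P) as a plain Python dict (illustrative sketch).
grammar = {
    "V": {"S", "NP", "VP", "Det", "N", "Verb"},   # variables (non-terminals)
    "T": {"the", "a", "cat", "dog", "sees"},      # terminals (the lexicon)
    "S": "S",                                     # designated start symbol
    "P": {                                        # productions A -> s
        "S":    [["NP", "VP"]],
        "NP":   [["Det", "N"]],
        "VP":   [["Verb", "NP"]],
        "Det":  [["the"], ["a"]],
        "N":    [["cat"], ["dog"]],
        "Verb": [["sees"]],
    },
}

# Check the well-formedness conditions from the definition: S is in V, every
# left-hand side is a single non-terminal, and every right-hand-side symbol
# is drawn from (T ∪ V).
assert grammar["S"] in grammar["V"]
assert all(a in grammar["V"] for a in grammar["P"])
assert all(x in grammar["V"] | grammar["T"]
           for rules in grammar["P"].values() for r in rules for x in r)
```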
Example 1
CFG
S → NP VP
NP → Det N | Det N PP | Pronoun | ProperNoun
VP → V NP | V NP PP
PP → P NP
Det → "the" | "a" | "an"
N → "cat" | "dog" | "book" | "city"
Pronoun → "he" | "she" | "it"
ProperNoun → "Alice" | "Bob"
V → "sees" | "likes" | "reads"
P → "in" | "on" | "with"
Example 1.a. Parse Tree:
Sentence: "The cat sees a dog."
• For the sentence "The cat sees a dog", the parse tree is derived as follows:
1. Start with the start symbol:
S
2. Expand using the rule S → NP VP:
NP VP
3. Expand NP using NP → Det N:
Det N VP
4. Expand Det to "the" and N to "cat":
"the" "cat" VP
5. Expand VP using VP → V NP:
"the" "cat" V NP
6. Expand V to "sees" and NP to Det N:
"the" "cat" "sees" Det N
7. Expand Det to "a" and N to "dog":
"the" "cat" "sees" "a" "dog"
The derivation is complete.
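The derivation steps above can be run mechanically: keep a sentential form, and at each step rewrite the leftmost occurrence of the chosen non-terminal. A minimal sketch (the list of steps is the same sequence of rule choices as steps 1-7):

```python
# Leftmost derivation of "the cat sees a dog" from the start symbol S.
# Each entry is (non-terminal to rewrite, right-hand side to substitute).
steps = [
    ("S",   ["NP", "VP"]),
    ("NP",  ["Det", "N"]),
    ("Det", ["the"]),
    ("N",   ["cat"]),
    ("VP",  ["V", "NP"]),
    ("V",   ["sees"]),
    ("NP",  ["Det", "N"]),
    ("Det", ["a"]),
    ("N",   ["dog"]),
]

form = ["S"]                     # the current sentential form
for lhs, rhs in steps:
    i = form.index(lhs)          # leftmost occurrence of the non-terminal
    form[i:i + 1] = rhs          # replace it with the rule's right-hand side

print(" ".join(form))            # the cat sees a dog
```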
• Example 1.b.: For the sentence "Alice reads a book in the city", the
grammar generates the following parse tree:
Example 2:
➢ Interjections (Interj):
•Interj → wow | oh | hey: Interjections express emotion or sentiment.
•Interj → wow | oh | hey
•Example: "Wow (Interj), what a beautiful sunset (NP)!"
Grammar Rules For English
➢ Determiners (Det):
•Det → a | an | the: Determiners are words that introduce nouns and help to specify
them.
•Det → a | an | the
•Example: "An (Det) apple (N) fell from the tree (PP)."
➢ Adjectives (Adj):
•Adj → tall | blue | beautiful: Adjectives modify nouns and provide additional
information about them.
•Adj → tall | blue | beautiful
• Example: "The (Det) tall (Adj) building (N) is located in the city (PP)."
Treebanks
Corpus
•A corpus is a large, structured set of machine-readable texts that have been produced in
a natural communicative setting. Its plural is corpora. Corpora can be derived in different ways: text that was
originally electronic, transcripts of spoken language, optical character recognition, etc.
Treebank
•A treebank is a corpus in which each sentence is annotated with a parse tree. It represents the syntactic and semantic
relations of the words in a sentence. Treebanks are created by human annotation.
TreeBank Corpus
✓ It may be defined as linguistically parsed text corpus that annotates syntactic or semantic sentence structure.
✓ Geoffrey Leech coined the term 'treebank', reflecting the fact that the most common way of representing the
grammatical analysis is by means of a tree structure.
✓ Generally, Treebanks are created on the top of a corpus, which has already been annotated with part-of-speech
tags.
Example: The Penn Treebank Project:
✓ It has 36 PoS tags and 12 other tags for punctuation and symbols.
• Data in the Penn Treebank are stored in separate files for different layers of annotation.
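Treebank parses are typically stored as bracketed strings such as `(S (NP (DT The) (NN cat)) ...)`. The sketch below is a minimal, self-contained reader for that bracketed format, turning it into nested Python lists; real Penn Treebank files carry extra annotation (traces, function tags) that this ignores:

```python
def read_tree(s):
    """Parse a bracketed tree string into nested lists: [label, child, ...]."""
    # Pad parentheses with spaces so split() tokenizes them cleanly.
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("        # every constituent opens with '('
        pos += 1
        node = [tokens[pos]]             # constituent label, e.g. 'NP'
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())     # nested constituent
            else:
                node.append(tokens[pos]) # leaf word
                pos += 1
        pos += 1                         # consume the closing ')'
        return node

    return parse()

tree = read_tree("(S (NP (DT The) (NN cat)) (VP (VBZ sees) (NP (DT a) (NN dog))))")
```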
Top Down Parsing
• Top-down parsing is goal-directed.
• It starts at the root node of the syntax tree (representing the start symbol) and works
downward to the leaves, attempting to match the given input string using grammar
rules.
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Pronoun → I | he | she | me
Example 1 – top down
S → NP VP
S → Aux NP VP
S → VP
VP → Verb
VP → Verb NP
VP → VP PP
Verb → book | include | prefer
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Example 1 – top down
2. Expand NP
Expand NP → N and match N → "John"
Example 2
Example 3– top down parsing
S → NP VP
VP → V NP | V NP PP
PP → P NP
NP → "John" | "Mary" | "Bob" | Det N | Det N PP
Det → "a" | "an" | "the" | "my"
N → "man" | "dog" | "cat" | "telescope" | "park"
V → "saw" | "ate" | "walked"
P → "in" | "on" | "by" | "with"
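The Example 3 grammar above can be parsed top-down with a small recursive-descent recognizer. This is a sketch, not a production parser: it tries each alternative for a non-terminal in order and backtracks by enumerating every position a symbol can reach (it would loop forever on a left-recursive grammar, which Example 3 avoids):

```python
# Top-down recursive-descent recognizer for the Example 3 grammar.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP", "PP"], ["V", "NP"]],
    "PP":  [["P", "NP"]],
    "NP":  [["John"], ["Mary"], ["Bob"], ["Det", "N", "PP"], ["Det", "N"]],
    "V":   [["saw"], ["ate"], ["walked"]],
    "Det": [["a"], ["an"], ["the"], ["my"]],
    "N":   [["man"], ["dog"], ["cat"], ["telescope"], ["park"]],
    "P":   [["in"], ["on"], ["by"], ["with"]],
}

def parse(symbol, words, i):
    """Yield every position j such that `symbol` derives words[i:j]."""
    if symbol not in GRAMMAR:                 # terminal: must match next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:               # non-terminal: try each rule
        ends = [i]
        for sym in rhs:                       # match rhs symbols left to right
            ends = [j for e in ends for j in parse(sym, words, e)]
        yield from ends

def recognize(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)
```

For instance, `recognize("John saw a dog in the park")` succeeds via both attachments of the PP (to the NP and to the VP), reflecting the ambiguity discussed later in this unit.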
Problems with top-down parsing
• Left recursive rules... e.g. NP → NP PP... lead to infinite recursion
• Will do badly if there are many different rules for the same LHS. Consider if
there are 600 rules for S, 599 of which start with NP, but one of which
starts with a V, and the sentence starts with a V.
• Useless work: expands things that are possible top-down but not there (no
bottom-up evidence for them).
• Top-down parsers do well if there is useful grammar-driven control: search
is directed by the grammar.
• Top-down is hopeless for rewriting parts of speech (preterminals) with
words (terminals). In practice that is always done bottom-up as lexical
lookup.
• Repeated work: anywhere there is common substructure.
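The left-recursion problem above (e.g. NP → NP PP) has a standard fix: a rule set A → A x | y is rewritten as A → y A' and A' → x A' | ε, which derives the same strings without left recursion. A minimal sketch, where the primed helper non-terminal (here "NP'") is an invented name:

```python
def eliminate_left_recursion(lhs, rules):
    """Rewrite A -> A x | y  as  A -> y A',  A' -> x A' | epsilon.

    `rules` is a list of right-hand sides (lists of symbols).
    An empty list [] stands for the empty string epsilon.
    """
    recursive = [r[1:] for r in rules if r and r[0] == lhs]   # the 'x' parts
    others    = [r for r in rules if not r or r[0] != lhs]    # the 'y' parts
    if not recursive:
        return {lhs: rules}       # nothing to do
    new = lhs + "'"               # invented helper non-terminal
    return {
        lhs: [r + [new] for r in others],
        new: [r + [new] for r in recursive] + [[]],
    }

result = eliminate_left_recursion("NP", [["NP", "PP"], ["Det", "N"]])
# NP  -> Det N NP'
# NP' -> PP NP' | epsilon
```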
Bottom-up parsing
• Bottom-up parsing begins with the words of input and attempts to create trees from the words up,
again by applying grammar rules one at a time.
• The parse is successful if it builds a tree rooted in the start symbol S that includes all of the input.
• Bottom-up parsing is a type of data-driven search. It attempts to reverse the production
process and reduce the input phrase back to the start symbol S.
• It reverses the productions to reduce the string of tokens to the start symbol, and the string
is recognized by generating the rightmost derivation in reverse.
• The goal of reaching the starting symbol S is accomplished through a series of reductions; when
the right-hand side of some rule matches the substring of the input string, the substring is
replaced with the left-hand side of the matched production, and the process is repeated until the
starting symbol is reached.
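The reduction process described above can be sketched as a shift-reduce recognizer: shift words onto a stack (looking up their parts of speech in the lexicon) and reduce whenever the top of the stack matches some rule's right-hand side. The grammar here is the small Example 1 fragment, and greedy reduction is a simplification: real shift-reduce parsers need lookahead or backtracking to resolve shift/reduce conflicts.

```python
# Shift-reduce recognizer (greedy sketch) over a tiny grammar.
RULES = [                        # (lhs, rhs) pairs, checked in this order
    ("S",  ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["Det", "N"]),
]
LEXICON = {"the": "Det", "a": "Det", "cat": "N", "dog": "N", "sees": "V"}

def shift_reduce(sentence):
    stack = []
    for word in sentence.split():
        stack.append(LEXICON[word])             # SHIFT (with POS lookup)
        changed = True
        while changed:                          # REDUCE as long as possible
            changed = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]   # replace rhs with its lhs
                    changed = True
    return stack == ["S"]                       # success iff reduced to S
```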
Bottom-up parsing example
Example 1: Bottom-up parsing example (breadth-first)
• S → NP VP
• S → Aux NP VP
• S → VP
• NP → Pronoun
• NP → Proper-Noun
• NP → Det Nominal
• Nominal → Noun
• Nominal → Nominal Noun
• Nominal → Nominal PP
• VP → Verb
• VP → Verb NP
• VP → VP PP
• PP → Prep NP
Lexicon:
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does
Prep → from | to | on | near | through
Bottom-up parsing example (figure sequence; the slide captions are summarised below):
• Input sentence: look up each word in the lexicon and apply Nominal → Noun.
• Appending a noun to the nominal: Nominal → Nominal Noun (the other options are Nominal → Noun and Nominal → Nominal PP; Noun → book | flight | meal | money).
• The PP branch cannot be expanded, as no input remains; S → NP VP is tried.
• VP → Verb (with S → VP) and VP → Verb NP are tried.
• Dead end: there is no rule PP → Det NP, so that branch fails.
Top-Down Parsing vs Bottom-Up Parsing
• Top-down parsing is a strategy that first looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar. Bottom-up parsing first looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar.
• Top-down parsing attempts to find the leftmost derivation for an input string. Bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
• In top-down parsing we start from the top (the start symbol of the parse tree) and parse down to the leaf nodes. In bottom-up parsing we start from the bottom (the leaf nodes of the parse tree) and parse up to the start symbol.
• Top-down parsing uses leftmost derivation. Bottom-up parsing uses rightmost derivation (in reverse).
• In top-down parsing, the main decision is which production rule to use in order to construct the string. In bottom-up parsing, the main decision is when to use a production rule to reduce the string toward the start symbol.
• Example of top-down parsing: recursive descent parser. Example of bottom-up parsing: shift-reduce parser.
Syntactic ambiguity:
Syntactic ambiguity refers to ambiguity in sentence structure: the same string of words
can be interpreted in more than one way. This structural ambiguity occurs when the grammar
assigns more than one possible parse to a sentence.
Who is old?
This single string of words has two distinct meanings, which arise from two
different grammatical ways of combining the words in the sentence. This is
known as structural ambiguity or syntactic ambiguity.
Ambiguity
Dependency relations are often based on the Universal Dependencies (UD) framework, a widely used
annotation standard.
Dependency Parsing Methods
Dependency parsing can be broadly classified into:
a)Transition-Based Parsing
•Uses a stack-based approach.
•Parses the sentence incrementally by shifting and reducing words.
b)Graph-Based Parsing
• Views dependency parsing as a graph optimization problem.
• Finds the best dependency tree for a sentence.
c)Neural Dependency Parsing
• Uses deep learning to learn dependency relations.
• Replaces feature engineering with word embeddings (Word2Vec, BERT).
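Method (a), the stack-based transition approach, can be sketched with the arc-standard transition system. In a real parser the transition sequence is predicted by a trained classifier (often a neural network, per method (c)); here it is supplied by hand for the short sentence "She reads books":

```python
def arc_standard(words, transitions):
    """Apply SHIFT / LEFT / RIGHT transitions; return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for t in transitions:
        if t == "SHIFT":                 # move next word from buffer to stack
            stack.append(buffer.pop(0))
        elif t == "LEFT":                # second-from-top depends on top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT":               # top depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "She reads books": 'She' and 'books' both depend on the verb 'reads'.
arcs = arc_standard(
    ["She", "reads", "books"],
    ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"],
)
```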
Earley Parsing
• Earley Parsing is a chart-based parsing algorithm for processing
sentences in context-free grammars (CFGs). It is particularly useful
for top-down parsing and can handle any CFG, including ambiguous
and left-recursive grammars.
• Key Features
• Can parse any CFG (unlike CYK, which requires CNF).
• Handles left recursion.
• Efficient for both parsing and recognizing sentences.
The three Earley operations:
Scanner:
• When does it apply: when a terminal is to the right of the dot, e.g. (0, VP → • V NP, 0).
• Which chart cell is affected: new states are added to the next cell.
• What goes in the cell: the dot is moved over the terminal, e.g. (0, VP → V • NP, 1).
Predictor:
• When does it apply: when a non-terminal is to the right of the dot, e.g. (0, S → • VP, 0).
• Which chart cell is affected: new states are added to the current cell.
• What goes in the cell: one new state for each expansion of the non-terminal in the grammar, e.g. (0, VP → • V, 0) and (0, VP → • V NP, 0).
Completer:
• When does it apply: when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3).
• Which chart cell is affected: new states are added to the current cell.
• What goes in the cell: one state for each rule "waiting" for the constituent, e.g. (0, VP → V NP •, 3).
Book that flight (Chart [0])
• Seed chart with top-down predictions for S from grammar
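The seeding step and the three operations can be put together in a compact Earley recognizer. This sketch uses a reduced grammar sufficient for "book that flight" (no epsilon rules, which a full implementation would also need to handle); a state is a tuple (lhs, rhs, dot position, start position), and chart[i] holds states ending at input position i:

```python
# Earley recognizer (sketch) for a small "book that flight" grammar.
GRAMMAR = {
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["Det", "Nom"]],
    "Nom":  [["Noun"]],
    "VP":   [["Verb"], ["Verb", "NP"]],
    "Det":  [["that"]],
    "Noun": [["flight"]],
    "Verb": [["book"]],
}

def earley(words):
    chart = [[] for _ in range(len(words) + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    for rhs in GRAMMAR["S"]:                     # seed chart[0] with S rules
        add(0, ("S", tuple(rhs), 0, 0))

    for i in range(len(words) + 1):
        for state in chart[i]:                   # list grows as we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in GRAMMAR:      # PREDICTOR
                for exp in GRAMMAR[rhs[dot]]:
                    add(i, (rhs[dot], tuple(exp), 0, i))
            elif dot < len(rhs):                            # SCANNER
                if i < len(words) and words[i] == rhs[dot]:
                    add(i + 1, (lhs, rhs, dot + 1, start))
            else:                                           # COMPLETER
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, s2))

    # Accept if a completed S rule spans the whole input.
    return any(s == ("S", tuple(r), len(r), 0)
               for s in chart[len(words)] for r in GRAMMAR["S"])
```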