21CSE356T NLP Unit 2

Unit II focuses on Syntax Analysis in Natural Language Processing, covering Context-Free Grammars (CFG), parsing techniques (Top-Down and Bottom-Up), and grammar rules for English. It explains the structure of CFG, parse trees, and the role of treebanks in annotating syntactic and semantic relations. Additionally, it discusses the advantages and challenges of top-down and bottom-up parsing methods.

Unit II - Syntax Analysis

• Context Free Grammars


• Grammar Rules for English
• Top-Down Parsing
• Bottom-Up Parsing
• Ambiguity
• CKY Parsing
• Dependency Parsing
• Earley Parsing
• Probabilistic Context-Free Grammars
Context Free Grammars
• Grammar in NLP is a set of rules for constructing sentences in a language, used to understand and analyze the structure of sentences in text data.
• This includes identifying parts of speech such as nouns, verbs, and adjectives, determining the subject and predicate of a sentence, and identifying the relationships between words and phrases.
• What is Grammar?
Grammar is defined as the rules for forming well-structured sentences. Grammar also plays an essential role in describing the syntactic structure of well-formed programs, much as it denotes the syntactic rules used for conversation in natural languages.

Definition:
A context-free grammar (CFG) is a list of rules that define the set of all well-formed
sentences in a language. Each rule has a left-hand side, which identifies a syntactic
category, and a right-hand side, which defines its alternative component parts, reading
from left to right.
Definition:
A context-free grammar consists of a set of rules or productions, each of which expresses
the ways that symbols of the language can be grouped and ordered together, and a lexicon
of words and symbols.
Context Free Grammar (CFG) - Formal Definition
• A context-free grammar G is a 4-tuple:
G = (V, T, S, P)
• These parameters are as follows:
V – set of variables (also called non-terminal symbols)
T – set of terminal symbols (lexicon)
• The symbols that refer to words in a language are called terminal symbols.
• The lexicon is the set of rules that introduce these symbols.
S – designated start symbol (one of the non-terminals, S ∈ V)
P – set of productions (also called rules)
• Each rule in P is of the form A → s, where
• A is a non-terminal (variable) symbol.
• Each rule can have only one non-terminal symbol on the left-hand side.
• s is a sequence of terminals and non-terminals, drawn from (T ∪ V)*, the (infinite) set of all such strings.
A grammar G generates a language L.
Example 1
CFG
S → NP VP
NP → Det N | Det N PP | Pronoun | ProperNoun
VP → V NP | V NP PP | VP PP
PP → P NP
Det → "the" | "a" | "an"
N → "cat" | "dog" | "book" | "city"
Pronoun → "he" | "she" | "it"
ProperNoun → "Alice" | "Bob"
V → "sees" | "likes" | "reads"
P → "in" | "on" | "with"
Example 1.a. Parse Tree:
Sentence: "The cat sees a dog."
For this sentence, the derivation proceeds as follows:
1. Start with the start symbol: S
2. Expand using the rule S → NP VP: NP VP
3. Expand NP using NP → Det N: Det N VP
4. Expand Det to "the" and N to "cat": "the" "cat" VP
5. Expand VP using VP → V NP: "the" "cat" V NP
6. Expand V to "sees" and NP to Det N: "the" "cat" "sees" Det N
7. Expand Det to "a" and N to "dog": "the" "cat" "sees" "a" "dog"
The derivation is complete.
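The same grammar and sentence can be checked mechanically. Below is a minimal sketch (our own illustration, not part of the slides) that encodes the Example 1 grammar in NLTK form; it assumes NLTK is installed.

import nltk

# Example 1 grammar, transcribed from the slide.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP | Pronoun | ProperNoun
VP -> V NP | V NP PP | VP PP
PP -> P NP
Det -> "the" | "a" | "an"
N -> "cat" | "dog" | "book" | "city"
Pronoun -> "he" | "she" | "it"
ProperNoun -> "Alice" | "Bob"
V -> "sees" | "likes" | "reads"
P -> "in" | "on" | "with"
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat sees a dog".split()):
    tree.pretty_print()   # prints the same tree derived by hand above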
• Example 1.b.: For the sentence "Alice reads a book in the city", the grammar generates the corresponding parse tree (figure in the slides).
Example 2:
• Sample sentence: "the giraffe dreams"
• The root of every subtree has a grammatical category that appears on the left-hand side of a rule, and the children of that root are identical to the elements on the right-hand side of that rule.
Examples 3 and 4: (parse-tree figures in the original slides.)
Grammar Rules For English
In the context of English grammar, CFG provides a set of rules for generating valid sentences. Here are some grammar rules for English with respect to CFG:
➢ Sentence Structure:
• S → NP VP: A sentence (S) consists of a noun phrase (NP) followed by a verb phrase (VP).
• Example: "The cat (NP) sleeps (VP)."
➢ Noun Phrase (NP):
• NP → (Det) (Adj*) N (PP*): A noun phrase (NP) can consist of an optional determiner (Det), followed by zero or more adjectives (Adj*), a noun (N), and zero or more prepositional phrases (PP*).
• Example: "A (Det) small (Adj) black (Adj) cat (N) with green eyes (PP)."
➢ Verb Phrase (VP):
• VP → V (NP): A verb phrase (VP) consists of a verb (V) optionally followed by a noun phrase (NP).
• Example: "She (NP) eats (V) apples (NP)."
➢ Prepositional Phrase (PP):
• PP → P NP: A prepositional phrase (PP) consists of a preposition (P) followed by a noun phrase (NP).
• Example: "The book (NP) is on (P) the table (NP)."
➢ Conjunctions (Conj):
• Conj → and | but | or: Conjunctions join words, phrases, or clauses.
• Example: "He likes apples (NP) and (Conj) she likes oranges (NP)."
➢ Interjections (Interj):
• Interj → wow | oh | hey: Interjections express emotion or sentiment.
• Example: "Wow (Interj), what a beautiful sunset (NP)!"
➢ Determiners (Det):
• Det → a | an | the: Determiners are words that introduce nouns and help to specify them.
• Example: "An (Det) apple (N) fell from the tree (PP)."
➢ Adjectives (Adj):
• Adj → tall | blue | beautiful: Adjectives modify nouns and provide additional information about them.
• Example: "The (Det) tall (Adj) building (N) is located in the city (PP)."
Treebanks
Corpus
• A corpus is a large and structured set of machine-readable texts that have been produced in a natural communicative setting. Its plural is corpora. Corpora can be derived in different ways: text that was originally electronic, transcripts of spoken language, optical character recognition, etc.

Treebank
• A treebank is a corpus in which each sentence is annotated with a parse tree. It represents the syntactic and semantic relations of the words in a sentence. Treebanks are created by
• parsing texts using parsers, and
• human annotation.
TreeBank Corpus
✓ It may be defined as a linguistically parsed text corpus that annotates syntactic or semantic sentence structure.
✓ Geoffrey Leech coined the term 'treebank', reflecting that the most common way of representing the grammatical analysis is by means of a tree structure.
✓ Generally, treebanks are created on top of a corpus that has already been annotated with part-of-speech tags.
Example: The Penn Treebank Project:

✓ It is the most cited Treebank for the English language.

✓ It consists of over 4.5 million words of American English.

✓ Each sentence has PoS and syntactic structure.

✓ It has 36 PoS tags and 12 other tags for punctuation and symbols.
• Data in the Penn Treebank are stored in separate files for different layers of annotation.
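NLTK ships a small sample of the Penn Treebank, which is enough to see the annotation style. A minimal sketch (our own illustration; assumes nltk.download('treebank') has been run):

from nltk.corpus import treebank

# First parsed sentence of the sample; fileids follow the wsj_XXXX.mrg scheme.
sent = treebank.parsed_sents("wsj_0001.mrg")[0]
print(sent)            # bracketed parse with PoS tags
sent.pretty_print()    # ASCII rendering of the tree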
Top Down Parsing
• Top-down parsing is goal-directed.
• It starts at the root node of the syntax tree (representing the start symbol) and works downward to the leaves, attempting to match the given input string using grammar rules.
• A top-down parser starts with a list of constituents to be built.
• It rewrites the goals in the goal list by matching one against the LHS of the grammar rules and expanding it with the RHS, attempting to match the sentence to be derived. If a goal can be rewritten in several ways, then there is a choice of which rule to apply (a search problem).
• Can use depth-first or breadth-first search, and goal ordering.
The basic procedure (a minimal code sketch follows the steps):
1. Start with the start symbol (S) of the grammar.
2. Expand non-terminals using production rules.
3. Match terminal symbols with input tokens.
4. Backtrack if a derivation fails (in some versions).
5. Repeat until the entire input is parsed.
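The five steps above can be captured in a few lines. Below is a minimal backtracking top-down recognizer (our own sketch, not from the slides; the grammar and sentence are illustrative):

# parse(symbols, tokens) tries to derive exactly `tokens` from `symbols`.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["sees"], ["sleeps"]],
}

def parse(symbols, tokens):
    # Success when both the goal list and the input are exhausted (step 5).
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: expand with each rule, backtracking on failure
        # (steps 2 and 4). A left-recursive rule such as NP -> NP PP
        # would make this recurse forever (see the problems listed later).
        return any(parse(list(rhs) + rest, tokens) for rhs in GRAMMAR[head])
    # Terminal: match against the next input token (step 3).
    return bool(tokens) and tokens[0] == head and parse(rest, tokens[1:])

print(parse(["S"], "the cat sees a dog".split()))  # True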
Top Down
• A parse tree is a tree that defines how the grammar was utilized to construct the sentence. Using
the top-down approach, the parser attempts to create a parse tree from the root node S down to
the leaves.
• The procedure begins with the assumption that the input can be derived from the selected start
symbol S.
• The next step is to find the tops of all the trees that can begin with S by looking at the
grammatical rules with S on the left-hand side, which generates all the possible trees.
• Top-down, left-to-right, and backtracking are prominent search strategies that are used in this
method.
• The search begins with the root node labeled S, i.e., the starting symbol, expands the internal
nodes using the next productions with the left-hand side equal to the internal node, and continues
until leaves are part of speech (terminals).
Example 1: Top-down parsing example (Breadth-first)
Grammar:
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP
Lexicon:
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does
Prep → from | to | on | near | through
(The input sentence is shown in the slide figure.)
Example 1 – top down (slide sequence; tree figures omitted)
The breadth-first search expands the leftmost open non-terminal at each step, trying every matching rule:
1. Expand S using each of S → NP VP, S → Aux NP VP, and S → VP.
2. For the branches beginning with NP, try NP → Pronoun, NP → Proper-Noun, and NP → Det Nominal, checking the lexical rules Pronoun → I | he | she | me, Proper-Noun → Houston | NWA, and Det → the | a | that | this against the input.
3. For the S → Aux NP VP branch, check Aux → does against the input.
4. For the S → VP branch, try VP → Verb, VP → Verb NP, and VP → VP PP, checking Verb → book | include | prefer.
5. Inside an NP, expand Nominal using Nominal → Noun, Nominal → Nominal Noun, or Nominal → Nominal PP, checking Noun → book | flight | meal | money.
Example 2: Take the sentence "John is playing a game" and apply top-down parsing.
Grammar:
1. S → NP VP PP
2. NP → N | Det N
3. VP → V NP
4. PP → ε (since there is no prepositional phrase in the sentence, PP will be null here)
5. N → "John" | "game"
6. V → "is" | "playing"
7. Det → "a"
Derivation:
1. Start with the root S (Sentence). Rule: S → NP VP PP.
2. Expand NP → N and match N → "John".
3. Expand VP → V NP.
4. Match V → "is" (the slides treat "is playing" as the verb group).
5. Expand the NP inside VP: NP → Det N.
6. Match Det → "a".
7. Match N → "game".
8. PP → ε (empty), so the input is fully matched and the parse succeeds.
Example 3 – top-down parsing
The slides repeat this grammar on three consecutive slides, and the stray """) is the tail of an NLTK grammar string. A plausible reconstruction of the underlying code (the parser call is our assumption; the slides show only the grammar):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
V -> "saw" | "ate" | "walked"
P -> "in" | "on" | "by" | "with"
""")

# Hypothetical usage: a backtracking top-down parser. This grammar has no
# left recursion, so RecursiveDescentParser terminates.
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("John saw a man in the park".split()):
    print(tree)
Problems with top-down parsing
• Left recursive rules... e.g. NP → NP PP... lead to infinite recursion
• Will do badly if there are many different rules for the same LHS. Consider if
there are 600 rules for S, 599 of which start with NP, but one of which
starts with a V, and the sentence starts with a V.
• Useless work: expands things that are possible top-down but not there (no
bottom-up evidence for them).
• Top-down parsers do well if there is useful grammar-driven control: search
is directed by the grammar.
• Top-down is hopeless for rewriting parts of speech (preterminals) with
words (terminals). In practice that is always done bottom-up as lexical
lookup.
• Repeated work: anywhere there is common substructure.
Bottom-up parsing
• Bottom-up parsing begins with the words of the input and attempts to build trees from the words up, again by applying grammar rules one at a time.
• The parse is successful if it builds a tree rooted in the start symbol S that covers all of the input. Bottom-up parsing is a type of data-driven search: it attempts to run the production process in reverse, reducing the sentence back to the start symbol S.
• It reverses the productions to reduce the string of tokens to the start symbol, and the string is recognized by generating the rightmost derivation in reverse.
• The goal of reaching the start symbol S is accomplished through a series of reductions: when the right-hand side of some rule matches a substring of the input string, the substring is replaced with the left-hand side of the matched production, and the process is repeated until the start symbol is reached.
• Bottom-up parsing can thus be thought of as a reduction process: the construction of a parse tree in postorder.
Example: Bottom-up parsing of "John is playing a game"
1. Start with the individual tokens as the leaves of the tree: John is playing a game
2. Group "John" as N (match N → "John"): N(John) is playing a game
3. Group "game" as N (match N → "game"): N(John) is playing a N(game)
4. Group "a" as Det: N(John) is playing Det(a) N(game)
5. Combine Det and N into NP (match NP → Det N): N(John) is playing NP(a game)
6. Group "is" as V (match V → "is").
7. Group "playing" as V (match V → "playing"): N(John) V(is) V(playing) NP(a game)
8. Combine the verb group and NP into VP (VP → V NP): N(John) VP(is playing a game)
9. Combine N and VP into S (match S → NP VP PP; since PP → ε, S is formed by NP + VP).
Example 1: Bottom-up parsing example (Breadth-first)
Grammar:
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP
Lexicon:
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does
Prep → from | to | on | near | through
Bottom-up parsing example (slide sequence; chart figures omitted)
For the input sentence shown in the slides:
• Each word is first replaced by its possible lexical categories; "book", for instance, matches both Noun → book | flight | meal | money and Verb → book | include | prefer, so both hypotheses are kept, and Nominal → Noun turns each Noun into a Nominal.
• Nominal → Nominal Noun fails where the following word is not a Noun, and Nominal → Nominal PP fails where no PP can be built (no input remaining).
• Det → the | a | that | this and NP → Det Nominal combine a determiner with the following Nominal into an NP; a bare Nominal cannot be joined into an NP with a preceding word, as no production licenses it.
• Branches requiring a rule S → Det NP or PP → Det NP fail, since no such rules exist.
• On the branch where the first word is a Verb, VP → Verb and VP → Verb NP apply, and S → VP completes the parse.
Top-Down Parsing vs. Bottom-Up Parsing
• Top-down is a parsing strategy that first looks at the highest level of the parse tree and works down the parse tree using the rules of grammar; bottom-up first looks at the lowest level of the parse tree and works up using the rules of grammar.
• Top-down parsing attempts to find the leftmost derivation for an input string; bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
• In top-down parsing we parse from the top (the start symbol) down to the leaf nodes of the parse tree; in bottom-up parsing we parse from the bottom (the leaf nodes) up to the start symbol.
• Top-down parsing uses leftmost derivation; bottom-up parsing uses rightmost derivation (in reverse).
• In top-down parsing the main decision is which production rule to use in order to construct the string; in bottom-up parsing the main decision is when to use a production rule to reduce the string toward the start symbol.
• Example: Recursive Descent parser (top-down); Shift-Reduce parser (bottom-up).
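NLTK provides a shift-reduce parser that makes the bottom-up shift and reduce steps visible. A minimal sketch (our own illustration; the VP -> V VP rule is our addition to cover the "is playing" verb group, and note that NLTK's shift-reduce parser does not backtrack, so it can miss parses that exist):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | "John"
VP -> V NP | V VP
Det -> "a"
N -> "game"
V -> "is" | "playing"
""")

# trace=2 prints each shift and each reduction as it happens.
parser = nltk.ShiftReduceParser(grammar, trace=2)
for tree in parser.parse("John is playing a game".split()):
    print(tree)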
Syntactic ambiguity:
Syntactic ambiguity refers to ambiguity in sentence structure, where a sentence can be interpreted in different ways. This structural ambiguity occurs when the grammar assigns more than one possible parse to a structure.

Who is old? In a phrase such as "old men and women" (discussed below under coordination ambiguity), this single string of words has two distinct meanings, which arise from two different grammatical ways of combining the words. This is known as structural ambiguity or syntactic ambiguity.
Ambiguity
• Consider the following sentence: "I shot an elephant wearing pyjamas"
• The sentence has structural ambiguity, as discussed below:
➢ First, does "shot" mean taking a photo, or pointing a gun?
➢ Second, who is wearing the pyjamas? Is it the person or the elephant?
Syntactic ambiguity:
Types of Ambiguity:
There are three types of structural ambiguity:
• Attachment Ambiguity: A sentence has an attachment ambiguity if a particular constituent can be attached to the parse tree at more than one place.
• Example: "Guna ate an ice cream with fruits from Chennai"
• In the above sentence, we have two prepositional phrases, "with fruits" and "from Chennai".
• They can be understood with the following possible meanings:
✓ Guna, who is from Chennai, ate an ice cream filled with fruits.
✓ Guna ate an ice cream filled with fruits, and the ice cream was brought from Chennai.
✓ Guna, who is from Chennai, ate the ice cream with the help of fruits.
✓ Guna, with the help of fruits, ate the ice cream, which was bought from Chennai.
Types of Ambiguity (continued):
• Coordination Ambiguity: Here, different sets of phrases can be conjoined by a conjunction like "and".
• Example: The phrase "old men and women" can be bracketed as [old [men and women]], referring to old men and old women, or as [old men] and [women], in which case it is only the men who are old.

• Local Ambiguity: Even if a sentence is not ambiguous (i.e., it does not have more than one parse in the end), it can be inefficient to parse because of local ambiguity. Local ambiguity occurs when some part of a sentence is ambiguous, i.e., has more than one parse, even if the whole sentence does not.
• Example: "Book that flight" is not ambiguous, but when the parser sees the first word "book", it cannot know whether the word is a verb or a noun until later. That is, it must consider both possible parses.
• Existing solutions to syntactic ambiguity:
✓ Previously, parsers were based on deterministic grammar rules.
✓ Now, parsers are mostly based on neural networks.
• Disambiguation:
It is the process of choosing the correct parse from the multitude of possible parses; it covers the group of techniques used to handle ambiguity.
✓ Unfortunately, effective disambiguation algorithms generally require statistical, semantic, and pragmatic knowledge, which is not readily available during syntactic processing.
✓ Lacking such knowledge, we are left with the choice of simply returning every possible parse tree for a given input. Unfortunately, generating all the possible parses from robust, highly ambiguous, wide-coverage grammars such as the Penn Treebank grammar is problematic.
Limitations of CFG
1. Inability to handle context-sensitivity
   Example: subject-verb agreement (e.g., "The dog barks" vs. "The dogs bark").
2. Difficulty with long-range dependencies
   Example: "The book that the author who won the prize wrote is interesting."
3. Ambiguity resolution
   CFG does not provide mechanisms to resolve ambiguities effectively.
CKY Parsing
Chomsky Normal Form (CNF)
• The right-hand side of a standard CFG rule can have an arbitrary number of symbols (terminals and non-terminals):
VP → ADV eat NP
• A CFG in Chomsky Normal Form (CNF) allows only two kinds of right-hand sides:
– Two non-terminals: VP → ADV VP
– One terminal: VP → eat
• Any CFG can be transformed into an equivalent CNF, e.g.:
VP → ADVP VP1
VP1 → VP2 NP
VP2 → eat
CKY is:
• Bottom-up parsing: start with the words.
• Dynamic programming: save the results in a table/chart and re-use these results in finding larger constituents.
• Complexity: O(n³ |G|), where n is the length of the string and |G| is the size of the grammar.
• It presumes a CFG in Chomsky Normal Form: rules are all either A → B C or A → a (with A, B, C non-terminals and a a terminal).
The CKY parsing algorithm
To recover the parse tree, each entry needs pairs of backpointers.
Algorithm Steps
1. Convert the CFG to Chomsky Normal Form (CNF). Each production rule must be of the form A → B C (two non-terminals) or A → a (a single terminal).
2. Initialize the table. Create a triangular table of size n × n, where n is the length of the input string. Each cell T[i, j] stores the non-terminals that can generate the substring w[i:j].
3. Fill the table bottom-up. First, fill the diagonal cells with the non-terminals that directly generate the corresponding terminal symbols. Then, for increasing substring lengths, compute which non-terminals can generate each substring by checking combinations of smaller substrings.
4. Check whether the start symbol is in the top-right cell. If the start symbol (usually S) appears in the top-right cell, the string belongs to the language.
Worked example: "the flight includes a meal"
In [0,1]: "the" is a Det, so put Det in [0,1].
In [1,2]: "flight" is a Noun, so put N in [1,2]. Det[0,1] and N[1,2] form NP[0,2].
In [2,3]: "includes" is a Verb, so put V in [2,3].
In [3,4]: "a" is a Det, so put Det in [3,4].
In [4,5]: "meal" is a Noun, so put N in [4,5]. Det[3,4] and N[4,5] form NP[3,5].
V[2,3] and NP[3,5] form VP[2,5]; finally, NP[0,2] and VP[2,5] form S[0,5], so the parse succeeds.
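Filling the chart this way is easy to mechanize. Below is a minimal CKY recognizer sketch (our own code, not from the slides) hard-wired to the CNF rules used in this worked example:

from collections import defaultdict

unary = {            # A -> a  (lexical rules)
    "the": {"Det"}, "a": {"Det"},
    "flight": {"N"}, "meal": {"N"},
    "includes": {"V"},
}
binary = [           # A -> B C
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
    ("S", "NP", "VP"),
]

def cky(words):
    n = len(words)
    table = defaultdict(set)           # table[(i, j)] = non-terminals spanning w[i:j]
    for i, w in enumerate(words):      # diagonal: lexical rules
        table[(i, i + 1)] = set(unary.get(w, ()))
    for span in range(2, n + 1):       # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point
                for a, b, c in binary:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(a)
    return table

t = cky("the flight includes a meal".split())
print("S" in t[(0, 5)])   # True: the start symbol spans the whole input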
Dependency Parsing
• Dependency parsing is the process of analyzing the grammatical structure of a sentence to find related words as well as the type of relationship between them.
Each relationship:
1. Has one head and a dependent that modifies the head.
2. Is labeled according to the nature of the dependency between the head and the dependent. These labels can be found at Universal Dependency Relations.
Why Dependency Parsing?
•Identifies grammatical relationships (e.g., subject-verb, object-verb).
•Enhances understanding of sentence meaning.
•Used in machine translation, sentiment analysis, information extraction, and
question answering.
Dependency Structure
A sentence is represented as a dependency tree, where:
•Nodes represent words.
•Edges represent grammatical relations.
•Each word (except the root) depends on a single head word.
Example: "John loves Mary"
• "loves" is the root (main verb).
• "John" is the subject (nsubj → nominal subject).
• "Mary" is the object (obj → direct object).
Grammatical Relationships
• The edges in a dependency tree represent grammatical relationships. These relationships define words' roles in a sentence, such as subject, object, modifier, or adverbial. Here are a few common grammatical relationships:
• A) Subject-Verb Relationship: In a sentence like "She sings," the word "She" depends on "sings" as the subject of the verb.
• B) Modifier-Head Relationship: In the phrase "The big cat," "big" modifies "cat," creating a modifier-head relationship.
• C) Direct Object-Verb Relationship: In "She eats apples," "apples" is the direct object that depends on the verb "eats."

(nsubj: nominal subject; dobj: direct object)
• D) Adverbial-Verb Relationship: In "He sings well," "well" modifies the verb "sings" and forms an adverbial-verb relationship.
Common relations
• Dependency relations define grammatical functions; some common ones are listed in the slide table.
Dependency relations are often based on the Universal Dependencies (UD) framework, a widely used annotation standard.
Dependency Parsing Methods
Dependency parsing can be broadly classified into:
a) Transition-Based Parsing
• Uses a stack-based approach.
• Parses the sentence incrementally by shifting and reducing words.
b) Graph-Based Parsing
• Views dependency parsing as a graph optimization problem.
• Finds the best dependency tree for a sentence.
c) Neural Dependency Parsing
• Uses deep learning to learn dependency relations.
• Replaces feature engineering with word embeddings (Word2Vec, BERT).
Earley Parsing
• Earley Parsing is a chart-based parsing algorithm for processing
sentences in context-free grammars (CFGs). It is particularly useful
for top-down parsing and can handle any CFG, including ambiguous
and left-recursive grammars.
• Key Features
• Can parse any CFG (unlike CYK, which requires CNF).
• Handles left recursion.
• Efficient for both parsing and recognizing sentences.

• The Earley parsing algorithm is an efficient top-down parsing algorithm that avoids some of the inefficiency associated with purely naive search under the same top-down strategy (cf. the recursive descent parser).
• Intermediate solutions are created only once and stored in a chart (dynamic programming). The left-recursion problem is solved by examining the input. Earley is not picky about what type of grammar it accepts, i.e., it accepts arbitrary CFGs (cf. CKY).
Earley Parsing
• Data structure: an array of n+1 cells, called the chart.
• For each word position, the chart contains a set of states representing all partial parse trees generated to date.
• E.g., chart[0] contains all partial parse trees generated at the beginning of the sentence.
• Chart entries represent three types of constituents:
• predicted constituents (top-down predictions)
• in-progress constituents (we're in the midst of ...)
• completed constituents (we've found ...)
• Progress in the parse is represented by dotted rules; the position of • indicates the type of constituent.
• 0 Book 1 that 2 flight 3
(0, S → • VP, 0) (predicting VP)
(1, NP → Det • Nom, 2) (finding NP)
(0, VP → V NP •, 3) (found VP)
Earley Parser: Parse Success
• The final answer is found by looking at the last entry in the chart.
• If an entry resembles (0, S → α •, n), then the input was parsed successfully.
• But note that the chart will also contain a record of all possible parses of the input string, given the grammar -- not just the successful one(s).
Earley Parsing Steps
• Start state: (0, S' → • S, 0)
• End state: (0, S' → S •, n), where n is the input size
• Next-state rules:
• Scanner: read input
➢ (i, A → α • w_{j+1} β, j) ⇒ (i, A → α w_{j+1} • β, j+1)
• Predictor: add top-down predictions
➢ (i, A → α • B β, j) ⇒ (j, B → • γ, j) for each rule B → γ (note B is the left-most non-terminal after the dot)
• Completer: move the dot to the right when a new constituent is found
➢ (i, B → α • A β, k), (k, A → γ •, j) ⇒ (i, B → α A • β, j)
• No backtracking and no states removed: keep the complete history of the parse.
Earley Parser Steps: Scanner, Predictor, Completer
Scanner:
• When does it apply: when a terminal is to the right of the dot, e.g. (0, VP → • V NP, 0)
• What chart cell is affected: new states are added to the next cell
• What contents: move the dot over the terminal, e.g. (0, VP → V • NP, 1)
Predictor:
• When does it apply: when a non-terminal is to the right of the dot, e.g. (0, S → • VP, 0)
• What chart cell is affected: new states are added to the current cell
• What contents: one new state for each expansion of the non-terminal in the grammar, e.g. (0, VP → • V, 0) and (0, VP → • V NP, 0)
Completer:
• When does it apply: when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3)
• What chart cell is affected: new states are added to the current cell
• What contents: one state for each rule "waiting" for the completed constituent, e.g. (0, VP → V • NP, 1) advances to (0, VP → V NP •, 3)
Book that flight (Chart [0])
• Seed chart with top-down predictions for S from grammar

→• [0,0] Dummy start state


S → • NP VP [0,0] Predictor
S → • Aux NP VP [0,0] Predictor
S → • VP [0,0] Predictor
NP → • Det Nom [0,0] Predictor
NP → • PropN [0,0] Predictor
VP → • V [0,0] Predictor
VP → • V NP [0,0] Predictor
CFG for Fragment of English
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N
Nom → N Nom
Nom → Nom PP
VP → V
VP → V NP
PP → Prep NP
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA
Chart[1]
V → book • [0,1] Scanner
VP → V • [0,1] Completer
VP → V • NP [0,1] Completer
S → VP • [0,1] Completer
NP → • Det Nom [1,1] Predictor
NP → • PropN [1,1] Predictor

V → book • is passed to the Completer, which finds two states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right.
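The same grammar and sentence can be run through NLTK's Earley chart parser, which prints the scanner/predictor/completer edges as dotted rules. A minimal sketch (our own illustration, not from the slides):

import nltk
from nltk.parse import EarleyChartParser

# The "CFG for Fragment of English" above, in NLTK form.
grammar = nltk.CFG.fromstring("""
S -> NP VP | Aux NP VP | VP
NP -> Det Nom | PropN
Nom -> N | N Nom | Nom PP
VP -> V | V NP
PP -> Prep NP
Det -> "that" | "this" | "a"
N -> "book" | "flight" | "meal" | "money"
V -> "book" | "include" | "prefer"
Aux -> "does"
Prep -> "from" | "to" | "on"
PropN -> "Houston" | "TWA"
""")

# trace=1 prints each chart edge as it is added.
parser = EarleyChartParser(grammar, trace=1)
for tree in parser.parse("book that flight".split()):
    print(tree)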
Probabilistic Context-Free Grammars (PCFGs)
• Dealing with ambiguity: Probabilistic Context-Free Grammars (PCFGs)
• A probabilistic context-free grammar (PCFG) is a CFG where each rule
NT → β (where β is a symbol sequence) is assigned a probability
P(β|NT).
• The sum over all expansions of NT must equal 1: ∑β’ P(β’|NT) = 1.
• The easiest way to create a PCFG from a treebank is MLE:
– Count all occurrences of NT → β in the treebank.
– Divide by the count of all rules whose LHS is NT to get P(β|NT):
P(NT → C1 C2 ... Cn | NT) = count(NT → C1 C2 ... Cn) / count(NT)
• But, as usual, many rules have very low frequencies, so MLE isn't good enough and we need to smooth.
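This MLE recipe is essentially a one-liner in NLTK over its Penn Treebank sample. A minimal sketch (our own illustration; assumes nltk.download('treebank'), and note there is no smoothing here, which is exactly the weakness just mentioned):

import nltk
from nltk.corpus import treebank

# Collect every production from every parse tree in the sample.
productions = []
for tree in treebank.parsed_sents():
    productions += tree.productions()

# induce_pcfg computes P(beta | NT) = count(NT -> beta) / count(NT).
pcfg = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
print(pcfg.productions()[:5])   # a few rules with their MLE probabilities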
Statistical disambiguation example
Sentence: "kids saw birds with fish"
P(t1) = 1.0 · 0.1 · 0.7 · 1.0 · 0.4 · 0.18 · 1.0 · 1.0 · 0.18 = 0.0009072
P(t2) = 1.0 · 0.1 · 0.3 · 0.7 · 1.0 · 0.18 · 1.0 · 1.0 · 0.18 = 0.0006804
P(t2) is less than P(t1) = 0.0009072, so t1 is preferred. Yay!
Probabilistic CKY
Input: POS-tagged sentence
John_N eats_V pie_N with_P cream_N
