21CSE356T NLP Unit 2
Definition:
A context-free grammar (CFG) is a list of rules that define the set of all well-formed
sentences in a language. Each rule has a left-hand side, which identifies a syntactic
category, and a right-hand side, which defines its alternative component parts, reading
from left to right.
Definition:
A context-free grammar consists of a set of rules or productions, each of which expresses
the ways that symbols of the language can be grouped and ordered together, and a lexicon
of words and symbols.
Context Free Grammar (CFG) - Formal Definition
• A context-free grammar G is a 4-tuple:
G = (V, T, S, P)
• These parameters are as follows:
V – Set of variables (also called non-terminal symbols)
T – Set of terminal symbols (lexicon)
• The symbols that refer to words in a language are called terminal symbols.
• The lexicon is the set of rules that introduce these symbols.
S – Designated start symbol (one of the non-terminals, S ∈ V)
P – Set of productions (also called rules).
• Each rule in P is of the form A → s, where
• A is a non-terminal (variable) symbol; each rule has exactly one non-terminal on its left-hand side.
• s is a sequence of terminals and non-terminals, drawn from (T ∪ V)*, the infinite set of strings over the grammar's symbols.
A grammar G generates a language L.
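The formal definition above can be sketched as plain Python data. This is an illustrative layout, not a standard library API; the rules mirror a fragment of Example 1 below:

```python
# The 4-tuple G = (V, T, S, P) as a plain Python dict (illustrative sketch).
grammar = {
    "V": {"S", "NP", "VP", "Det", "N", "Verb"},   # variables (non-terminals)
    "T": {"the", "a", "cat", "dog", "sees"},      # terminals (the lexicon)
    "S": "S",                                     # designated start symbol
    "P": {                                        # productions A -> s
        "S":    [["NP", "VP"]],
        "NP":   [["Det", "N"]],
        "VP":   [["Verb", "NP"]],
        "Det":  [["the"], ["a"]],
        "N":    [["cat"], ["dog"]],
        "Verb": [["sees"]],
    },
}

# Check the well-formedness conditions from the definition: S is in V, every
# left-hand side is a single non-terminal, and every right-hand-side symbol
# is drawn from (T ∪ V).
assert grammar["S"] in grammar["V"]
assert all(a in grammar["V"] for a in grammar["P"])
assert all(x in grammar["V"] | grammar["T"]
           for rules in grammar["P"].values() for r in rules for x in r)
```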
Example 1
CFG
S → NP VP
NP → Det N | Det N PP | Pronoun | ProperNoun
VP → V NP | V NP PP
PP → P NP
Det → "the" | "a" | "an"
N → "cat" | "dog" | "book" | "city"
Pronoun → "he" | "she" | "it"
ProperNoun → "Alice" | "Bob"
V → "sees" | "likes" | "reads"
P → "in" | "on" | "with"
Example 1.a. Parse Tree:
Sentence: "The cat sees a dog."
• For the sentence "The cat sees a dog", the parse tree is derived as follows:
1. Start with the start symbol:
S
2. Expand using the rule S → NP VP:
NP VP
3. Expand NP using NP → Det N:
Det N VP
4. Expand Det to "the" and N to "cat":
"the" "cat" VP
5. Expand VP using VP → V NP:
"the" "cat" V NP
6. Expand V to "sees" and NP to Det N:
"the" "cat" "sees" Det N
7. Expand Det to "a" and N to "dog":
"the" "cat" "sees" "a" "dog"
The derivation is complete.
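The derivation steps above can be run mechanically: keep a sentential form, and at each step rewrite the leftmost occurrence of the chosen non-terminal. A minimal sketch (the list of steps is the same sequence of rule choices as steps 1-7):

```python
# Leftmost derivation of "the cat sees a dog" from the start symbol S.
# Each entry is (non-terminal to rewrite, right-hand side to substitute).
steps = [
    ("S",   ["NP", "VP"]),
    ("NP",  ["Det", "N"]),
    ("Det", ["the"]),
    ("N",   ["cat"]),
    ("VP",  ["V", "NP"]),
    ("V",   ["sees"]),
    ("NP",  ["Det", "N"]),
    ("Det", ["a"]),
    ("N",   ["dog"]),
]

form = ["S"]                     # the current sentential form
for lhs, rhs in steps:
    i = form.index(lhs)          # leftmost occurrence of the non-terminal
    form[i:i + 1] = rhs          # replace it with the rule's right-hand side

print(" ".join(form))            # the cat sees a dog
```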
• Example 1.b.: For the sentence "Alice reads a book in the city", the
grammar generates the following parse tree:
Example 2:
➢ Interjections (Interj):
•Interj → wow | oh | hey: Interjections express emotion or sentiment.
•Interj → wow | oh | hey
•Example: "Wow (Interj), what a beautiful sunset (NP)!"
Grammar Rules For English
➢ Determiners (Det):
•Det → a | an | the: Determiners are words that introduce nouns and help to specify
them.
•Det → a | an | the
•Example: "An (Det) apple (N) fell from the tree (PP)."
➢ Adjectives (Adj):
•Adj → tall | blue | beautiful: Adjectives modify nouns and provide additional
information about them.
•Adj → tall | blue | beautiful
• Example: "The (Det) tall (Adj) building (N) is located in the city (PP)."
Treebanks
Corpus
•A corpus is a large, structured set of machine-readable texts that have been produced in
a natural communicative setting. Its plural is corpora. Corpora can be derived in different ways: text that was
originally electronic, transcripts of spoken language, optical character recognition, etc.
Treebank
•A treebank is a corpus in which each sentence is annotated with a parse tree. It represents the syntactic and semantic
relations of the words in a sentence. Treebanks are created by human annotation.
TreeBank Corpus
✓ It may be defined as linguistically parsed text corpus that annotates syntactic or semantic sentence structure.
✓ Geoffrey Leech coined the term 'treebank', reflecting the fact that the most common way of representing the
grammatical analysis is by means of a tree structure.
✓ Generally, Treebanks are created on the top of a corpus, which has already been annotated with part-of-speech
tags.
Example: The Penn Treebank Project:
✓ It has 36 PoS tags and 12 other tags for punctuation and symbols.
• Data in the Penn Treebank are stored in separate files for different layers of annotation.
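Treebank parses are typically stored as bracketed strings such as `(S (NP (DT The) (NN cat)) ...)`. The sketch below is a minimal, self-contained reader for that bracketed format, turning it into nested Python lists; real Penn Treebank files carry extra annotation (traces, function tags) that this ignores:

```python
def read_tree(s):
    """Parse a bracketed tree string into nested lists: [label, child, ...]."""
    # Pad parentheses with spaces so split() tokenizes them cleanly.
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("        # every constituent opens with '('
        pos += 1
        node = [tokens[pos]]             # constituent label, e.g. 'NP'
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())     # nested constituent
            else:
                node.append(tokens[pos]) # leaf word
                pos += 1
        pos += 1                         # consume the closing ')'
        return node

    return parse()

tree = read_tree("(S (NP (DT The) (NN cat)) (VP (VBZ sees) (NP (DT a) (NN dog))))")
```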
Top Down Parsing
• Top-down parsing is goal-directed.
• It starts at the root node of the syntax tree (representing the start symbol) and works
downward to the leaves, attempting to match the given input string using grammar
rules.
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Pronoun → I | he | she | me
Example 1 – top down
S → NP VP
S → Aux NP VP
S → VP
VP → Verb
VP → Verb NP
VP → VP PP
Verb → book | include | prefer
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Example 1 – top down
2. Expand NP
Expand NP → N and match N → "John"
Example 2
Example 3– top down parsing
S → NP VP
VP → V NP | V NP PP
PP → P NP
NP → "John" | "Mary" | "Bob" | Det N | Det N PP
Det → "a" | "an" | "the" | "my"
N → "man" | "dog" | "cat" | "telescope" | "park"
V → "saw" | "ate" | "walked"
P → "in" | "on" | "by" | "with"
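The Example 3 grammar above can be parsed top-down with a small recursive-descent recognizer. This is a sketch, not a production parser: it tries each alternative for a non-terminal in order and backtracks by enumerating every position a symbol can reach (it would loop forever on a left-recursive grammar, which Example 3 avoids):

```python
# Top-down recursive-descent recognizer for the Example 3 grammar.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP", "PP"], ["V", "NP"]],
    "PP":  [["P", "NP"]],
    "NP":  [["John"], ["Mary"], ["Bob"], ["Det", "N", "PP"], ["Det", "N"]],
    "V":   [["saw"], ["ate"], ["walked"]],
    "Det": [["a"], ["an"], ["the"], ["my"]],
    "N":   [["man"], ["dog"], ["cat"], ["telescope"], ["park"]],
    "P":   [["in"], ["on"], ["by"], ["with"]],
}

def parse(symbol, words, i):
    """Yield every position j such that `symbol` derives words[i:j]."""
    if symbol not in GRAMMAR:                 # terminal: must match next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:               # non-terminal: try each rule
        ends = [i]
        for sym in rhs:                       # match rhs symbols left to right
            ends = [j for e in ends for j in parse(sym, words, e)]
        yield from ends

def recognize(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)
```

For instance, `recognize("John saw a dog in the park")` succeeds via both attachments of the PP (to the NP and to the VP), reflecting the ambiguity discussed later in this unit.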
Problems with top-down parsing
• Left recursive rules... e.g. NP → NP PP... lead to infinite recursion
• Will do badly if there are many different rules for the same LHS. Consider if
there are 600 rules for S, 599 of which start with NP, but one of which
starts with a V, and the sentence starts with a V.
• Useless work: expands things that are possible top-down but not there (no
bottom-up evidence for them).
• Top-down parsers do well if there is useful grammar-driven control: search
is directed by the grammar.
• Top-down is hopeless for rewriting parts of speech (preterminals) with
words (terminals). In practice that is always done bottom-up as lexical
lookup.
• Repeated work: anywhere there is common substructure.
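The left-recursion problem above (e.g. NP → NP PP) has a standard fix: a rule set A → A x | y is rewritten as A → y A' and A' → x A' | ε, which derives the same strings without left recursion. A minimal sketch, where the primed helper non-terminal (here "NP'") is an invented name:

```python
def eliminate_left_recursion(lhs, rules):
    """Rewrite A -> A x | y  as  A -> y A',  A' -> x A' | epsilon.

    `rules` is a list of right-hand sides (lists of symbols).
    An empty list [] stands for the empty string epsilon.
    """
    recursive = [r[1:] for r in rules if r and r[0] == lhs]   # the 'x' parts
    others    = [r for r in rules if not r or r[0] != lhs]    # the 'y' parts
    if not recursive:
        return {lhs: rules}       # nothing to do
    new = lhs + "'"               # invented helper non-terminal
    return {
        lhs: [r + [new] for r in others],
        new: [r + [new] for r in recursive] + [[]],
    }

result = eliminate_left_recursion("NP", [["NP", "PP"], ["Det", "N"]])
# NP  -> Det N NP'
# NP' -> PP NP' | epsilon
```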
Bottom-up parsing
• Bottom-up parsing begins with the words of input and attempts to create trees from the words up,
again by applying grammar rules one at a time.
• The parse is successful if it builds a tree rooted in the start symbol S that includes all of the input.
• Bottom-up parsing is a type of data-driven search. It attempts to reverse the production
process and reduce the input phrase back to the start symbol S.
• It reverses the productions to reduce the string of tokens to the start symbol, and the string
is recognized by generating the rightmost derivation in reverse.
• The goal of reaching the starting symbol S is accomplished through a series of reductions; when
the right-hand side of some rule matches the substring of the input string, the substring is
replaced with the left-hand side of the matched production, and the process is repeated until the
starting symbol is reached.
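The reduction process described above can be sketched as a shift-reduce recognizer: shift words onto a stack (looking up their parts of speech in the lexicon) and reduce whenever the top of the stack matches some rule's right-hand side. The grammar here is the small Example 1 fragment, and greedy reduction is a simplification: real shift-reduce parsers need lookahead or backtracking to resolve shift/reduce conflicts.

```python
# Shift-reduce recognizer (greedy sketch) over a tiny grammar.
RULES = [                        # (lhs, rhs) pairs, checked in this order
    ("S",  ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["Det", "N"]),
]
LEXICON = {"the": "Det", "a": "Det", "cat": "N", "dog": "N", "sees": "V"}

def shift_reduce(sentence):
    stack = []
    for word in sentence.split():
        stack.append(LEXICON[word])             # SHIFT (with POS lookup)
        changed = True
        while changed:                          # REDUCE as long as possible
            changed = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]   # replace rhs with its lhs
                    changed = True
    return stack == ["S"]                       # success iff reduced to S
```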
Bottom-up parsing example
Example 1: Bottom-up parsing example (breadth-first)
• S → NP VP
• S → Aux NP VP
• S → VP
• NP → Pronoun
• NP → Proper-Noun
• NP → Det Nominal
• Nominal → Noun
• Nominal → Nominal Noun
• Nominal → Nominal PP
• VP → Verb
• VP → Verb NP
• VP → VP PP
• PP → Prep NP
Lexicon:
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does
Prep → from | to | on | near | through
Bottom-up parsing example (figure sequence; the slide captions are summarised below):
• Input sentence: look up each word in the lexicon and apply Nominal → Noun.
• Appending a noun to the nominal: Nominal → Nominal Noun (the other options are Nominal → Noun and Nominal → Nominal PP; Noun → book | flight | meal | money).
• The PP branch cannot be expanded, as no input remains; S → NP VP is tried.
• VP → Verb (with S → VP) and VP → Verb NP are tried.
• Dead end: there is no rule PP → Det NP, so that branch fails.
Top-Down Parsing vs Bottom-Up Parsing
• Top-down parsing is a strategy that first looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar. Bottom-up parsing first looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar.
• Top-down parsing attempts to find the leftmost derivation for an input string. Bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
• In top-down parsing we start from the top (the start symbol of the parse tree) and parse down to the leaf nodes. In bottom-up parsing we start from the bottom (the leaf nodes of the parse tree) and parse up to the start symbol.
• Top-down parsing uses leftmost derivation. Bottom-up parsing uses rightmost derivation (in reverse).
• In top-down parsing, the main decision is which production rule to use in order to construct the string. In bottom-up parsing, the main decision is when to use a production rule to reduce the string toward the start symbol.
• Example of top-down parsing: recursive descent parser. Example of bottom-up parsing: shift-reduce parser.
Syntactic ambiguity:
Syntactic ambiguity refers to ambiguity in sentence structure: the same string of words
can be interpreted in more than one way. This structural ambiguity occurs when the grammar
assigns more than one possible parse to a sentence.
Who is old?
This single string of words has two distinct meanings, which arise from two
different grammatical ways of combining the words in the sentence. This is
known as structural ambiguity or syntactic ambiguity.
Ambiguity
Dependency relations are often based on the Universal Dependencies (UD) framework, a widely used
annotation standard.
Dependency Parsing Methods
Dependency parsing can be broadly classified into:
a)Transition-Based Parsing
•Uses a stack-based approach.
•Parses the sentence incrementally by shifting and reducing words.
b)Graph-Based Parsing
• Views dependency parsing as a graph optimization problem.
• Finds the best dependency tree for a sentence.
c)Neural Dependency Parsing
• Uses deep learning to learn dependency relations.
• Replaces feature engineering with word embeddings (Word2Vec, BERT).
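Method (a), the stack-based transition approach, can be sketched with the arc-standard transition system. In a real parser the transition sequence is predicted by a trained classifier (often a neural network, per method (c)); here it is supplied by hand for the short sentence "She reads books":

```python
def arc_standard(words, transitions):
    """Apply SHIFT / LEFT / RIGHT transitions; return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for t in transitions:
        if t == "SHIFT":                 # move next word from buffer to stack
            stack.append(buffer.pop(0))
        elif t == "LEFT":                # second-from-top depends on top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT":               # top depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "She reads books": 'She' and 'books' both depend on the verb 'reads'.
arcs = arc_standard(
    ["She", "reads", "books"],
    ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"],
)
```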
Earley Parsing
• Earley Parsing is a chart-based parsing algorithm for processing
sentences in context-free grammars (CFGs). It is particularly useful
for top-down parsing and can handle any CFG, including ambiguous
and left-recursive grammars.
• Key Features
• Can parse any CFG (unlike CYK, which requires CNF).
• Handles left recursion.
• Efficient for both parsing and recognizing sentences.
The three Earley operations:
Scanner:
• When does it apply: when a terminal is to the right of the dot, e.g. (0, VP → • V NP, 0).
• Which chart cell is affected: new states are added to the next cell.
• What goes in the cell: the dot is moved over the terminal, e.g. (0, VP → V • NP, 1).
Predictor:
• When does it apply: when a non-terminal is to the right of the dot, e.g. (0, S → • VP, 0).
• Which chart cell is affected: new states are added to the current cell.
• What goes in the cell: one new state for each expansion of the non-terminal in the grammar, e.g. (0, VP → • V, 0) and (0, VP → • V NP, 0).
Completer:
• When does it apply: when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3).
• Which chart cell is affected: new states are added to the current cell.
• What goes in the cell: one state for each rule "waiting" for the constituent, e.g. (0, VP → V NP •, 3).
Book that flight (Chart [0])
• Seed chart with top-down predictions for S from grammar
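The seeding step and the three operations can be put together in a compact Earley recognizer. This sketch uses a reduced grammar sufficient for "book that flight" (no epsilon rules, which a full implementation would also need to handle); a state is a tuple (lhs, rhs, dot position, start position), and chart[i] holds states ending at input position i:

```python
# Earley recognizer (sketch) for a small "book that flight" grammar.
GRAMMAR = {
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["Det", "Nom"]],
    "Nom":  [["Noun"]],
    "VP":   [["Verb"], ["Verb", "NP"]],
    "Det":  [["that"]],
    "Noun": [["flight"]],
    "Verb": [["book"]],
}

def earley(words):
    chart = [[] for _ in range(len(words) + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    for rhs in GRAMMAR["S"]:                     # seed chart[0] with S rules
        add(0, ("S", tuple(rhs), 0, 0))

    for i in range(len(words) + 1):
        for state in chart[i]:                   # list grows as we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in GRAMMAR:      # PREDICTOR
                for exp in GRAMMAR[rhs[dot]]:
                    add(i, (rhs[dot], tuple(exp), 0, i))
            elif dot < len(rhs):                            # SCANNER
                if i < len(words) and words[i] == rhs[dot]:
                    add(i + 1, (lhs, rhs, dot + 1, start))
            else:                                           # COMPLETER
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, s2))

    # Accept if a completed S rule spans the whole input.
    return any(s == ("S", tuple(r), len(r), 0)
               for s in chart[len(words)] for r in GRAMMAR["S"])
```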