Lecture 7 - Context Free Grammars

Theoretical Computer Science

Lecture 7:
Context-Free Grammars

Jos Uiterwijk
uiterwijk@maastrichtuniversity.nl
Contents

Book Chapter 11: Context-Free Grammars

• Section 11.1: Introduction to Rewrite Systems and Grammars
• Section 11.2: Context-Free Grammars and Languages
• Section 11.3: Designing Context-Free Grammars
• Section 11.4: Simplifying Context-Free Grammars
• Section 11.5: not covered
• Section 11.6: Derivations and Parse Trees
• Section 11.7: Ambiguity (except Subsection 11.7.3)
2
Languages and Machines

3
Rewrite Systems and Grammars
A rewrite system (or production system or rule-based
system) is:

● a list of rules, and
● an algorithm for applying them.

Each rule has a left-hand side and a right-hand side.

Example rules:

S → aSb
aS → ε
aSb → bSabSa

4
Simple-rewrite
simple-rewrite(R: rewrite system, w: initial string) =

1. Set working-string to w.

2. Until told by R to halt do:
   2.1 Match the lhs of some rule against some part of
       working-string.
   2.2 Replace the matched part of working-string with the
       rhs of the rule that was matched.

3. Return working-string.
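
The procedure above can be sketched in Python. The control strategy is left open by the procedure itself, so this sketch assumes one: rules are tried in list order, the leftmost match is rewritten, and the loop halts when no rule matches or after a step limit. The name simple_rewrite, the max_steps parameter and the use of "" for ε are choices made for this illustration only.

```python
def simple_rewrite(rules, w, max_steps=100):
    """Apply rewrite rules to w until no rule matches or max_steps is hit.

    rules: list of (lhs, rhs) string pairs; "" stands for the empty string.
    Assumed control strategy (not fixed by the formalism): try rules in
    list order and rewrite the leftmost match of the first rule that applies.
    """
    working = w
    for _ in range(max_steps):
        for lhs, rhs in rules:
            pos = working.find(lhs)
            if pos != -1:
                working = working[:pos] + rhs + working[pos + len(lhs):]
                break
        else:
            # No rule matched anywhere: this sketch treats that as "halt".
            return working
    return working


# Rule [2] listed before rule [1], applied to w = SaS.
rules = [("aS", ""), ("S", "aSb")]
result = simple_rewrite(rules, "SaS")
print(result)
```

Listing the rules in the opposite order makes the working string grow at every step, so only the step limit stops the loop. This shows why a rewrite system formalism has to say how rules are chosen and when to quit.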

5
A Rewrite System Formalism

A rewrite system formalism specifies:

● The form of the rules

● How simple-rewrite works (control mechanism):

• How to choose rules?
• When to quit?

6
An Example
w = SaS

Rules:
[1] S → aSb
[2] aS → ε

● In what order should the rules be applied?

● To what part of w should the chosen rule be applied?

● When to quit?

7
Rule-Based Systems
Examples of rule-based systems

● Expert systems

● Cognitive modeling

● Business practice modeling

● General models of computation

● Grammars (for defining languages)

8
Grammars Define Languages
A grammar is a set of rules that are stated in terms of
two alphabets:

• a terminal alphabet, Σ, that contains the symbols that
  make up the strings in L(G), and

• a nonterminal alphabet, the elements of which will
  function as working symbols that will be used while the
  grammar is operating. These symbols will disappear by
  the time the grammar finishes its job and generates a
  string.

A grammar has a unique start symbol, often called S.

9
Using a Grammar to Derive a String
simple-rewrite(G, S) will generate the strings in L(G).

We will use the symbol ⇒ to indicate steps in a
derivation.

A derivation could begin with:

S ⇒ aSb ⇒ aaSbb ⇒ …

10
Generating Many Strings
• Multiple rules may match.

Given: S → aSb, S → bSa, and S → ε

Derivation so far: S ⇒ aSb ⇒ aaSbb ⇒ …

Three choices at the next step:

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb (using rule 1),
S ⇒ aSb ⇒ aaSbb ⇒ aabSabb (using rule 2),
S ⇒ aSb ⇒ aaSbb ⇒ aabb (using rule 3).

11
Generating Many Strings
• One rule may match in more than one way.

Given: S → aTTb, T → bTa, and T → ε

Derivation so far: S ⇒ aTTb ⇒ …

Four choices at the next step:

S ⇒ aTTb ⇒ abTaTb (rule 2 applied to first T)
S ⇒ aTTb ⇒ aTbTab (rule 2 applied to second T)
S ⇒ aTTb ⇒ aTb (rule 3 applied to first T)
S ⇒ aTTb ⇒ aTb (rule 3 applied to second T)
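
The four choices can be enumerated mechanically. The following is a minimal sketch that assumes single-character symbols and writes ε as the empty string; the function name successors is hypothetical.

```python
def successors(rules, form):
    """Return every (rule number, position, new form) reachable in one step.

    rules: list of (nonterminal, rhs) pairs. Symbols are assumed to be
    single characters and "" stands for the empty string.
    """
    out = []
    for number, (lhs, rhs) in enumerate(rules, start=1):
        for pos, symbol in enumerate(form):
            if symbol == lhs:
                out.append((number, pos, form[:pos] + rhs + form[pos + 1:]))
    return out


rules = [("S", "aTTb"), ("T", "bTa"), ("T", "")]
choices = successors(rules, "aTTb")
for number, pos, new_form in choices:
    print(f"rule {number} at position {pos}: aTTb => {new_form}")
```

Two of the four steps produce the same working string, aTb, although they rewrite different occurrences of T.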

12
When to Stop
May stop when:

1. The working string no longer contains any nonterminal
   symbols (including the case where it is ε).

In this case, we say that the working string is generated
by the grammar.

Example:
Rules: S → aSb, S → bTa, and S → ε

S ⇒ aSb ⇒ aaSbb ⇒ aabb

13
When to Stop
May stop when:

2. There are nonterminal symbols in the working string but
   none of them appears on the left-hand side of any rule in
   the grammar.

In this case, we have a blocked or non-terminated derivation
but no generated string.

Example:
Rules: S → aSb, S → bTa, and S → ε

Derivations: S ⇒ aSb ⇒ abTab ⇒ [blocked]

14
When to Stop
It is possible that neither (1) nor (2) is achieved.

Example:

G contains only the rules S → Ba and B → bB, with S the start
symbol.

Then all derivations proceed as:

S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...
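
The three outcomes (generated, blocked, neither) can be observed by exploring derivations breadth-first for a bounded number of steps. This is only a sketch: the function name explore, the step bound, and the convention that nonterminals are uppercase letters are assumptions made for the illustration.

```python
def explore(rules, start, max_steps):
    """Breadth-first exploration of derivations, up to max_steps steps.

    Returns (generated, blocked, unfinished) sets of working strings.
    Nonterminals are assumed to be uppercase letters; "" stands for ε.
    """
    lhs_symbols = {lhs for lhs, _ in rules}
    generated, blocked, frontier = set(), set(), {start}
    for _ in range(max_steps):
        next_frontier = set()
        for form in frontier:
            nonterminals = [c for c in form if c.isupper()]
            if not nonterminals:
                generated.add(form)      # case 1: only terminals are left
            elif not any(c in lhs_symbols for c in nonterminals):
                blocked.add(form)        # case 2: no rule can apply
            else:
                for lhs, rhs in rules:
                    for pos, c in enumerate(form):
                        if c == lhs:
                            next_frontier.add(form[:pos] + rhs + form[pos + 1:])
        frontier = next_frontier
    return generated, blocked, frontier


g1, b1, open1 = explore([("S", "aSb"), ("S", "bTa"), ("S", "")], "S", 3)
g2, b2, open2 = explore([("S", "Ba"), ("B", "bB")], "S", 5)
print(g1, b1, open1)
print(g2, b2, open2)
```

For the grammar with only S → Ba and B → bB, both result sets stay empty however large the bound is; the unfinished set always holds exactly one, ever longer, working string.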

15
Context-free Grammars, Languages,
and PDAs

[Diagram: a context-free grammar defines a context-free language;
a PDA accepts a context-free language.]
16
More Powerful Grammars
Regular grammars must always produce strings one character at
a time, moving left to right.

But it may be more natural to describe generation more flexibly.

Example 1: L = ab*a

regular            non-regular
S → aB             S → aBa
B → a       vs.    B → ε
B → bB             B → bB

Example 2: L = {a^n b* a^n : n ≥ 0}

S → B
S → aSa
B → ε
B → bB

No regular grammar possible in this case!
17
Context-Free Grammars
Informally, context-free grammars are defined as grammars
with the following two conditions:

• No restrictions on the form of the right-hand sides.

  S → abDeFGab

• But require a single nonterminal on the left-hand side.

  S →

  but not ASB →

18
Example 1: A^nB^n

S → ε
S → aSb

19
Example 2: Balanced Parentheses

S → ε
S → SS
S → (S)

20
Definition: Context-Free Grammars
A context-free grammar G is a quadruple,
(V, Σ, R, S), where:

● V is the rule alphabet, which contains nonterminals
  and terminals,
● Σ (the set of terminals) is a subset of V,
● R (the set of rules) is a finite subset of (V − Σ) × V*,
● S (the start symbol) is an element of V − Σ.

Example:
({S, a, b}, {a, b}, {S → aSb, S → ε}, S)
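
One way to make the quadruple concrete is a small Python record that checks the four conditions when it is built. This is a sketch under assumptions: the class name CFG, the field name sigma, and the encoding of a right-hand side as a tuple of symbols (with the empty tuple for ε) are all choices made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    V: frozenset        # rule alphabet: nonterminals and terminals
    sigma: frozenset    # the set of terminals
    R: frozenset        # rules as (lhs, rhs) pairs; rhs is a tuple over V
    S: str              # start symbol

    def __post_init__(self):
        nonterminals = self.V - self.sigma
        if not self.sigma <= self.V:
            raise ValueError("the terminals must be a subset of V")
        if self.S not in nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        for lhs, rhs in self.R:
            if lhs not in nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            if not all(symbol in self.V for symbol in rhs):
                raise ValueError(f"right-hand side {rhs!r} is not in V*")


# The example quadruple; the empty tuple () encodes ε.
G = CFG(
    V=frozenset({"S", "a", "b"}),
    sigma=frozenset({"a", "b"}),
    R=frozenset({("S", ("a", "S", "b")), ("S", ())}),
    S="S",
)
print(sorted(G.V - G.sigma))
```

A rule such as aS → ε is rejected by the constructor, because its left-hand side is not a single nonterminal.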

21
Derivations
We define the derives-in-one-step relation ⇒G as:

x ⇒G y iff x = αAβ
           and A → γ is in R
           and y = αγβ

w0 ⇒G w1 ⇒G w2 ⇒G . . . ⇒G wn for some n ≥ 0 is a
derivation in G.

Let ⇒G* be the reflexive, transitive closure of ⇒G.

Then the language generated by G, denoted L(G), is:
{w ∈ Σ* : S ⇒G* w}.
22
An Example Derivation

Example:

Let G = ({S, a, b}, {a, b}, {S → aSb, S → ε}, S)

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb

S ⇒* aaabbb
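
A derivation can be checked step by step against the derives-in-one-step relation. The sketch below assumes single-character symbols and writes ε as the empty string; the function names are hypothetical.

```python
def derives_in_one_step(rules, x, y):
    """True iff x = αAβ, y = αγβ and A → γ is one of the rules."""
    for lhs, rhs in rules:
        for pos, symbol in enumerate(x):
            if symbol == lhs and x[:pos] + rhs + x[pos + 1:] == y:
                return True
    return False


def is_derivation(rules, forms):
    """True iff every form derives the next one in a single step."""
    return all(derives_in_one_step(rules, x, y)
               for x, y in zip(forms, forms[1:]))


rules = [("S", "aSb"), ("S", "")]
steps = ["S", "aSb", "aaSbb", "aaaSbbb", "aaabbb"]
print(is_derivation(rules, steps))
```

A one-element list counts as a derivation with n = 0, which matches the reflexive part of ⇒*.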

23
Definition of a Context-Free
Language
A language L is context-free iff it is generated by some
context-free grammar G.

24
Recursive Grammar Rules

• A rule is recursive iff it is X → w1Yw2, where:
  Y ⇒* w3Xw4 for some w1, w2, w3, and w4 in V*.

• A grammar is recursive iff it contains at least one
  recursive rule.

• Examples:
S → (S)

S → (T)
T → (S)
25
Self-Embedding Grammar Rules
• A rule in a grammar G is self-embedding iff it is
  X → w1Yw2, where Y ⇒* w3Xw4 and
  both w1w3 and w4w2 are in Σ+.
• A grammar is self-embedding iff it contains at least one
  self-embedding rule.
• Examples:
S → aSa is self-embedding

S → aS is recursive but not self-embedding

S → aT
T → Sa is self-embedding
26
Where Context-Free Grammars
Get Their Power
• If a context-free grammar G is not self-embedding
  then L(G) is regular.

• If a language L has the property that every grammar
  that defines it is self-embedding, then L is not regular.

27
PalEven = {ww^R : w ∈ {a, b}*}

G = ({S, a, b}, {a, b}, R, S), where:

R = { S → aSa
      S → bSb
      S → ε }.

So, this language is context-free (but not regular).

28
Equal Numbers of a’s and b’s

Let L = {w ∈ {a, b}* : #a(w) = #b(w)}.

G = ({S, a, b}, {a, b}, R, S), where:

R = { S → aSb
      S → bSa
      S → SS
      S → ε }.

Again, this language is context-free (but not regular).
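
The claim that this grammar generates exactly the strings with equal numbers of a's and b's can be spot-checked on short strings. The recognizer below is written specifically for these four rules and is not a general parser. It relies on one observation: for S → SS it is enough to try splits into two nonempty parts, because a split with an empty part leaves a shorter derivation of the same string. The function name derivable and the length bound 8 are choices made for this sketch.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derivable(w):
    """True iff S ⇒* w for S → aSb | bSa | SS | ε (w over {a, b})."""
    if w == "":
        return True                                    # S → ε
    if w[0] != w[-1] and derivable(w[1:-1]):
        return True                                    # S → aSb or S → bSa
    # S → SS, trying every split into two nonempty parts.
    return any(derivable(w[:k]) and derivable(w[k:])
               for k in range(1, len(w)))


# Compare the grammar with the counting definition on all strings up to length 8.
mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if derivable(w) != (w.count("a") == w.count("b"))
]
print(mismatches)
```

An empty list of mismatches is evidence for short strings only; it is not a proof that L(G) = L.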

29
BNF
A notation for writing practical context-free grammars
• The symbol | should be read as “or”.

Example: S → aSb | bSa | SS | ε

stands for the set {S → aSb, S → bSa, S → SS, S → ε}

• Allow a nonterminal symbol to be any sequence of
  characters surrounded by angle brackets.

Examples of nonterminals:

<program>
<variable>
30
BNF for a Java Fragment

<block> ::= {<stmt-list>} |
            {}
<stmt-list> ::= <stmt> |
                <stmt-list> <stmt>
<stmt> ::= <block> |
           while (<cond>) <stmt> |
           if (<cond>) <stmt> |
           do <stmt> while (<cond>); |
           <assignment-stmt>; |
           return |
           return <expression> |
           <method-invocation>;

Etcetera

Note: ::= stands for →

31
Spam Generation

These production rules yield 1,843,200 possible spellings.

Source: How Many Ways Can You Spell V1@gra? By Brian Hayes
American Scientist, July-August 2007
http://www.americanscientist.org/issues/pub/2007/7/how-many-ways-can-you-spell-v1gra
32
HTML
<ul>
<li>Item 1, which will include a sublist</li>
<ul>
<li>First item in sublist</li>
<li>Second item in sublist</li>
</ul>
<li>Item 2</li>
</ul>
A grammar:
/* Text is a sequence of elements.
HTMLtext → Element HTMLtext | ε
Element → UL | LI | … (and other kinds of elements that
          are allowed in the body of an HTML document)
/* The <ul> and </ul> tags must match.
UL → <ul> HTMLtext </ul>
/* The <li> and </li> tags must match.
LI → <li> HTMLtext </li>
33
English
S → NP VP
NP → the Nominal | a Nominal | Nominal |
     ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | dogs | bear | girl | chocolate | rifle
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | shots | smells
PP → Prep NP
Prep → with
34
Designing Context-Free Grammars

● Generate related regions together.

A^nB^n

● Generate concatenated regions:

A → BC

● Generate outside in:

A → aAb

35
Outside-In Structure and RNA Folding

36
Concatenating Independent
Languages
Let L = {a^n b^n c^m : n, m ≥ 0}.

The c^m portion of any string in L is completely
independent of the a^n b^n portion, so we should generate
the two portions separately and concatenate them.

G = ({S, N, C, a, b, c}, {a, b, c}, R, S) where:

R = { S → NC
      N → aNb
      N → ε
      C → cC
      C → ε }.
37
L = {a^{n_1} b^{n_1} a^{n_2} b^{n_2} … a^{n_k} b^{n_k} : k ≥ 0 and ∀i (n_i ≥ 0)}

Examples of strings in L: ε, abab, aabbaaabbbabab

Note that L = {a^n b^n : n ≥ 0}*.

G = ({S, M, a, b}, {a, b}, R, S) where:

R = { S → MS
      S → ε
      M → aMb
      M → ε }.

38
Another Example: Unequal a’s and b’s
L = {a^n b^m : n ≠ m}

G = (V, Σ, R, S), where

V = {a, b, S, A, B},
Σ = {a, b},
R =

S → A      /* more a’s than b’s
S → B      /* more b’s than a’s
A → a      /* at least one extra a generated
A → aA
A → aAb
B → b      /* at least one extra b generated
B → Bb
B → aBb
39
Simplifying Context-Free Grammars
G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where

R =
{ S → AB | AC
  A → aAb | ε
  B → aA
  C → bCa
  D → AB }

This clearly is a context-free grammar, but how efficient
is it? Put differently: can it be simplified?

40
Unproductive Nonterminals
removeunproductive(G: CFG) =
1. G′ = G.
2. Mark every nonterminal symbol in G′ as unproductive.
3. Mark every terminal symbol in G′ as productive.
4. Until one entire pass has been made without any new
   symbol being marked do:
      For each rule X → α in R do:
         If every symbol in α has been marked as
         productive and X has not yet been marked as
         productive then:
            Mark X as productive.
5. Remove from G′ every unproductive symbol.
6. Remove from G′ every rule that contains an
   unproductive symbol.
7. Return G′.
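
The marking procedure translates almost line by line into Python. The sketch below assumes single-character symbols, rules given as (lhs, rhs) string pairs and ε written as the empty string; the function name and the return shape are choices made for this illustration.

```python
def remove_unproductive(nonterminals, terminals, rules):
    """Drop nonterminals that cannot derive any terminal string.

    rules: list of (lhs, rhs) pairs, rhs a string of single-character
    symbols; "" stands for ε. Returns (nonterminals, rules) of G′.
    """
    productive = set(terminals)                       # steps 2 and 3
    changed = True
    while changed:                                    # step 4
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    kept = [n for n in nonterminals if n in productive]              # step 5
    kept_rules = [(lhs, rhs) for lhs, rhs in rules                   # step 6
                  if lhs in productive and all(s in productive for s in rhs)]
    return kept, kept_rules


rules = [("S", "AB"), ("S", "AC"), ("A", "aAb"), ("A", ""),
         ("B", "aA"), ("C", "bCa"), ("D", "AB")]
nts, new_rules = remove_unproductive("SABCD", "ab", rules)
print(nts)
print(new_rules)
```

The rule A → ε marks A as productive in the first pass because its right-hand side has no symbols left to check. C is never marked, since its only rule mentions C itself.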
41
Simplifying Context-Free Grammars
G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where

R =
{ S → AB | AC
  A → aAb | ε
  B → aA
  C → bCa
  D → AB }

Remove unproductive nonterminals: C

42
Unreachable Nonterminals
removeunreachable(G: CFG) =
1. G′ = G.
2. Mark S as reachable.
3. Mark every other nonterminal symbol as unreachable.
4. Until one entire pass has been made without any new
   symbol being marked do:
      For each rule X → αAβ (where A ∈ V − Σ) in R do:
         If X has been marked as reachable and A has not then:
            Mark A as reachable.
5. Remove from G′ every unreachable symbol.
6. Remove from G′ every rule with an unreachable symbol on
   the left-hand side.
7. Return G′.
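
A Python sketch of the same procedure follows, with the same encoding assumptions as before: single-character symbols, rules as (lhs, rhs) string pairs, and ε as the empty string. The function name and return shape are hypothetical.

```python
def remove_unreachable(nonterminals, rules, start):
    """Drop nonterminals that cannot be reached from the start symbol.

    Returns (nonterminals, rules) of G′.
    """
    reachable = {start}                               # steps 2 and 3
    changed = True
    while changed:                                    # step 4
        changed = False
        for lhs, rhs in rules:
            if lhs in reachable:
                for symbol in rhs:
                    if symbol in nonterminals and symbol not in reachable:
                        reachable.add(symbol)
                        changed = True
    kept = [n for n in nonterminals if n in reachable]                 # step 5
    kept_rules = [(lhs, rhs) for lhs, rhs in rules if lhs in reachable]  # step 6
    return kept, kept_rules


rules = [("S", "AB"), ("S", "AC"), ("A", "aAb"), ("A", ""),
         ("B", "aA"), ("C", "bCa"), ("D", "AB")]
nts, new_rules = remove_unreachable("SABCD", rules, "S")
print(nts)
print(new_rules)
```

Run on the full example grammar, only D is dropped. C is still reachable through S → AC, so removing it is the job of the unproductive-symbol pass.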

43
Simplifying Context-Free Grammars
G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where

R =
{ S → AB | AC
  A → aAb | ε
  B → aA
  C → bCa
  D → AB }

Remove unreachable nonterminals: D

44
Structure
Context-free languages:
We care about the structure of the strings derived.

        E
      / | \
     E  +  E
     |   / | \
    id  E  *  E
     |  |     |
     3  id    id
        |     |
        5     7

45
Derivations
To capture structure, we must capture the path we took
through the grammar. Derivations do that.

Example:

S → ε
S → SS
S → (S)

  1    2      3        4       5         6
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
  1    2      3        5          4         6

But the order of rule application doesn’t matter.
46


Derivations
Parse trees capture essential structure:

  1    2      3        4       5         6
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
  1    2      3        5          4         6

              S
           /     \
          S       S
        / | \   / | \
       (  S  ) (  S  )
        / | \     |
       (  S  )    ε
          |
          ε
47
Parse Trees
A parse tree, derived by a grammar G = (V, Σ, R, S), is
a rooted, ordered tree in which:

● Every leaf node is labeled with an element of Σ ∪ {ε},

● The root node is labeled S,

● Every other node is labeled with some element of
  V − Σ, and

● If m is a nonleaf node labeled X and the children of m
  are labeled x1, x2, …, xn, then R contains the rule
  X → x1x2…xn.
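
The four conditions can be checked on a concrete tree. The sketch below assumes single-character symbols, rules stored as a set of (lhs, rhs) string pairs with "" for ε, and a leaf labeled "ε" under a node expanded by an ε-rule. Node, tree_yield and is_parse_tree are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # a symbol of V, or "ε"
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels from left to right, skipping ε."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


def is_parse_tree(node, rules, terminals, start, is_root=True):
    """Check the conditions of the definition for the tree rooted at node."""
    if is_root and node.label != start:
        return False
    if not node.children:
        return node.label in terminals or node.label == "ε"
    labels = [child.label for child in node.children]
    rhs = "" if labels == ["ε"] else "".join(labels)
    if (node.label, rhs) not in rules:
        return False
    return all(is_parse_tree(child, rules, terminals, start, False)
               for child in node.children)


def paren(inner):
    return Node("S", [Node("("), inner, Node(")")])


def empty():
    return Node("S", [Node("ε")])


rules = {("S", ""), ("S", "SS"), ("S", "(S)")}
tree = Node("S", [paren(paren(empty())), paren(empty())])
print(tree_yield(tree), is_parse_tree(tree, rules, {"(", ")"}, "S"))
```

The example tree is the one for (())() in the balanced-parentheses grammar. Both derivations of that string shown earlier correspond to this single tree.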
48
Structure in English
S
├─ NP
│  ├─ the
│  └─ Nominal
│     ├─ Adjs
│     │  └─ Adj
│     │     └─ smart
│     └─ N
│        └─ cat
└─ VP
   ├─ V
   │  └─ smells
   └─ NP
      └─ Nominal
         └─ N
            └─ chocolate


49
Generative Capacity
Because parse trees matter, it makes sense, given a
grammar G, to distinguish between:

● G’s weak generative capacity, defined to be the
  set of strings, L(G), that G generates, and

● G’s strong generative capacity, defined to be the
  set of parse trees that G generates.

50
Algorithms Care How We Search
              S
           /     \
          S       S
        / | \   / | \
       (  S  ) (  S  )
        / | \     |
       (  S  )    ε
          |
          ε

Algorithms for generation and recognition must be
systematic. They typically use either the leftmost
derivation or the rightmost derivation.
51
Derivations of The Smart Cat
• A left-most derivation is:
S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒
the Adj N VP ⇒ the smart N VP ⇒ the smart cat VP ⇒
the smart cat V NP ⇒ the smart cat smells NP ⇒
the smart cat smells Nominal ⇒ the smart cat smells N
⇒ the smart cat smells chocolate

• A right-most derivation is:
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒
NP V chocolate ⇒ NP smells chocolate ⇒ the Nominal
smells chocolate ⇒ the Adjs N smells chocolate ⇒
the Adjs cat smells chocolate ⇒ the Adj cat smells
chocolate ⇒ the smart cat smells chocolate
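
Both derivations can be read off the same parse tree by always expanding the leftmost, or always the rightmost, nonterminal. The sketch below encodes a tree node as a (label, children) pair and a terminal as a plain string; that encoding and the function names are assumptions made for the illustration.

```python
def render(form):
    """Show a working string: terminals as they are, subtrees by their label."""
    return " ".join(item if isinstance(item, str) else item[0] for item in form)


def derivation(tree, leftmost=True):
    """List the working strings of the leftmost (or rightmost) derivation."""
    form = [tree]                  # mix of unexpanded subtrees and terminals
    steps = [render(form)]
    while True:
        pending = [i for i, item in enumerate(form) if not isinstance(item, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]  # expand: replace the node by its children
        steps.append(render(form))


tree = ("S", [
    ("NP", ["the", ("Nominal", [("Adjs", [("Adj", ["smart"])]),
                                ("N", ["cat"])])]),
    ("VP", [("V", ["smells"]),
            ("NP", [("Nominal", [("N", ["chocolate"])])])]),
])
left = derivation(tree)
right = derivation(tree, leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

The two lists have the same length and the same last element; they differ only in the order in which the nodes of the tree are expanded.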

52
Ambiguity
A grammar G is ambiguous iff there is at least one string
in L(G) for which G produces more than one parse tree.

For most applications of context-free grammars, this is
a problem.

53
An Arithmetic Expression Grammar
E → E + E
E → E * E
E → (E)
E → id
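
The ambiguity of this grammar can be made visible by counting parse trees. The counter below is written for these four rules only; it works because none of them is an ε-rule or a unit rule, so every string has finitely many trees. The function name and the token-list input format are assumptions made for this sketch.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees for tokens under E → E+E | E*E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                    # E → id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                  # E → (E)
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)    # E → E+E, E → E*E
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))
print(count_parse_trees("id + id + id + id".split()))
```

For id + id * id the two trees correspond to grouping the sum first or the product first, and the number of trees grows quickly as more operators are added.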

54
Even a Very Simple Grammar Can be
Highly Ambiguous
S → ε
S → SS
S → (S)

55
Derivation is Not Necessarily Unique
This is True for Regular Languages Too
Regular Expression:

(a ∪ b)*a(a ∪ b)*

Create aaa from it:

choose a from (a ∪ b)*, then
choose a from (a ∪ b)*, then
choose a, then
choose ε from (a ∪ b)*.

or

choose ε from (a ∪ b)*, then
choose a, then
choose a from (a ∪ b)*, then
choose a from (a ∪ b)*.

Regular Grammar (create aaa from):

S → a
S → bS
S → aS
S → aT
T → a
T → b
T → aT
T → bT

56
Inherent Ambiguity
Sometimes we can avoid ambiguity, but …

… some languages have the property that every
grammar for them is ambiguous. We call such
languages inherently ambiguous.

Example:

L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.

57
Inherent Ambiguity
L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.

One grammar for L has the rules:

S → S1 | S2
S1 → S1c | A      /* Generate all strings in {a^n b^n c^m}.
A → aAb | ε
S2 → aS2 | B      /* Generate all strings in {a^n b^m c^m}.
B → bBc | ε

Now consider a string of the form a^n b^n c^n.

This grammar is ambiguous, since such a string can be
generated either via S → S1 or via S → S2. In fact L is
inherently ambiguous: we can find no better (unambiguous)
grammar!
58
(Inherent) Ambiguity
Both of the following problems are undecidable:

• Given a context-free grammar G, is G ambiguous?

• Given a context-free language L, is L inherently
  ambiguous?

59
