0% found this document useful (0 votes)
6 views

Unit 3 CFG

Uploaded by

jerryt2714
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

Unit 3 CFG

Uploaded by

jerryt2714
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 65

Theory of Computation

Unit-3 Context Free Grammar

PPT

Prepared By

Archana Kadam
Assistant Professor, 1
Unit-3 Context-Free Grammar
Introduction, Regular Grammar, Context Free Grammar- Definition,
Derivation. Sentential form, parse tree, Ambiguous Grammar,
Simplification of CFG: Eliminating unit productions, useless production,
useless symbols, Greibach normal form, Chomsky normal form. Types of
Grammar: Chomsky Hierarchy, Context Free Language (CFL): Closure
properties of CFL.
Case Study- CFG for Parenthesis Match- XML and Document Type
Definitions, Natural Language Processing- Text Parsing
Definition
■ A context-free grammar (CFG) G is a
quadruple (V, T, P, S) where
■ V: a set of non-terminal symbols
■ T: a set of terminals (V ∩ T = Ǿ)
■ P: a set of rules (P: V → (V U T)*)
■ S: a start symbol.
Example
■ V = {q, f,}
■ T = {0, 1}
■ P = {q → 11q, q → 00f,
f → 11f, f → ε }
■ S=q
■ (R= {q → 11q | 00f, f → 11f | ε })
How do we use rules?
■ If A → B, then xAy xBy and we say that
xAy derivates xBy.

■ If s ··· t, then we write s * t.


■ A string x in Σ* is generated by G=(V,Σ,R,S)
if S * x.
■ L(G) = { x in Σ* | S * x}.
Example
■ G = ({S}, {0,1}. {S → 0S1 | ε }, S)
■ ε in L(G) because S -> ε .
■ 01 in L(G) because S -> 0S1-> 01.
■ 0011 in L(G) because
S -> 0S1 -> 00S11 -> 0011.
■ 0 1 in L(G) because S -> * 0 1 .
■ L(G) = {0 1 | n > 0}
Context-Free Language (CFL)
■ A language L is context-free if there
exists a CFG G such that L = L(G).
■ Example are palindrome
■ L(G) is
■ S-> aSa | bSb | a | b
Corollary
■ Every regular language is a CFL.
■ The class of regular languages is a
proper subclass of CFLs.

CFL
Regula
r
Context Free Grammar
■ Construct the CFG for L ={ an bn | n>=1}
■ L = {ab, aabb, aaabbb,…..}
P : S-> aSb | ab
Construct the CFG for L of palindrome
strings. S - > aSa | bSb | Ɛ

9
Derivations
■ There are three ways to derive a string
from grammar
1. Leftmost Derivation : always deriving leftmost
non-terminal first

2. Rightmost Derivation : always deriving


rightmost non-terminal first
3. Derivation Tree / Parse Tree

10
Derivations Examples
■ Consider following CFG , G= {(S,A), (a,b), P, S}
■ Where P: S-> aAS | a
■ A-> SbA | SS | ba

■ Derive string aabbaa using leftmost and rightmost derivation


■ First number the production rules
■ 1. S -> aAS Left most derivation is
■ 2. S -> a S-> aAS…………Rule 1
-> aSbAS……...Rule 3
■ 3. A -> SbA ->aabAS……….Rule 2
■ 4. A -> SS ->a a b b a S….Rule 5
-> a a b b a a…...Rule 2
■ 5. A -> ba
11
Derivations Examples
■ Consider following CFG , G= {(S,A), (a,b), P, S}
■ Where P: S-> aAS | a
■ A-> SbA | SS | ba

■ Derive string aabbaa using leftmost and rightmost derivation


■ First number the production rules
■ 1. S -> aAS Right most derivation is
■ 2. S -> a S-> aAS…………Rule 1
-> aAa….……...Rule 2
■ 3. A -> SbA ->aSbAa……….Rule 3
■ 4. A -> SS ->a S b b a a….Rule 5
-> a a b b a a…...Rule 2
■ 5. A -> ba
12
Derivations Examples
■ Derive string aabbaa using derivation tree
■ First number the production rules
■ 1. S -> aAS
■ 2. S -> a Derivation Tree
S
■ 3. A -> SbA
■ 4. A -> SS a
A
S
■ 5. A -> ba
A a
S
b

b b a

13
Ambiguous Context Free Grammar
■ A CFG for a language is said to be
ambiguous, if there exists at least one
string which can be generated more
than one ways.
■ There exists more than one leftmost
derivation, rightmost derivation and
more than one derivation tree.

14
Ambiguous Context Free Grammar
■ Example :
■ E -> E + E | E * E | id

Now to derive a string id + id * id there are two leftmost


derivation given below
E => E + E E => E * E
=> id + E => E + E * E
=> id + E * E => id + E * E
=> id + id * E => id + id * E
=> id + id * id => id + id * id

So this grammar is ambiguous


15
Ambiguous Context Free Grammar
■ Example :
Check the ambiguity in following grammar
S -> iCtS | iCtSeS | a
C -> b
Consider string ‘ibtibtaea’

16
Ambiguous Context Free Grammar
S -> iCtS | iCtSeS | a
C -> b
Consider string ‘ibtibtaea’

S -> iCtS S -> iCtSeS


-> ibtS -> ibtSeS
-> ibtiCtSeS -> ibtiCtSeS
-> ibtibtaea ->ibtibtaea

So this grammar is ambiguous


17
Ambiguous Context Free Grammar
Check whether the grammar is ambiguous or not.
A->AA
A->(A)
A->a
For the string "a(a)(a)a"

18
As two derivation tree Grammar is Ambiguous
19
Removal of Ambiguity
Example :
E -> E + E | E * E | id
Postpone higher priority operators from start symbol
Use other Non terminals to take care of it
E-> E + T | T
T->T*F|F
F -> id

20
CFG Simplification

21
Three ways to simplify/clean a CFG
(clean)
1. Eliminate useless symbols

(simplify)
2. Eliminate ε-productions A => ε

3. Eliminate unit productions A => B

22
Eliminating useless symbols
A symbol X is reachable if there exists:
■ S 🡺* α X β

A symbol X is generating if there exists:


■ X 🡺* w,
■ for some w ∈ T*

For a symbol X to be “useful”, it has to be both


reachable and generating
■ S 🡺* α X β 🡺* w’, for some w’ ∈ T*

reachable generating

23
Algorithm to detect useless
symbols
1. First, eliminate all symbols that are not
generating

2. Next, eliminate all symbols that are not


reachable

24
Example: Useless symbols
■ S🡺AB | a
■ A🡺 b

1. A, S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminating B… (i.e., remove all productions that involve B)
1. S🡺 a
2. A🡺b
4. Now, A is not reachable and therefore is useless

5. Simplified G: S 🡺 a

25
What’s the point of removing ε-productions?
A🡺ε
Eliminating ε-productions

Theorem: If G=(V,T,P,S) is a CFG for a language L, then


L\ {ε} has a CFG without ε-productions

Definition: A is “nullable” if A🡺* ε


■ If A is nullable, then any production of the form
“B🡺 CAD” can be simulated by:
■ B 🡺 CD | CAD
■ This can allow us to remove ε transitions for A

26
Algorithm to detect all nullable
variables
■ Basis:
■ If A🡺 ε is a production in G, then A is
nullable
(note: A can still have other productions)
■ Induction:
■ If there is a production B🡺 C1C2…Ck,
where every Ci is nullable, then B is also
nullable

27
Eliminating ε-productions
Given: G=(V,T,P,S)
Algorithm:
1. Detect all nullable variables in G
2. Then construct G1=(V,T,P1,S) as follows:
i. For each production of the form: A🡺X1X2…Xk, where
k≥1, suppose m out of the k Xi’s are nullable symbols
ii. Then G1 will have 2m versions for this production
i. i.e, all combinations where each Xi is either present or absent
iii. Alternatively, if a production is of the form: A🡺ε, then
remove it

28
Example: Eliminating
ε-productions
■ Let L be the language represented by the following CFG G:
i. S🡺AB
ii. A🡺aAA | ε
iii. B🡺bBB | ε Simplified
grammar
Goal: To construct G1, which is the grammar for L-{ε}

■ Nullable symbols: {A, B}

■ G1 can be constructed from G as follows: G 1:


■ B 🡺 b | bB | bB | bBB • S 🡺 A | B | AB
■ ==> B 🡺 b | bB | bBB • A 🡺 a | aA | aAA
■ Similarly, A 🡺 a | aA | aAA • B 🡺 b | bB | bBB
■ Similarly, S 🡺 A | B | AB
+
■ Note: L(G) = L(G1) U {ε} • S🡺ε

29
Eliminating unit productions

A => B B has to be a variable

What’s the point of removing unit transitions ?


Will save #substitutions
E.g., A=>B | … A=>xxx | yyy | zzz | …
B=>C | … B=> xxx | yyy | zzz | …
C=>D | … C=> xxx | yyy | zzz | …
D=>xxx | yyy | zzz D=>xxx | yyy | zzz
after 30
before
Putting all this together…
■ Theorem: If G is a CFG for a language that
contains at least one string other than ε, then there
is another CFG G1, such that L(G1)=L(G) - ε, and
G1 has:
■ no ε -productions
■ no unit productions
■ no useless symbols

■ Algorithm:
Step 1) eliminate ε -productions Again,
Step 2) eliminate unit productions the order is
Step 3) eliminate useless symbols important!

Why?
31
Normal Forms

32
Why normal forms?
■ If all productions of the grammar could be
expressed in the same form(s), then:

a. It becomes easy to design algorithms that use


the grammar

b. It becomes easy to show proofs and properties

33
Chomsky Normal Form (CNF)
Let G be a CFG for some L-{ε}
Definition:
G is said to be in Chomsky Normal Form if all
its productions are in one of the following
two forms:
i. A 🡺 BC where A,B,C are variables, or
ii. A🡺a where a is a terminal
■ G has no useless symbols
■ G has no unit productions
■ G has no ε-productions
34
CNF checklist
Is this grammar in CNF?
G1:
1. E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T 🡺 T*F | (E) | Ia | Ib | I0 | I1
3. F 🡺 (E) | Ia | Ib | I0 | I1
4. I 🡺 a | b | Ia | Ib | I0 | I1

Checklist:
• G has no ε-productions
• G has no unit productions
• G has no useless symbols
• But…
• the normal form for productions is violated

So, the grammar is not in CNF


35
How to convert a G into CNF?
■ Assumption: G has no ε-productions, unit productions or useless
symbols

1) For every terminal a that appears in the body of a production:


i. create a unique variable, say Xa, with a production Xa 🡺 a, and
ii. replace all other instances of a in G by Xa

2) Now, all productions will be in one of the following


two forms:
■ A 🡺 B1B2… Bk (k≥3) or A🡺a

4) Replace each production of the form A 🡺 B1B2B3… Bk by:and so on…


B2 C2
B1 C1

■ A🡺B1C1 C1🡺B2C2 … Ck-3🡺Bk-2Ck-2 Ck-2🡺Bk-1Bk


36
Example #1
G in CNF:
G:
X0 => 0
S => AS | BABC
X1 => 1
A => A1 | 0A1 | 01
S => AS | BY1
B => 0B | 0
Y1 =>
C => 1C | 1 AY2
Y =>AX
A2=> BC1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1

All productions are of the form: A=>BC or A=>a

37
Example #2
G: 1. E 🡺 EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1
1. E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1 2. T 🡺 TX*F | X(EX) | IXa | IXb | IX0 | IX1
2. T 🡺 T*F | (E) | Ia | Ib | I0 | I1 3. F 🡺 X(EX) | IXa | IXb | IX0 | IX1
3. F 🡺 (E) | Ia | Ib | I0 | I1 4. I 🡺 Xa | Xb | IXa | IXb | IX0 | IX1
4. I 🡺 a | b | Ia | Ib | I0 | I1 Step (1) 5. X+ 🡺 +
6. X* 🡺 *
7. X+ 🡺 +
8. X( 🡺 (
)
(2 9. …….
ep
St

1. E 🡺 EC1 | TC2 | X(C3 | IXa | IXb | IX0 | IX1


2. C 1 🡺 X +T
3. C 2 🡺 X *F
4. C3 🡺 EX)
5. T 🡺 ..…….
6. ….

38
Other Normal Forms
■ Griebach Normal Form (GNF)
■All productions of the form
A==>a α
Where a is terminal and α is string of zero or
more non-terminals

39
Greibach Normal Form (GNF)
■ Two ways to handle to convert grmmar in GNF
■ Theorem 1: By substitution Method
■ Example:
A->ABA | AB
A-> aA | a
Using theorem 1 of substitution for B in production of first
grammsr
A-> aABA | aBA | aAB| aB
Now all grammar production are in required format.
(Kindly note do not replace for grammar production already in format)

40
Greibach Normal Form (GNF)
■ Two ways to handle to convert grammar in GNF
■ Theorem 2:is used to handle left recursive

grammar
■ If CFG consist of production of the form

A -> Aα1 | Aα2 |….| A αr | β1 | β2 |….| βn

Then equivalent grammar can be formed as


A ->
β1 | β2 |….| βn
A -> β1 Z | β2 Z|….| βn Z
Z -> α1 | α2 |….| αr
Z -> α1 Z | α2 Z |….| αr Z

41
Greibach Normal Form (GNF)
Now A’s production are in
Example : GNF.
A -> AB| aB | bB| c Need to handle Z, will apply
theorem one of substitution
B -> b So,
To bring this grammar in GNF will apply Z-> b | bZ
theorem 2 to handle left recursive
grammar of A -> AB
Final Answer is
Here, α1 = B,
β1 = aB, β2 = bB, , β3 = c A -> aB | bB| c
A -> | aBZ | bBZ| cZ
Then,
Z -> b | bZ
A -> aB | bB| c
A -> | aBZ | bBZ| cZ
Z -> B | BZ
42
Chomsky Hierarchy

26-09-2023 43
Introduction
■ Different classes of phrase structure
grammar can be obtained with few
constraints
■ It is a containment hierarchy
■ Described by Chomsky-Schitzenberger
■ Suggested four different classes
Type-0 (Unrestricted grammar)
Type-1 (Context sensitive grammar)
Type-2 (Context free grammar)
Type-3 (Regular grammar)
26-09-2023 44
Type-0 (Unrestricted grammar)

■ No restriction on production rules


■ Production of form α −> β , where α ≠ ε
■ Semi-thue grammar or unrestricted grammar
■ Generate languages are recognized by
Turing machines (recursively enumerable
languages)
■ For example: A −>AB, AB −>BC, B −>AcD,
Ac −>abc, D −> ε
26-09-2023 45
Type-1 (Context sensitive grammar)
Restriction on production rules are as follows:
1. for α −> β, the length of β is at least as much as
length of α, except for S −> ε
2. The rule S −> ε is allowed only if S doesn’t appear on
RHS
3. Context sensitive because of productions of form
α1Aiα2 −> α1βα2 , β ≠ ε and α1 , α2 may or may not be
empty
Context sensitive language (CSL) recognized by turing
machine
Example: A −>aBC, aB −>ab, bC−>bc
Derivation:
A=>aBC=>abC ;
abC=>abc ; finally A=>abc
26-09-2023 46
Type-2 (Context free grammar)

■ Production of form A−> α , where α


is sentential form
■ Context free languages (CFL)
generated by context free grammar
are recognized by Push Down
Automata (PDA)
■ For example:
■ S −>aSb, S −>bSb, S −>a, S −>b

26-09-2023 47
Type-3 (Regular grammar)

■ The restriction on production rules are as follows:


1. Left-hand side (LHS) should contain only one
non-terminal (NT) symbol
2. Right-hand side (RHS) can contain at most one NT
symbol, which should be rightmost or leftmost
symbol
■ Regular language by regular grammar , which are
recognized by finite state machines (FSM), and
denoted by regular expressions
■ Based on position of RHS non-terminal symbol, the
regular grammar is classified as left-linear or
right-linear grammar

26-09-2023 48
Type-3 (Regular grammar)

1. Left-linear grammar:
2. allowed productions are: A −>Bw , A −>w,
■ The rule S −> ε is allowed only if S doesn’t
appear on RHS
■ For example:
S −>Ca|Bb ,
C −>Bb ,
B −>Ba|b

26-09-2023 49
Type-3 (Regular grammar)

2. Right-linear grammar:
■ allowed productions are:
■ A −>wB , A −>w,
■ The rule S −> ε is allowed only if S
doesn’t appear on RHS
■ For example: S −>0A, A −>0A | 1

26-09-2023 50
CFL Closure Properties

51
Closure Property Results
■ CFLs are closed under:
■ Union
■ Concatenation
■ Kleene closure operator
■ Substitution
■ Homomorphism, inverse homomorphism
■ reversal
■ CFLs are not closed under:
■ Intersection
■ Difference
■ Complementation
52
Substitution of a CFL:
example
■ Let L = language of binary palindromes s.t., substitutions for 0
and 1 are defined as follows:
■ s(0) = {anbn | n ≥1}, s(1) = {xx,yy}
■ Prove that s(L) is also a CFL.

CFG for L: CFG for s(0): CFG for s(1):

S=> 0S0|1S1|ε S0=> aS0b | ab S1=> xx | yy

Therefore, CFG for s(L):

S=> S0SS0 | S1 S S1 |ε
S0=> aS0b | ab
S1=> xx | yy 53
CFLs are closed under union
Let L1 and L2 be CFLs
To show: L2 U L2 is also a CFL
Let us show by using the result of Substitution

■ Make a new language:


■ Lnew = {a,b} s.t., s(a) = L1 and s(b) = L2
==> s(Lnew) == same as == L1 U L2

■ A more direct, alternative proof


■ Let S1 and S2 be the starting variables of the
grammars for L1 and L2
■ Then, Snew => S1 | S2
54
CFLs are closed under concatenation

Let us show Let L and L be CFLs


■ by using the result of Substitution
1 2

■ Make Lnew= {ab} s.t.,


s(a) = L1 and s(b)= L2
==> L1 L2 = s(Lnew)
■ L1 grammar is S-> aSa|bSb|a|b|Ɛ

■ L2 grammar is S-> aSb |Ɛ

S->S1.S2
S1-> aSa|bSb|a|b|Ɛ
S2-> aSb |Ɛ 55
CFLs are closed under
Kleene Closure
■ Let L be a CFL
■ Let Lnew = {a}* and s(a) = L1
■ Then, L* = s(L )
new

■ Let L ={a}, S->a


■ S->S1S | Ɛ
■ S1->a

56
We won’t use substitution to prove this result

CFLs are closed under


Reversal
■ Let L be a CFL, with grammar
G=(V,T,P,S)
■ For LR, construct GR=(V,T,PR,S) s.t.,
■ If A==> α is in P, then:
■ A==> αR is in PR
■ (that is, reverse every production)
■ S-> aSb
■ SR -> bSa
57
Some negative closure results

CFLs are not closed under


Intersection
■ Existential proof:
■ L1 = {0n1n2i | n≥1,i≥1}
■ L2 = {0i1n2n | n≥1,i≥1}
■ Both L1 and L2 are CFLs
■ S-> AB
■ A->0A1 | 01
■ B->2B|2
■ Therefore, L1 ∩ L2 will be 0n1n2n
■ Not possible to construct grammar so CFL’s
are not closed under intersection

58
Some negative closure results

CFLs are not closed under


complementation
■ Follows from the fact that CFLs are not
closed under intersection

■ L1 ∩ L 2 = L 1 U L 2
Logic: if CFLs were to be closed under complementation
🡺 the whole right hand side becomes a CFL (because
CFL is closed for union)
🡺 the left hand side (intersection) is also a CFL
🡺 but we just showed CFLs are
NOT closed under intersection!
🡺 CFLs cannot be closed under complementation.
59
Some negative closure results

CFLs are not closed under


difference
■ Follows from the fact that CFLs are not
closed under complementation

■ Because, if CFLs are closed under


difference, then:
■ L = ∑* - L
■ So L has to be a CFL too
■ Contradiction

60
Decision Properties
■ Emptiness test
■ Generating test
■ Reachability test
■ Membership test
■ PDA acceptance

61
Application of CFL
■ Compiler –Parsing
■ Eg a=b;
■ <STMT> -> <Assign STMT> | <If STMT>
|<while loop>|<for loop>
■ <while loop>-> while <condition><block of stmt>
■ <block of stmt> -> <List of STMT>|<STMT>
■ <Assign STMT> -> <ID>=<ID>

62
Application of CFL
■ Markup Languages
■ DOC -> Element DOC
■ Element -> Text | <EM> DOC </EM> | <P>
DOC </P> | <OL> List </OL>

■ XML
■ DTD

63
memo :
addressee: sender date title body;
addressee : TEXT;
sender : TEXT;
date : DATE;
title : TEXT;
body : TEXT;
For example, the following would be a valid memo:
<memo> <addressee>John</addressee>
<sender>Carla</addressee> <date>1998-09-01</date>
<title>New coffee maker</title>
<body> The new coffee maker has been installed! Operation is
simple: put a cup in the opening and press the red button.
</body>
</memo>
64
“Undecidable” problems for
CFL
■ Is a given CFG G ambiguous?
■ Is a given CFL inherently ambiguous?
■ Is the intersection of two CFLs empty?
■ Are two CFLs the same?
■ Is a given L(G) equal to ∑*?

65

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy