Introduction To Formal Languages, Automata and Computability
Language
Strings are defined over an alphabet, which is finite.
The alphabet may vary depending upon the application.
Elements of an alphabet are called symbols. Usually
we denote the basic alphabet set either as $\Sigma$ or $T$. For
example, the following are a few examples of an
alphabet set.
$\Sigma_1 = \{a, b\}$
$\Sigma_2 = \{0, 1, 2\}$
$\Sigma_3 = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$.
A string or word is a finite sequence of symbols from
the alphabet, usually written as concatenated symbols
and not separated by gaps or commas. For example, if
$\Sigma = \{a, b\}$, then $abbab$ is a string or word over $\Sigma$.
If $w$ is a string over an alphabet $\Sigma$, then the length of
$w$, written as $len(w)$ or $|w|$, is the number of symbols it
contains. If $|w| = 0$, then $w$ is called the empty string,
denoted either as $\varepsilon$ or $\lambda$.
For any word $w$, $w\varepsilon = \varepsilon w = w$. For any string
$w = a_1 \ldots a_n$ of length $n$, the reverse of $w$ is written as $w^R$, which is the string $a_n a_{n-1} \ldots a_1$, where each
symbol $a_i$ belongs to the basic alphabet $\Sigma$. A string $z$
that appears consecutively within another string $w$
is called a substring or subword of $w$. For example, $aab$
is a substring of $baabb$.
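The set of all strings over an alphabet $\Sigma$ is denoted by
$\Sigma^*$, which includes the empty string $\varepsilon$. For example, for
$\Sigma = \{0, 1\}$, $\Sigma^* = \{\varepsilon, 0, 1, 00, 01, 10, 11, \ldots\}$. Note
that $\Sigma^*$ is a countably infinite set. Also, $\Sigma^n$ denotes the
set of all strings over $\Sigma$ whose length is $n$. Hence
$\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \ldots$ and
$\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \ldots$. Subsets of $\Sigma^*$ are called
languages. For example, if $\Sigma = \{a, b\}$:
$L_1 = \{\varepsilon, a, b\}$
$L_2$ = [definition not recoverable from the source]
$L_3 = \{w \mid |w|_a = |w|_b\}$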
Definition A set is an unordered collection of objects.
Example Let $W$ denote the set of well-formed
parentheses. It can be defined inductively as follows:
Basis clause: $[\,] \in W$
Inductive clause: if $x, y \in W$, then $xy \in W$ and $[x] \in W$
Extremal clause: No object is a member of $W$ unless
its being so follows from a finite number of applications of the basis and the inductive clauses.
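The three clauses can be read as a procedure that builds W from the basis upwards. The Python sketch below is only an illustration under stated assumptions: the function name `well_formed` and the length bound `max_len` are introduced here and are not part of the notes (the bound is needed because W itself is infinite).

```python
def well_formed(max_len):
    """Members of W of length <= max_len, built only from the basis and inductive clauses."""
    w = {"[]"} if max_len >= 2 else set()    # basis clause: [] is in W
    while True:
        new = set()
        for x in w:
            if len(x) + 2 <= max_len:
                new.add("[" + x + "]")       # inductive clause: [x] is in W
            for y in w:
                if len(x) + len(y) <= max_len:
                    new.add(x + y)           # inductive clause: xy is in W
        if new <= w:                         # nothing new: the bounded closure is complete
            return w
        w |= new
```

The extremal clause corresponds to the fact that nothing enters the set except through the two `add` calls.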
Language
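Definition Let $\Sigma$ be any alphabet set. $\Sigma^+$ is the set of nonempty
strings over $\Sigma$ defined as follows:
1. Basis: If $a \in \Sigma$, then $a \in \Sigma^+$.
2. Induction: If $\alpha \in \Sigma^+$ and $a \in \Sigma$, then $\alpha a$ and $a\alpha$ are in $\Sigma^+$.
3. No other element belongs to $\Sigma^+$.
Clearly the set $\Sigma^+$ contains all strings of length $n$, $n \geq 1$.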
Definition Let $\Sigma$ be any alphabet set. $\Sigma^*$ is defined as follows:
1. Basis: $\varepsilon \in \Sigma^*$.
2. Induction: If $\alpha \in \Sigma^*$ and $a \in \Sigma$, then $\alpha a, a\alpha \in \Sigma^*$.
3. No other element is in $\Sigma^*$.
Since languages are sets, one can define the set-theoretic operations of union, intersection, difference and
complement in the usual fashion. The following operations are also defined for languages. If $x = a_1 \ldots a_n$ and
$y = b_1 \ldots b_m$, the concatenation of $x$ and $y$ is defined
as $xy = a_1 \ldots a_n b_1 \ldots b_m$. The catenation (or concatenation) of two languages $L_1$ and $L_2$ is defined by
$L_1 L_2 = \{w_1 w_2 \mid w_1 \in L_1 \text{ and } w_2 \in L_2\}$. Note that
concatenation of languages is associative because concatenation of strings is associative. Also $L^0 = \{\varepsilon\}$ and
$L\emptyset = \emptyset L = \emptyset$, $L\{\varepsilon\} = \{\varepsilon\}L = L$.
The concatenation closure (Kleene closure) of a
language $L$, in symbols $L^*$, is defined to be the union
of all powers of $L$:
$L^* = \bigcup_{i=0}^{\infty} L^i$
Also $L^+ = \bigcup_{i=1}^{\infty} L^i$.
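For finite languages, concatenation and powers can be computed directly, and the Kleene closure can be approximated by a finite union of powers. The sketch below is illustrative only: the names `concat`, `power` and `star_upto` and the cut-off `max_power` are assumptions made here, and the empty Python string stands for the empty word.

```python
def concat(l1, l2):
    """L1 L2 = { w1 w2 | w1 in L1 and w2 in L2 } for finite sets of strings."""
    return {x + y for x in l1 for y in l2}

def power(lang, i):
    """L^i, with L^0 = {empty word}."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result

def star_upto(lang, max_power):
    """Finite approximation of L*: the union of L^0, ..., L^max_power."""
    out = set()
    for i in range(max_power + 1):
        out |= power(lang, i)
    return out
```

Concatenating with the empty language gives the empty language, and concatenating with the language containing only the empty word gives the language back, matching the identities stated for these two cases.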
The right quotient and right derivative are the
following sets respectively.
$L_1 / L_2 = \{y \mid yz \in L_1 \text{ for some } z \in L_2\}$
$\partial^r_z L = L / \{z\} = \{y \mid yz \in L\}$
Similarly, the left quotient of a language $L_1$ by a language
$L_2$ is defined by
$L_2 \backslash L_1 = \{z \mid yz \in L_1 \text{ for some } y \in L_2\}$.
The left derivative of a language $L$ with respect to a
word $y$ is denoted as $\partial_y L$, which is equal to $\{z \mid yz \in L\}$.
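The quotient definitions can be checked on small finite languages. The following sketch is an illustration under assumptions: the function names `right_quotient` and `left_quotient` and the sample languages are chosen here, not taken from the notes. A derivative with respect to a single word is the special case where the second language has one element.

```python
def right_quotient(l1, l2):
    """{ y | yz in L1 for some z in L2 }: strip a suffix z taken from L2."""
    return {w[:len(w) - len(z)] for w in l1 for z in l2 if w.endswith(z)}

def left_quotient(l2, l1):
    """{ z | yz in L1 for some y in L2 }: strip a prefix y taken from L2."""
    return {w[len(y):] for w in l1 for y in l2 if w.startswith(y)}

L1 = {"ab", "aab", "ba"}
print(right_quotient(L1, {"b"}))   # right derivative of L1 with respect to the word b
print(left_quotient({"a"}, L1))    # left derivative of L1 with respect to the word a
```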
The mirror image (or reversal) of a language is the
collection of the mirror images of its words:
$mir(L) = \{mir(w) \mid w \in L\}$ or $L^R = \{w^R \mid w \in L\}$.
The operations substitution and homomorphism are
defined as follows.
For each symbol $a$ of an alphabet $\Sigma$, let $\sigma(a)$ be a
language over $\Sigma_a$. Also, let $\sigma(\varepsilon) = \varepsilon$ and $\sigma(\alpha\beta) = \sigma(\alpha)\sigma(\beta)$
for $\alpha, \beta \in \Sigma^+$. Then $\sigma$ is a mapping from $\Sigma^*$ to $2^{V^*}$, where
$V$ is the union of the alphabets $\Sigma_a$, and is called a
substitution. For a language $L$ over $\Sigma$, we define
$\sigma(L) = \{\alpha \mid \alpha \in \sigma(\beta) \text{ for some } \beta \in L\}$.
A substitution is $\varepsilon$-free if and only if none of the
languages $\sigma(a)$ contains $\varepsilon$. A family of languages is
closed under substitution if and only if whenever $L$ is
in the family and $\sigma$ is a substitution such that each $\sigma(a)$ is
in the family, then $\sigma(L)$ is also in the family.
A substitution $\sigma$ such that each $\sigma(a)$ consists of a single
word $w_a$ is called a homomorphism. It is called an $\varepsilon$-free
homomorphism if none of the $\sigma(a)$ is $\varepsilon$.
Algebraically, one can see that $\Sigma^*$ is a free semigroup
with $\varepsilon$ as its identity.
The homomorphism defined above agrees
with the customary definition of a homomorphism of
one semigroup into another.
Inverse homomorphism can be defined as follows:
$h^{-1}(w) = \{x \mid h(x) = w\}$
$h^{-1}(L) = \{x \mid h(x) \text{ is in } L\}$.
It should be noted that $h(h^{-1}(L))$ need not be equal to
$L$. Generally $h(h^{-1}(L)) \subseteq L$ and $h^{-1}(h(L)) \supseteq L$.
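A small experiment shows that both inclusions can be strict. The sketch below is illustrative only: the homomorphism `h` (sending both a and b to 0), the helper names and the bound `max_len` on the length of preimage candidates are assumptions introduced here. Because the full preimage may be infinite, `h_inverse` enumerates only words up to the given length.

```python
from itertools import product

def h_word(h, w):
    """Apply the homomorphism symbol by symbol: h(a1...an) = h(a1)...h(an)."""
    return "".join(h[c] for c in w)

def h_lang(h, lang):
    return {h_word(h, w) for w in lang}

def h_inverse(h, lang, max_len):
    """Words x over the domain alphabet with |x| <= max_len and h(x) in lang."""
    sigma = sorted(h)
    out = set()
    for n in range(max_len + 1):
        for symbols in product(sigma, repeat=n):
            x = "".join(symbols)
            if h_word(h, x) in lang:
                out.add(x)
    return out

h = {"a": "0", "b": "0"}            # hypothetical homomorphism from {a, b}* to {0, 1}*
L = {"0", "1"}
pre = h_inverse(h, L, 2)            # {a, b}: no word maps to 1
print(h_lang(h, pre))               # {0}, a proper subset of L
print(h_inverse(h, h_lang(h, {"a"}), 2))   # {a, b}, a proper superset of {a}
```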
Grammar
Definition A phrase-structure grammar or a type 0
grammar is a 4-tuple $G = (N, T, P, S)$, where $N$ is a
finite set of nonterminal symbols called the
nonterminal alphabet, $T$ is a finite set of terminal
symbols called the terminal alphabet, $S \in N$ is the
start symbol and $P$ is a set of productions (also called
production rules or simply rules) of the form $u \to v$,
where $u \in (N \cup T)^* N (N \cup T)^*$ and $v \in (N \cup T)^*$.
Derivations are defined as follows:
If $\alpha u \beta$ is a string in $(N \cup T)^*$ and $u \to v$ is a rule in
$P$, then from $\alpha u \beta$ we get $\alpha v \beta$ by replacing $u$ by $v$. This
is denoted as $\alpha u \beta \Rightarrow \alpha v \beta$. $\Rightarrow$ is read as "directly
derives".
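If $\alpha_1 \Rightarrow \alpha_2$, $\alpha_2 \Rightarrow \alpha_3$, $\ldots$, $\alpha_{n-1} \Rightarrow \alpha_n$, the derivation
is denoted as $\alpha_1 \Rightarrow \alpha_2 \Rightarrow \cdots \Rightarrow \alpha_n$ or $\alpha_1 \Rightarrow^* \alpha_n$. $\Rightarrow^*$ is
the reflexive, transitive closure of $\Rightarrow$.
Definition The language generated by a grammar
$G = (N, T, P, S)$ is the set of terminal strings
derivable in the grammar from the start symbol:
$L(G) = \{w \mid w \in T^*, S \Rightarrow^* w\}$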
Example
Consider the grammar $G = (N, T, P, S)$ where
$N = \{S, A\}$, $T = \{a, b, c\}$, and the production rules in $P$ are
$S \to aSc$, $S \to aAc$, $A \to b$.
A typical derivation in the grammar is
$S \Rightarrow aSc \Rightarrow aaScc \Rightarrow aaaAccc \Rightarrow aaabccc$
Lengths
Definition If the rules are of the form $\alpha A \beta \to \alpha \gamma \beta$,
$\alpha, \beta \in (N \cup T)^*$, $A \in N$, $\gamma \in (N \cup T)^+$, the
grammar is called a context-sensitive grammar.
Definition If in every rule $u \to v$, $|u| \leq |v|$, the grammar
is called a length increasing grammar.
Example Let $G = (N, T, P, S)$ where $N = \{S, B\}$,
$T = \{a, b, c\}$, and
$P$ has the following rules:
1. $S \to aSBc$
2. $S \to abc$
3. $cB \to Bc$
4. $bB \to bb$
Let us consider the language generated. The number
appearing above $\Rightarrow$ denotes the rule being used.
$S \overset{2}{\Rightarrow} abc$; here $abc \in L(G)$.
$S \overset{1}{\Rightarrow} aSBc \overset{2}{\Rightarrow} aabcBc \overset{3}{\Rightarrow} aabBcc \overset{4}{\Rightarrow} aabbcc$, so $a^2 b^2 c^2 \in L(G)$.
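Similarly,
$S \overset{1}{\Rightarrow} aSBc \overset{1}{\Rightarrow} aaSBcBc \overset{2}{\Rightarrow} aaabcBcBc \overset{3}{\Rightarrow} aaabBccBc \overset{3}{\Rightarrow} aaabBcBcc \overset{3}{\Rightarrow} aaabBBccc \overset{4}{\Rightarrow} aaabbBccc \overset{4}{\Rightarrow} aaabbbccc$, so $a^3 b^3 c^3 \in L(G)$.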
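In general, any string of the form $a^n b^n c^n$ will be
generated:
$S \Rightarrow^* a^{n-1} S (Bc)^{n-1}$ (by applying rule 1 $(n-1)$ times)
$\Rightarrow a^n bc (Bc)^{n-1}$ (rule 2 once)
$\Rightarrow^* a^n b B^{n-1} c^n$ (by applying rule 3 $\frac{n(n-1)}{2}$ times)
$\Rightarrow^* a^n b^n c^n$ (by applying rule 4 $(n-1)$ times)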
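The claim can be checked mechanically for small n by exploring all sentential forms. The sketch below is a minimal illustration, assuming the four rules of the example grammar and an encoding chosen here in which uppercase letters are nonterminals and lowercase letters are terminals; the names `successors` and `generated` and the bound `max_len` are not part of the notes. Discarding forms longer than the bound is safe only because no rule shortens a sentential form.

```python
from collections import deque

# Rules 1-4 of the example grammar: S -> aSBc, S -> abc, cB -> Bc, bB -> bb
RULES = [("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]

def successors(form, rules):
    """All sentential forms obtained from `form` by one derivation step."""
    out = set()
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            out.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return out

def generated(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if form.islower():            # only terminals left: a word of the language
            words.add(form)
            continue
        for nxt in successors(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words

print(sorted(generated(RULES, "S", 9), key=len))   # ['abc', 'aabbcc', 'aaabbbccc']
```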
Type 2 language
Definition If in a grammar the production rules are of the form $A \to \alpha$,
where $A \in N$ and $\alpha \in (N \cup T)^*$, the grammar is called a type 2 grammar or
context-free grammar. The language generated is called a type 2 language or
context-free language.
Definition If the rules are of the form $A \to \alpha B$, $A \to \beta$, with $A, B \in N$ and $\alpha, \beta \in T^*$,
the grammar is called a right linear grammar or type 3 grammar and the
language generated is called a type 3 language or regular set. We can even
restrict the rules to the forms $A \to aB$, $A \to b$, where
$A, B \in N$, $a \in T$, $b \in T \cup \{\varepsilon\}$. This is possible because a rule $A \to a_1 \ldots a_k B$
can be split into $A \to a_1 B_1$, $B_1 \to a_2 B_2$, $\ldots$, $B_{k-1} \to a_k B$ by introducing
new nonterminals $B_1, \ldots, B_{k-1}$.
Example
Let $G = (N, T, P, S)$ where $N = \{S\}$, $T = \{a, b\}$ and
$P$ consists of the following rules.
1. $S \to aS$
2. $S \to bS$
3. $S \to \varepsilon$
This grammar generates all strings in $T^*$. For example, the string $abbaab$ is generated as follows:
$S \Rightarrow aS$ (rule 1)
$\Rightarrow abS$ (rule 2)
$\Rightarrow abbS$ (rule 2)
$\Rightarrow abbaS$ (rule 1)
$\Rightarrow abbaaS$ (rule 1)
$\Rightarrow abbaabS$ (rule 2)
$\Rightarrow abbaab$ (rule 3)
Derivation tree
We have considered the definition of a grammar and
derivation. Each derivation can be represented by a
tree called a derivation tree (sometimes called a parse
tree). A derivation tree for the derivation considered
in the previous example, with grammar rules
$S \to aSc$, $S \to aAc$, $A \to b$, is
[Figure: derivation tree for $S \Rightarrow^* aaabccc$]
Example
Consider the following CFG, $G = (N, T, P, S)$,
$N = \{S, A, B\}$, $T = \{a, b\}$. $P$ consists of the
following productions:
1. $S \to aB$
2. $B \to b$
3. $B \to bS$
4. $B \to aBB$
5. $S \to bA$
6. $A \to a$
7. $A \to aS$
8. $A \to bAA$
The derivation tree for $aaabbb$ is as follows:
[Figure: derivation tree for $aaabbb$]
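Example
Consider the grammar
$G = (\{S\}, \{a, b\}, \{S \to SaSbS, S \to SbSaS, S \to \varepsilon\}, S)$.
The language generated by this grammar
is the same as the language generated by the grammar
$G = (N, T, P, S)$, $N = \{S, A, B\}$, $T = \{a, b\}$, where $P$
consists of the productions
$S \to aB$, $B \to b$, $B \to bS$, $B \to aBB$, $S \to bA$, $A \to a$, $A \to aS$, $A \to bAA$,
except that $\varepsilon$, the empty string,
is also generated here.
[Figure: graph for the string $aabbabba$; x axis: length of the prefix, y axis: number of $a$'s minus number of $b$'s, with ticks from $-2$ to $2$]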
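Consider a string $w$ having an equal number of $a$'s and
$b$'s. We use induction on $|w|$ to show that $S \Rightarrow^* w$.
Basis
$|w| = 0$: $S \Rightarrow \varepsilon$.
$|w| = 2$: $w$ is either $ab$ or $ba$, and
$S \Rightarrow SaSbS \Rightarrow^* ab$
$S \Rightarrow SbSaS \Rightarrow^* ba$.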
Induction Assume that the result holds for strings
of length up to $k - 1$. We prove that the result holds for strings
of length $k$. Draw a graph where the x axis represents
the length of the prefixes of the given string and the y axis
represents the number of $a$'s minus the number of $b$'s. For the
string $aabbabba$, the graph will look as given in the
previous figure. For a given string $w$ with an equal
number of $a$'s and $b$'s there are 3 possibilities.
1. The string begins with $a$ and ends with $b$.
2. The string begins with $b$ and ends with $a$.
3. The other two cases (begins with $a$ and ends with $a$,
or begins with $b$ and ends with $b$).
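The y-values of the graph described above are easy to compute. The sketch below is illustrative only; the function name `prefix_balance` is an assumption introduced here. For a word over {a, b} it returns the number of a's minus the number of b's for every prefix, starting with the empty prefix.

```python
def prefix_balance(w):
    """(#a - #b) for each prefix of w, from the empty prefix to w itself."""
    values = [0]
    for symbol in w:
        values.append(values[-1] + (1 if symbol == "a" else -1))
    return values

print(prefix_balance("aabbabba"))   # [0, 1, 2, 1, 0, 1, 0, -1, 0]
```

A word with equal numbers of a's and b's ends at 0. For aabbabba the balance also returns to 0 at prefix lengths 4 and 6, so the word can be cut there into shorter words that again have equal numbers of a's and b's.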
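In the first case $w = a w_1 b$ and $w_1$ has equal numbers
of $a$'s and $b$'s. So we have $S \Rightarrow SaSbS \Rightarrow^* aSb \Rightarrow^* a w_1 b$, where $S \Rightarrow^* w_1$ by the inductive hypothesis.
[...]
In this case we can have a derivation as follows:
$S \overset{1}{\Rightarrow} SaSbS \overset{3}{\Rightarrow} aSbS \Rightarrow^* a w_1' b w_2 = w_1 w_2$.
$S \Rightarrow^* w$ follows from the inductive hypothesis.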
CS
Theorem Every context-sensitive language is length increasing and
conversely.
That every context-sensitive language is length increasing can be seen
from the definitions.
That every length increasing language is context-sensitive can be seen from
the following construction.
Let $L$ be a length increasing language generated by $G = (N, T, P, S)$.
Without loss of generality, one can assume that the productions in $P$
are of the form $X \to a$, $X \to X_1 \ldots X_m$, $X_1 \ldots X_m \to Y_1 \ldots Y_n$,
$2 \leq m \leq n$, $X, X_1, \ldots, X_m, Y_1, \ldots, Y_n \in N$, $a \in T$. Productions in
$P$ which are already context-sensitive productions are not modified.
Hence consider a production of the form
$X_1 \ldots X_m \to Y_1 \ldots Y_n$, $2 \leq m \leq n$.
It is replaced by the following set of context-sensitive
productions:
$X_1 \ldots X_m \to Z_1 X_2 \ldots X_m$
$Z_1 X_2 \ldots X_m \to Z_1 Z_2 X_3 \ldots X_m$
$\vdots$
$Z_1 Z_2 \ldots Z_{m-1} X_m \to Z_1 Z_2 \ldots Z_m Y_{m+1} \ldots Y_n$
$Z_1 Z_2 \ldots Z_m Y_{m+1} \ldots Y_n \to Y_1 Z_2 \ldots Z_m Y_{m+1} \ldots Y_n$
$\vdots$
$Y_1 Y_2 \ldots Y_{m-1} Z_m Y_{m+1} \ldots Y_n \to Y_1 Y_2 \ldots Y_m Y_{m+1} \ldots Y_n$
where $Z_k$, $1 \leq k \leq m$, are new nonterminals.
Each production that is not context-sensitive is to be
replaced by a set of context-sensitive productions as
mentioned above. Application of this set of rules has
the same effect as applying $X_1 \ldots X_m \to Y_1 \ldots Y_n$.
Hence the new grammar $G'$ thus obtained is
context-sensitive and equivalent to $G$.
Example Let $L = \{a^n b^m c^n d^m \mid n, m \geq 1\}$.
The type 1 grammar generating this CSL is given by
$G = (N, T, P, S)$ with $N = \{S, A, B, X, Y\}$,
$T = \{a, b, c, d\}$
and $P$ =
$S \to aAB \mid aB$
$A \to aAX \mid aX$
$B \to bBd \mid bYd$
$Xb \to bX$
$XY \to Yc$
$Y \to c$.
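Sample Derivations
$S \Rightarrow aB \Rightarrow abYd \Rightarrow abcd$.
$S \Rightarrow aB \Rightarrow abBd \Rightarrow abbYdd \Rightarrow ab^2cd^2$.
$S \Rightarrow aAB \Rightarrow aaXB \Rightarrow aaXbYd \Rightarrow aabXYd \Rightarrow aabYcd \Rightarrow a^2bc^2d$.
$S \Rightarrow aAB \Rightarrow aaAXB \Rightarrow aaaXXB \Rightarrow aaaXXbYd \Rightarrow aaaXbXYd \Rightarrow aaabXXYd \Rightarrow aaabXYcd \Rightarrow aaabYccd \Rightarrow aaabcccd$.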
Exercise
Consider the length increasing grammar with
productions $P$ =
$S \to aSBc$, $S \to abc$, $cB \to Bc$, $bB \to bb$. All rules
except the rule $cB \to Bc$ are context-sensitive. The
following grammar is a context-sensitive grammar
equivalent to the above grammar.
$S \to aSBC$
$S \to abc$
$C \to c$
$CB \to DB$
$DB \to DC$
$DC \to BC$.
Ambiguity
Definition Let $G = (N, T, P, S)$ be a CFG. A word $w$
in $L(G)$ is said to be ambiguously derivable in $G$ if it
has two or more different derivation trees in $G$.
Since the correspondence between derivation trees
and leftmost derivations is a bijection, an equivalent
definition in terms of leftmost derivations can be
given.
Definition Let $G = (N, T, P, S)$ be a CFG. A word $w$
in $L(G)$ is said to be ambiguously derivable in $G$ if it
has two or more different leftmost derivations in $G$.
Definition A CFG is said to be ambiguous if there is a
word $w$ in $L(G)$ which is ambiguously derivable. Otherwise it is unambiguous.
Example
Consider the grammar $G$ with rules $S \to SS$, $S \to a$,
where $S$ is the nonterminal and $a$ is the terminal
symbol. $L(G) = \{a^n \mid n \geq 1\}$. This grammar is
ambiguous, as $a^3$ has two different derivation trees.
[Figure: the two derivation trees for $a^3$]
Ambiguity contd.
Definition A CFL $L$ is said to be inherently ambiguous if all the grammars
generating it are ambiguous or, in other words, there is no unambiguous
grammar generating it.
Example $L = \{a^n b^m c^p \mid n, m, p \geq 1, n = m \text{ or } m = p\}$.
This can be looked at as $L = L_1 \cup L_2$, where
$L_1 = \{a^n b^n c^p \mid n, p \geq 1\}$
$L_2 = \{a^n b^m c^m \mid n, m \geq 1\}$
Theorem It is undecidable to determine whether a
given CFG $G$ is ambiguous or not.
Theorem It is undecidable to determine whether a
given CFL $L$ is inherently ambiguous or not.
Definition A CFL $L$ is bounded if there exist strings
$w_1, \ldots, w_k$ such that $L \subseteq w_1^* w_2^* \ldots w_k^*$.
Theorem There exists an algorithm to find out whether
a given bounded CFL is inherently ambiguous or not.
Definition Let $G = (N, T, P, S)$ be a CFG. Then the degree of ambiguity of $G$ is the maximum number of
derivation trees a string $w \in L(G)$ can have in $G$.
We can also use the idea of power series to find
the number of different derivation trees a string can
have. Consider the grammar with rules
$S \to SS$, $S \to a$, and write the equation $S = SS + a$.
The initial solution is $S_1 = a$.
Use this in the right-hand side of the equation for $S$:
$S_2 = S_1 S_1 + a = aa + a$.
$S_3 = S_2 S_2 + a = (aa + a)(aa + a) + a = a^4 + 2a^3 + a^2 + a$.
$S_4 = S_3 S_3 + a = (a^4 + 2a^3 + a^2 + a)^2 + a = a^8 + 4a^7 + 6a^6 + 6a^5 + 5a^4 + 2a^3 + a^2 + a$.
We can proceed like this using $S_i = S_{i-1} S_{i-1} + a$.
In $S_i$, for strings of length up to $i$, the coefficient of the
string gives the number of different derivation trees
it can have in $G$. For example, in $S_4$ the coefficient of $a^4$ is
5, and $a^4$ has 5 different derivation trees in $G$. The coefficient of $a^3$ is 2, and the number of different derivation
trees for $a^3$ is 2 in $G$.
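The iteration can be carried out on coefficient lists. The following sketch is an illustration only: it represents a polynomial in the single terminal a as a Python list whose entry k is the coefficient of a^k, and the names `step` and `series` and the truncation degree `max_deg` (assumed to be at least 1) are introduced here.

```python
def step(coeffs, max_deg):
    """One iteration S_new = S * S + a, truncated at degree max_deg."""
    new = [0] * (max_deg + 1)
    for i, ci in enumerate(coeffs):
        for j, cj in enumerate(coeffs):
            if ci and cj and i + j <= max_deg:
                new[i + j] += ci * cj
    new[1] += 1                      # the "+ a" term
    return new

def series(iterations, max_deg):
    """Coefficients of S_iterations, starting from S_1 = a."""
    s = [0] * (max_deg + 1)
    s[1] = 1
    for _ in range(iterations - 1):
        s = step(s, max_deg)
    return s

print(series(4, 8))   # [0, 1, 1, 2, 5, 6, 6, 4, 1]
```

The printed list reads a + a^2 + 2a^3 + 5a^4 + 6a^5 + 6a^6 + 4a^7 + a^8, which agrees with the hand computation of the fourth iterate. One more iteration gives 14 as the coefficient of a^5.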
Simplification
Definition Let $G = (N, T, P, S)$ be a CFG. A variable $X$ in $N$ is said
to be useful if and only if there is at least one string $\alpha \in L(G)$ such that
$S \Rightarrow^* \alpha_1 X \alpha_2 \Rightarrow^* \alpha$,
where $\alpha_1, \alpha_2 \in (N \cup T)^*$, i.e., $X$ is useful because it appears in at least
one derivation from $S$ to a word in $L(G)$. Otherwise $X$ is useless.
Consequently the production involving $X$ is useful.
One can understand the useful symbol concept in two steps.
Step 1 For a symbol $X \in N$ to be useful it should occur in some derivation starting from $S$, i.e., $S \Rightarrow^* \alpha_1 X \alpha_2$.
Step 2 Also $X$ has to derive a string $\alpha \in T^*$, i.e.,
$X \Rightarrow^* \alpha$.
These two conditions are necessary, but they are not
sufficient: both conditions may be satisfied and still
$\alpha_1$ or $\alpha_2$ may contain a nonterminal from which a
terminal string cannot be derived. So the usefulness of
a symbol has to be tested in two steps as above.
Lemma Let $G = (N, T, P, S)$ be a CFG such that
$L(G) \neq \emptyset$. Then there exists an equivalent context-free grammar $G' = (N', T', P', S)$ that does not contain any useless symbols or productions.
The context-free grammar $G'$ is obtained by the
following elimination procedures.
I. First eliminate all those symbols $X$ such that $X$
does not derive any string over $T$. Let
$G_2 = (N_2, T, P_2, S)$ be the grammar thus modified.
As $L(G) \neq \emptyset$, $S$ will not be eliminated. The following
algorithm identifies the symbols not to be eliminated.
Algorithm GENERATING
Step 1 Let $GEN = T$;
Step 2 If $A \to \alpha$ and every symbol of $\alpha$ belongs to
$GEN$, then add $A$ to $GEN$. Repeat until no more symbols can be added.
Remove from $N$ all those symbols that are not in the
set $GEN$ and all the rules using them.
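Algorithm GENERATING is a fixed-point computation. The sketch below is a minimal illustration, assuming productions are given as (head, body) pairs with single-character symbols; the function name `generating` and the sample grammar (S → ABC | a, A → a, C → b) are chosen here for the demonstration.

```python
def generating(terminals, productions):
    """Return GEN: the terminals plus every nonterminal that derives a terminal string."""
    gen = set(terminals)                           # Step 1: GEN = T
    changed = True
    while changed:                                 # Step 2, repeated until GEN is stable
        changed = False
        for head, body in productions:
            if head not in gen and all(s in gen for s in body):
                gen.add(head)
                changed = True
    return gen

P = [("S", "ABC"), ("S", "a"), ("A", "a"), ("C", "b")]
print(sorted(generating("ab", P)))   # ['A', 'C', 'S', 'a', 'b']
```

In this sample B has no production at all, so it never enters GEN, and the rule S → ABC would be removed together with it.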
Let the resultant grammar be $G_2 = (N_2, T, P_2, S)$.
II. Now eliminate all symbols in the grammar $G_2$ that are not occurring in any derivation from $S$ [...]
[...] without useless symbols and productions using them.
The equivalence of $G$ with $G'$ can be easily seen since
only symbols and productions leading to derivation of
terminal strings from $S$ are present in $G'$. Hence
$L(G) = L(G')$.
Theorem Given a CFG $G = (N, T, P, S)$, procedure I
of the previous lemma is executed to get
$G_2 = (N_2, T, P_2, S)$ and procedure II of the previous
lemma is executed to get $G' = (N', T', P', S)$. Then
$G'$ contains no useless symbols.
Suppose $G'$ contains a symbol $X$ (say) which is useless. It is easily seen that $N' \subseteq N_2$, $T' \subseteq T$, $P' \subseteq P_2$.
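Since $G'$ is obtained after execution of II,
$S \Rightarrow^* \alpha_1 X \alpha_2$, $\alpha_1, \alpha_2 \in (N' \cup T')^*$. Every symbol of
$N'$ is also in $N_2$. Since $G_2$ is obtained by execution of
I, it is possible to get a terminal string from every
symbol of $N_2$ and hence from every symbol of $N'$: $\alpha_1 \Rightarrow^* w_1$,
$\alpha_2 \Rightarrow^* w_2$ and $X \Rightarrow^* w$, with $w_1, w_2, w \in T^*$. Thus
$S \Rightarrow^* \alpha_1 X \alpha_2 \Rightarrow^* w_1 X w_2 \Rightarrow^* w_1 w w_2$.
Clearly $X$ is not useless as supposed. Hence $G'$
contains only useful symbols.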
Example
Let $G = (N, T, P, S)$ be a CFG with
$N = \{S, A, B, C\}$,
$T = \{a, b\}$ and
$P = \{S \to Sa \mid A \mid C, A \to a, B \to bb, C \to aC\}$.
First, the GEN set will be $\{S, A, B, a, b\}$. Then
$G_2 = (N_2, T, P_2, S)$ where $N_2 = \{S, A, B\}$ and
$P_2 = \{S \to Sa \mid A, A \to a, B \to bb\}$.
In $G_2$, the REACH set will be $\{S, A, a\}$. Hence
$G' = (N', T', P', S)$ where
$N' = \{S, A\}$,
$T' = \{a\}$,
$P' = \{S \to Sa \mid A, A \to a\}$, and
$L(G) = L(G') = \{a^n \mid n \geq 1\}$.
$\varepsilon$-rule Elimination
Definition Any production of the form $A \to \varepsilon$ is called an $\varepsilon$-rule.
Run algorithm NULL for $G$ and get the NULL set.
The modification of $G$ to get $G' = (N, T, P', S)$ with
respect to NULL is given below.
If $A \to A_1 \ldots A_t \in P$, $t \geq 1$, and $n$ ($n < t$) of these
$A_i$'s are in NULL, then $P'$ will contain $2^n$ rules in which
the variables in NULL are either present or absent in
all possible combinations. If $n = t$, then remove
$A \to \varepsilon$ from $P'$. The grammar $G' = (N, T, P', S)$ thus
obtained is $\varepsilon$-free. We have to prove that a word $w \neq \varepsilon$ is in $L(G)$ if
and only if $w \in L(G')$. As $G$ and $G'$ do not differ in
$N$ and $T$, one can equivalently show that
$A \Rightarrow^*_G w$
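if and only if
$A \Rightarrow^*_{G'} w$ [...]
i.e., $A \Rightarrow_G Y_1 Y_2 \ldots Y_k \Rightarrow^{n-1}_G w$. Let $w = w_1 \ldots w_k$ [...]
$Y_j \Rightarrow^* w_j$, $w_j \neq \varepsilon$. Clearly $k \geq 1$ as $w \neq \varepsilon$. Hence
$A \to X_1 \ldots X_m$ is a rule in $G'$.
We can see that $X_1 \ldots X_m \Rightarrow^*_G w$ as some $Y_j$ derive [...]
$A \Rightarrow_G X_1 \ldots X_m \Rightarrow^* w$.
Induction
Assume $A \Rightarrow^n_{G'} w$, $n > 1$. Then let
$A \Rightarrow_{G'} Y_1 \ldots Y_k \Rightarrow^*_G w$. For $A \Rightarrow_{G'} Y_1 \ldots Y_k$ the [...]
nullable. Hence $A \Rightarrow^*_G Y_1 \ldots Y_k \Rightarrow^*_G w_1 \ldots w_k = w$.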
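Example
Consider $G = (N, T, P, S)$ where
$N = \{S, A, B, C, D\}$, $T = \{0, 1\}$ and
$P = \{S \to AB0C, A \to BC, B \to 1 \mid \varepsilon, C \to D \mid \varepsilon, D \to \varepsilon\}$.
The set NULL = $\{A, B, C, D\}$.
Then $G'$ will have
$P' = \{S \to AB0C \mid AB0 \mid A0C \mid B0C \mid A0 \mid B0 \mid 0C \mid 0, A \to BC \mid B \mid C, B \to 1, C \to D\}$.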
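The construction can be reproduced in a few lines. The sketch below is an illustration under assumptions: productions are (head, body) pairs with single-character symbols, an empty body stands for the empty word, and the names `nullable` and `remove_epsilon` are introduced here. The nullable set is computed as a fixed point, which is one common way to obtain it; the notes' own algorithm NULL is not reproduced here.

```python
from itertools import product

def nullable(productions):
    """Nonterminals that derive the empty word (fixed-point computation)."""
    null = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in null and all(s in null for s in body):
                null.add(head)
                changed = True
    return null

def remove_epsilon(productions):
    """Rules obtained by keeping or dropping each nullable symbol in every body."""
    null = nullable(productions)
    new = set()
    for head, body in productions:
        choices = [(s, "") if s in null else (s,) for s in body]
        for pick in product(*choices):
            new_body = "".join(pick)
            if new_body:                  # never emit a rule with an empty body
                new.add((head, new_body))
    return new

P = [("S", "AB0C"), ("A", "BC"), ("B", "1"), ("B", ""),
     ("C", "D"), ("C", ""), ("D", "")]
print(sorted(nullable(P)))                # ['A', 'B', 'C', 'D']
print(sorted(remove_epsilon(P)))
```

On the example grammar this yields the eight rules for S, the three rules for A, and the rules B → 1 and C → D.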
Let $G = (N, T, P, S)$ be a CFG. First find the sets of pairs of [...]
Let us consider leftmost derivations in $G$ and $G'$. If $w \in L(G')$, then [...]
Hence $L(G) = L(G')$.
Example Let $G = (N, T, P, S)$ be a CFG where
$N = \{X, Y, K\}$, $T = \{a, b\}$ and
$P = \{X \to aX \mid Y \mid b, Y \to bK \mid K \mid b, K \to a\}$.
UNIT PAIR = $\{(X, X), (Y, Y), (K, K), (X, Y), (Y, K), (X, K)\}$.
Then $G' = (N, T, P', S)$ where
$P' = \{X \to aX \mid bK \mid b \mid a, Y \to bK \mid b \mid a, K \to a\}$.
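The example can be recomputed as follows. This is a sketch of the usual unit-pair method, which is an assumption here because the notes' own description of the procedure is incomplete: first collect all pairs (A, B) such that B is reachable from A through unit rules only, then give A every non-unit rule of B. The function names and the encoding of productions as (head, body) pairs with single-character symbols are chosen for this illustration.

```python
def unit_pairs(nonterminals, productions):
    """Pairs (A, B) such that B is reachable from A using unit rules only."""
    pairs = {(a, a) for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                if head == b and len(body) == 1 and body in nonterminals:
                    if (a, body) not in pairs:
                        pairs.add((a, body))
                        changed = True
    return pairs

def remove_units(nonterminals, productions):
    """For every unit pair (A, B), give A each non-unit rule of B."""
    new = set()
    for a, b in unit_pairs(nonterminals, productions):
        for head, body in productions:
            is_unit = len(body) == 1 and body in nonterminals
            if head == b and not is_unit:
                new.add((a, body))
    return new

N = {"X", "Y", "K"}
P = [("X", "aX"), ("X", "Y"), ("X", "b"),
     ("Y", "bK"), ("Y", "K"), ("Y", "b"), ("K", "a")]
print(sorted(remove_units(N, P)))
```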
Remark Since removal of $\varepsilon$-rules can introduce unit
productions, to get a simplified CFG that generates
$L(G) - \{\varepsilon\}$, the following steps have to be used in
the order given.
1. Remove $\varepsilon$-rules.
2. Remove unit-rules.
3. Remove useless symbols.
(i) Remove symbols not deriving terminal
strings.
(ii) Remove symbols not reachable from $S$.
Example It is essential that steps 3(i) and 3(ii) be
executed in that order. If step 3(ii) is executed first
and then step 3(i), we may not get the required
reduced grammar. Consider the CFG
$G = (N, T, P, S)$ where $N = \{S, A, B, C\}$,
$T = \{a, b\}$ and $P = \{S \to ABC \mid a, A \to a, C \to b\}$.
$L(G) = \{a\}$.
Applying step 3(ii) first removes nothing. Then applying
step 3(i) removes $B$ and $S \to ABC$, leaving
$S \to a$, $A \to a$, $C \to b$. Though $A$ and $C$ do not
contribute to the set $L(G)$, they are not removed.
On the other hand, applying step 3(i) first removes $B$ and
$S \to ABC$. Applying step 3(ii) afterwards removes
$A$, $C$, $A \to a$, $C \to b$. Hence $S \to a$ is the only rule
left, which is the required result.
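Both orders can be tried on this grammar. The sketch below is illustrative only: the function names `keep_generating` (step 3(i)) and `keep_reachable` (step 3(ii)) and the encoding of productions as (head, body) pairs with single-character symbols are assumptions made here.

```python
def keep_generating(terminals, productions):
    """Step 3(i): keep only rules whose symbols all derive terminal strings."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in gen and all(s in gen for s in body):
                gen.add(head)
                changed = True
    return [(h, b) for h, b in productions
            if h in gen and all(s in gen for s in b)]

def keep_reachable(start, productions):
    """Step 3(ii): keep only rules whose head is reachable from the start symbol."""
    reach = {start}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head in reach and not set(body) <= reach:
                reach |= set(body)
                changed = True
    return [(h, b) for h, b in productions if h in reach]

P = [("S", "ABC"), ("S", "a"), ("A", "a"), ("C", "b")]
print(keep_reachable("S", keep_generating("ab", P)))   # [('S', 'a')]
print(keep_generating("ab", keep_reachable("S", P)))   # [('S', 'a'), ('A', 'a'), ('C', 'b')]
```

The first line applies 3(i) and then 3(ii) and leaves only S → a. The second line applies the steps in the reverse order and leaves the rules for A and C in place.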
Normal form
The most popular normal forms are Weak Chomsky
Normal Form, Chomsky Normal Form, Strong
Chomsky Normal Form and Greibach Normal Form.
Definition Let $G = (N, T, P, S)$ be a CFG. If each
rule in $P$ is of the form $A \to \alpha$, $A \to a$ or $A \to \varepsilon$,
where $A \in N$, $\alpha \in N^+$, $a \in T$, then $G$ is said to be in
Weak Chomsky Normal Form (WCNF).
Example Let $G = (N, T, P, S)$ be a CFG where
$N = \{S, A, B\}$, $T = \{a, b\}$ and
$P = \{S \to ASB \mid AB, A \to a, B \to b\}$. $G$ is in
WCNF.
Theorem For any CFG $G = (N, T, P, S)$ there exists a
CFG $G'$ in WCNF such that $L(G) = L(G')$.
Let $G = (N, T, P, S)$ be a CFG. One can construct an
equivalent CFG in WCNF as below. Let
$G' = (N', T', P', S')$ be an equivalent CFG where
$N' = N \cup \{A_a \mid a \in T\}$, and none of the $A_a$'s belong to $N$.
$P'$ consists of the rules $A \to \alpha'$, for each $A \to \alpha \in P$, where $\alpha'$ is obtained from $\alpha$ by replacing
every occurrence of a symbol $a$ from $T$ by $A_a$, together with the rules
$A_a \to a$ for $a \in T$. Clearly $\alpha' \in (N')^+$ and $P'$ gets
the required form. $G'$ is in WCNF. That $G$ and $G'$ are
equivalent can be seen easily.
Theorem Given a CFG $G$, there exists an equivalent CFG $G''$ in CNF.
Let $G = (N, T, P, S)$ be a CFG without $\varepsilon$-rules, unit-rules and useless
symbols, and also $\varepsilon \notin L(G)$. Modify $G$ to $G'$ such that
$G' = (N', T, P', S)$ is in WCNF. Let $A \to \alpha \in P$. If $|\alpha| = 2$, such
rules need not be modified. If $|\alpha| \geq 3$, the modification is as below:
If $A \to \alpha = A_1 A_2 A_3$, the new set of equivalent rules will be:
$A \to A_1 B_1$
$B_1 \to A_2 A_3$.
Similarly, if $A \to A_1 A_2 \ldots A_n \in P$, it is replaced by
$A \to A_1 B_1$
$B_1 \to A_2 B_2$
$\vdots$
$B_{n-2} \to A_{n-1} A_n$.
Let $P''$ be the collection of modified rules and
$G'' = (N'', T, P'', S)$ be the modified grammar, which
is clearly in CNF. Also $L(G) = L(G'')$.
Example Let $G = (N, T, P, S)$ be a CFG where
$N = \{S, A, B, C\}$, $T = \{a, b\}$,
$P = \{S \to SAB \mid AB \mid SBC, A \to AB \mid a, B \to BAB \mid b, C \to b\}$.
Clearly $G$ is not in CNF but is in WCNF. Hence the
rules in $P$ are modified as below:
For $S \to SAB$, the equivalent rules are
$S \to SB_1$, $B_1 \to AB$.
For $S \to SBC$, the equivalent rules are
$S \to SB_2$, $B_2 \to BC$.
For $B \to BAB$, the equivalent rules are
$B \to BB_3$, $B_3 \to AB$.
Hence $G'' = (N'', T, P'', S)$ will have
$N'' = \{S, A, B, C, B_1, B_2, B_3\}$, $T = \{a, b\}$,
$P'' = \{S \to SB_1 \mid SB_2 \mid AB, B_1 \to AB, B_2 \to BC, B_3 \to AB, A \to AB \mid a, B \to BB_3 \mid b, C \to b\}$.
Clearly $G''$ is in CNF.
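The splitting of long right-hand sides is mechanical. The sketch below is an illustration under assumptions: the input is taken to be in WCNF already, productions are (head, body) pairs with single-character symbols, and the function name `binarize` and the fresh names B1, B2, ... (assumed not to clash with existing nonterminals) are introduced here. Output bodies are tuples because the fresh names are longer than one character.

```python
def binarize(productions):
    """Replace every body of length >= 3 by a chain of rules with two-symbol bodies."""
    out, counter = [], 0
    for head, body in productions:
        body = list(body)
        while len(body) > 2:
            counter += 1
            fresh = f"B{counter}"                 # new nonterminal for the rest of the body
            out.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        out.append((head, tuple(body)))
    return out

P = [("S", "SAB"), ("S", "AB"), ("S", "SBC"), ("A", "AB"),
     ("A", "a"), ("B", "BAB"), ("B", "b"), ("C", "b")]
for head, body in binarize(P):
    print(head, "->", " ".join(body))
```

Processing the rules in the order listed gives S → S B1, B1 → A B, S → S B2, B2 → B C, B → B B3 and B3 → A B for the three long rules, and leaves the remaining rules unchanged.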
Theorem For every CFG $G = (N, T, P, S)$ there
exists an equivalent CFG in SCNF.
Let $G = (N, T, P, S)$ be a CFG in CNF. One can
construct an equivalent CFG
$G' = (N', T, P', S')$ in SCNF as below.
$N' = \{S'\} \cup \{A_L, A_R \mid A \in N\}$
$T' = T$
$P' = \{A_L \to B_L C_R, A_R \to B_L C_R \mid A \to BC \in P\}$
$\cup \{S' \to X_L Y_R \mid S \to XY \in P\}$
$\cup \{S' \to a \mid S \to a \in P\}$
$\cup \{A_L \to a, A_R \to a \mid A \to a \in P, a \in T\}$.
Clearly $L(G) = L(G')$ and $G'$ is in SCNF.
Example Let $G = (N, T, P, S)$ be a CFG where
$N = \{S, A, B\}$, $T = \{0, 1\}$ and
$P = \{S \to AB \mid 0, B \to BA \mid 1, A \to AB \mid 0\}$.
Then $G' = (N', T, P', S')$ in SCNF will have
$N' = \{S', S_L, S_R, A_L, A_R, B_L, B_R\}$,
$T = \{0, 1\}$,
$P' = \{S' \to A_L B_R \mid 0, S_L \to A_L B_R \mid 0, S_R \to A_L B_R \mid 0, A_L \to A_L B_R \mid 0, A_R \to A_L B_R \mid 0, B_L \to B_L A_R \mid 1, B_R \to B_L A_R \mid 1\}$.
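The construction of the left and right copies can be written out directly. The sketch below is illustrative only: it assumes the input grammar is in CNF with single-character symbols, and the function name `to_scnf` and the spellings `A_L`, `A_R` and `S'` for the new nonterminals are conventions chosen here.

```python
def to_scnf(start, productions):
    """Build the rules for the copies A_L, A_R and the new start symbol S'."""
    new = set()
    for head, body in productions:
        if len(body) == 2:                        # A -> BC becomes ... -> B_L C_R
            rhs = (body[0] + "_L", body[1] + "_R")
        else:                                     # A -> a is copied unchanged
            rhs = (body,)
        new.add((head + "_L", rhs))
        new.add((head + "_R", rhs))
        if head == start:                         # rules of S are also given to S'
            new.add((start + "'", rhs))
    return new

P = [("S", "AB"), ("S", "0"), ("B", "BA"), ("B", "1"), ("A", "AB"), ("A", "0")]
for head, body in sorted(to_scnf("S", P)):
    print(head, "->", " ".join(body))
```

For the example grammar this produces fourteen rules: two for each of the seven nonterminals of the new grammar.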