
Grammars

Introduction

When I took Latin, starting in 1954, I felt intuitively that the grammar of the language was somehow
mathematical. But the highest math I knew was algebra, and I could not come up with any mathematical
formulation of Latin grammar.

In 1957, I finally read something about the theory of grammars. Noam Chomsky was one of the pioneers
of this field. Chomsky was trying to analyze natural languages, a (possibly) hopeless task. Fortunately,
we do not look at natural languages in this course. Instead, we deal exclusively with formal languages and
formal grammars. Henceforth, grammar means formal grammar.

Each grammar generates a language. A grammar G can be represented by a string, which we call ⟨G⟩. A
grammar has a finite description, but might generate an infinite language.

There are classes of grammars, each of which generates a class of languages. Not all languages are generated
by grammars, but many important languages are.

Definition of a Grammar

A grammar G consists of the following parts.


1. Terminal alphabet Σ.
2. Variable alphabet V . The two alphabets may not have any symbol in common. Γ = Σ + V is called
the alphabet of grammar symbols.
3. S ∈ V , the start symbol. (See footnote 1.)
4. Finite set of productions. Each production is of the form lhs → rhs, where lhs and rhs are strings of
grammar symbols, called the left-hand side and the right-hand-side of the production.

Classes of Grammars

Following Chomsky, we define five classes of grammars.


1. Left-regular grammars. This class generates the class of regular languages.
2. Right-regular grammars. This class also generates the class of regular languages.
3. Context-free grammars. This class generates the class of context-free languages.
4. Context-sensitive grammars. This class generates the class of context-sensitive languages.
5. Unrestricted grammars. This class generates the class of recursively enumerable languages.

These classes are defined by properties of their left-hand sides and right-hand sides. In any grammar, there
must be at least one production whose lhs is the start symbol, which, by convention, is typically S.

1. For each production of a left-linear grammar, the lhs must be a single member of V . The right-hand-side
must be one of the following:
(a) A terminal
(b) A terminal followed by a variable
(c) A variable
(d) The empty string

(Footnote 1: There is no rule that the start symbol be called “S.” It could have any name.)

2. For each production of a right-linear grammar, the lhs must be a single member of V . The
right-hand-side must be one of the following:
(a) A terminal
(b) A variable followed by a terminal
(c) A variable
(d) The empty string

Left- and right-linear grammars are called regular grammars. Note: the (b) rules cannot be mixed. For
example, a regular grammar could not have both the productions A → aA and A → Ab.

Our definition of linear grammars differs from Definition 3.3 in our textbook. In that definition, only (a)
and (b) are given. We have four justifications for this change.
(i) If we do not have (d), we cannot generate the empty string. But the empty string is a member of
many regular languages.
(ii) If we do not have (c), we have to go through unnecessary contortions to define a grammar that
generates the language accepted by an NFA which has a λ-transition.
(iii) With (a), (b), (c), and (d), it is easier to understand the construction of regular grammars
equivalent to finite automata.
(iv) Allowing (c) and (d) does no harm: we still get only regular languages.

3. For each production of a context-free grammar, the lhs must be a single member of V . The
right-hand-side can be any string of grammar symbols.

4. For each production of a context-sensitive grammar, the lhs and rhs must be non-empty strings of
grammar symbols, and the length of the rhs must be at least as great as the length of the lhs. (See footnote 2.)

5. For each production of an unrestricted grammar, the lhs must be a non-empty string of grammar symbols,
while the rhs may be any string of grammar symbols.

Derivations

Let G be a grammar. A G-derivation of a string w ∈ Σ∗ is a sequence of strings over Γ connected by
the symbol “⇒,” which is read as the word “derives.” These strings are called sentential forms. In any
derivation of w, the first sentential form is simply the start symbol, and the last is w itself.

Each step of a derivation makes use of just one production. The lhs of that production is replaced by the
rhs of that same production. That is, if u and v are consecutive sentential forms, i.e., u ⇒ v, there must
be a production α → β such that α is replaced by β at that step. That is, there are strings x, y ∈ Γ∗ such
that
1. u = xαy
2. v = xβy

(Footnote 2: This rule does not permit a context-sensitive language to contain the empty string. However,
we usually want to allow the empty string. We can achieve that by permitting the production S → λ, as
long as S is not on the rhs of any production.)

If productions are labeled, we sometimes place the label of the production above the “derives” symbol ⇒
for clarity.

The language generated by G, called L(G), is defined to be the language of all w ∈ Σ∗ which can be derived
from the start symbol.

Example. Let L = {a^n b^m : n, m ≥ 0}, which is described by the regular expression a∗b∗. Then L is
generated by the regular grammar:
1. S → aS
2. S → B
3. B → bB
4. B → λ
We now give a G-derivation of w = aabbb; the productions used at the successive steps are 1, 1, 2, 3, 3, 3, 4.

S ⇒ aS ⇒ aaS ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbbB ⇒ aabbb

The above derivation proves that w is generated by G, that is, w ∈ L(G).

Computing a Regular Grammar from a DFA

Given any DFA M , there is a straightforward way to find a regular grammar which generates the language
accepted by M . Suppose Σ = {a1 , . . . an } and Q = {q0 , q1 , . . . qm }.3

We use δ to define a left-linear grammar G. We let Σ be the terminal alphabet of G. We let V =
{A0 , A1 , . . . Am } be the alphabet of variables. We let A0 be the start symbol of G. Each arrow in the state
diagram defines a production of the grammar, and each final state also defines a production.
If δ(qi , aj ) = qk , then Ai → aj Ak is a production.
If qi is a final state, then Ai → λ is a production.

Let M be the DFA illustrated in Figure 1.


[Figure 1: state diagram of M. States 0, 1, and 2, with state 2 final; the arrows are δ(0, a) = 0, δ(0, b) = 1,
δ(1, a) = 2, δ(1, b) = 1, δ(2, a) = 0, and δ(2, b) = 1.]
(Footnote 3: Again, to avoid clutter, we write merely “i” to denote “qi ” in state diagrams.)
M has one final state and its state diagram has six arrows, thus G has seven productions:
1. A0 → aA0
2. A0 → bA1
3. A1 → bA1
4. A1 → aA2
5. A2 → aA0
6. A2 → bA1
7. A2 → λ
Here is a G-derivation of abbaba; the productions used at the successive steps are 1, 2, 3, 4, 6, 4, 7.

A0 ⇒ aA0 ⇒ abA1 ⇒ abbA1 ⇒ abbaA2 ⇒ abbabA1 ⇒ abbabaA2 ⇒ abbaba

Computing a Regular Grammar from an NFA

Similarly, given an NFA M which accepts a language L, we can find a left-linear grammar which
generates L. Again, let Σ = {a1 , . . . an } and Q = {q0 , q1 , . . . qm }.

As in the case of a DFA, we let Σ be the terminal alphabet of G, and V = {A0 , A1 , . . . Am } the alphabet
of variables, and A0 the start symbol of G. As in the case of a DFA, each final state and each arrow in the
state diagram defines a production.
If qk ∈ δ(qi , aj ), then Ai → aj Ak is a production.
If qk ∈ δ(qi , λ), then Ai → Ak is a production.
If qi is a final state, then Ai → λ is a production.

Let M be the NFA illustrated in Figure 2.

[Figure 2: state diagram of M. States 0, 1, and 2, with state 2 final; the arrows are labeled a from 0 to 0,
a from 0 to 1, b from 1 to 2, b from 2 to 0, and λ from 2 to 1.]

M has one final state and its state diagram has five arrows, thus G has six productions:
1. A0 → aA0
2. A0 → aA1
3. A1 → bA2
4. A2 → bA0
5. A2 → A1
6. A2 → λ
Here is a G-derivation of aabbbab; the productions used at the successive steps are 1, 2, 3, 5, 3, 4, 2, 3, 6.

A0 ⇒ aA0 ⇒ aaA1 ⇒ aabA2 ⇒ aabA1 ⇒ aabbA2 ⇒ aabbbA0 ⇒ aabbbaA1 ⇒ aabbbabA2 ⇒ aabbbab

The above derivation proves that aabbbab is generated by G, that is, aabbbab ∈ L(G).

By this construction, we have the following theorem.

Theorem 1 If a language L is accepted by an NFA with n states, then L is generated by a left-linear
grammar with n variables.

Proof: We just use the above construction. The formal proof that it yields a left-linear grammar for the
same language is more detailed, but not too hard to understand.

Exercise 1 Give a regular grammar for the language accepted by M .


[Figure 3: state diagram of the NFA M for Exercise 1, with states 0 through 4; state 4 is final.]

Ans: M has five states, and the minimal DFA which accepts L(M ) has 2^5 = 32 states. The following
regular grammar generates L(M ), where A0 is the start symbol.

1. A0 → aA0
2. A0 → bA0
3. A0 → aA1
4. A0 → bA1
5. A1 → aA2
6. A1 → bA2
7. A2 → aA3
8. A2 → bA3
9. A3 → aA4
10. A3 → bA4
11. A4 → λ

Context-Free Grammars

A grammar G is context-free if, for every production, the left-hand side is one variable. We write CFG to
mean context-free grammar. The right-hand side of a production of a CFG can be any string of grammar
symbols. The class of context-free grammars is, arguably, the most important class of grammars we study.

A language L is called context-free if it is generated by some context-free grammar. Two grammars are said
to be equivalent if they generate the same language. Every context-free language is generated by infinitely
many different equivalent context-free grammars. We write CFL to mean context-free language.

Remark 1 Every regular language is a context-free language.

Proof: Every regular grammar is a context-free grammar.

Simple Examples

Simplest Example. Consider the grammar G where V = {S}, Σ = {a, b}, the start symbol is S, and the
productions are:
1. S → aSb
2. S → λ
L(G) = {a^n b^n : n ≥ 0}, arguably the simplest non-regular context-free language.

Here is a G-derivation (or just derivation if G is understood) of w = aabb.

S ⇒ aSb ⇒ aaSbb ⇒ aabb

Dyck Language. The Dyck language is the language of all balanced strings of left and right parentheses,
over the alphabet Σ = {(, )}. Here are three grammars for the Dyck language. In each case, S is
the start symbol and V = {S}.

G1
1. S → (S)
2. S → SS
3. S → λ

G2
1. S → S(S)
2. S → λ

G3
1. S → (S)S
2. S → λ

Palindromes. A palindrome is a word which is its own reverse, such as “level” or “noon.” Let L be the
language of all palindromes over the alphabet {a, b}. L is generated by the CFG
1. S → aSa
2. S → bSb
3. S → a
4. S → b
5. S → λ

Derivations and Parse Trees

If G is a CFG and L = L(G), then each w ∈ L has at least one derivation, and frequently more than one.
For example, let L = {a^n c^n b^m d^m : n, m ≥ 0}. Then L is generated by a grammar with variables S, A, B
and start symbol S:
1. S → AB
2. A → aAc
3. A → λ
4. B → bBd
5. B → λ

Let w = aaccbbbddd. Here are two derivations of w.

S ⇒ AB ⇒ aAcB ⇒ aaAccB ⇒ aaccB ⇒ aaccbBd ⇒ aaccbbBdd ⇒ aaccbbbBddd ⇒ aaccbbbddd

S ⇒ AB ⇒ AbBd ⇒ AbbBdd ⇒ AbbbBddd ⇒ Abbbddd ⇒ aAcbbbddd ⇒ aaAccbbbddd ⇒ aaccbbbddd

The first of these is a left-most derivation, since at each step of the derivation, the left-most variable of the
sentential form is replaced by the rhs of a production. For example, in the second step, A is replaced by
aAc and B is not replaced. Similarly, the second derivation is a right-most derivation.

Parse Trees. For each derivation of a string w ∈ L(G), there is a parse tree of w. The internal nodes of
this tree are the variables in the derivation and each leaf is either a terminal or λ. The two derivations of
aaccbbbddd shown above give rise to the same parse tree:

                 S
           ______|______
          A             B
        / | \         / | \
       a  A  c       b  B  d
        / | \         / | \
       a  A  c       b  B  d
          |           / | \
          λ          b  B  d
                        |
                        λ

Ambiguous and Unambiguous Grammars

A CFG G is called unambiguous if every string w ∈ L(G) has exactly one left-most derivation.

Theorem 2

(a) A CFG G is unambiguous if and only if every string w ∈ L(G) has exactly one right-most derivation.

(b) A CFG G is unambiguous if and only if every string w ∈ L(G) has exactly one parse tree.

A CFG is ambiguous if it is not unambiguous. A CFL is called inherently ambiguous if it has no
unambiguous CFG.

The grammar G1 for the Dyck language is ambiguous, while G2 and G3 are both unambiguous.

Dangling Else

Let G be the following context-free grammar, with start symbol S and terminals {a, i, e}:
1. S → a
2. S → iS

3. S → iSeS

Exercise 2 Show that G is ambiguous by giving two parse trees for the string iiaea.

L(G) does have an unambiguous grammar, but it is more complex than the ambiguous grammar.

G models the “dangling else” problem for programming languages. In the following C++ fragment, what
value will be output?

int x = 0;
int y = 0;
int z = 3;
if(x == 1)
    if(y == 0)
        z = 2;
else z = 4;
cout << z << endl;
