Motivation For Formal Grammars

Natural languages are usually described by rules like the ones shown below.

Example:
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
Grammars Produce Languages

• Language: the set of strings (of terminals) that can be generated from the start symbol by derivation:

sentence
⇒ noun-phrase verb-phrase . (rule 1)
⇒ article noun verb-phrase . (rule 2)
⇒ the noun verb-phrase . (rule 3)
⇒ the girl verb-phrase . (rule 4)
⇒ the girl verb noun-phrase . (rule 5)
⇒ the girl sees noun-phrase . (rule 6)
⇒ the girl sees article noun . (rule 2)
⇒ the girl sees a noun . (rule 3)
⇒ the girl sees a dog . (rule 4)
• Language: the programs (character streams) allowed
• Grammar rules (productions): "produce" the language
  – left-hand side, right-hand side
• Nonterminals (structured names): noun-phrase, verb-phrase
• Terminals (tokens): . dog
• Metasymbols: → ("consists of"), | (choice)
• Start symbol: the nonterminal that stands for the entire structure (sentence, program)
  – sentence
• E.g., if-statement → if (expression) statement else statement
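
As a rough illustration of how productions "produce" a sentence, the derivation of "the girl sees a dog ." can be replayed mechanically. This is only a sketch: the dictionary layout and the names GRAMMAR, leftmost_step and derive are assumptions made for the example, not part of the slides. It treats the period as a terminal, as the derivation does, and always rewrites the leftmost nonterminal.

```python
# Productions of the small English grammar; each nonterminal maps to its
# alternatives. The data layout is an illustrative assumption.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_step(form, choice):
    """Rewrite the leftmost nonterminal in `form` with alternative `choice`."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            return form[:i] + GRAMMAR[symbol][choice] + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


def derive(start, choices):
    """Return every sentential form produced by applying `choices` in order."""
    forms = [[start]]
    for choice in choices:
        forms.append(leftmost_step(forms[-1], choice))
    return forms


# One entry per derivation step: the index of the alternative that is picked.
STEPS = derive("sentence", [0, 0, 1, 0, 0, 0, 0, 0, 1])
for form in STEPS:
    print(" ".join(form))
```

The last line printed is the terminal string itself; every earlier line still contains at least one nonterminal.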
Context-Free Grammar

• Context-Free Grammars (CFG)
  – Noam Chomsky, 1950s.
  – Define context-free languages.
  – Four components: terminals, nonterminals, one start symbol, and productions (left-hand side: one single nonterminal)
CFGs

A context-free grammar (CFG, or just grammar) is a 4-tuple denoted G = (V, T, P, S), where

V is a finite set of variables (also called nonterminals or syntactic categories)

T is a finite set of terminals (tokens)

P is a finite set of production rules of the form A → α, where A is a variable and α is a string of symbols from (V ∪ T)*

S is a special variable called the start symbol
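
One possible way to hold the 4-tuple in code is sketched below. The class name CFG, the field names and the helper is_well_formed are hypothetical; the check that V and T are disjoint is a common convention that the slide does not state, so treat it as an assumption.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    variables: frozenset    # V
    terminals: frozenset    # T
    productions: tuple      # P: pairs (A, alpha), alpha is a tuple of symbols
    start: str              # S


def is_well_formed(g):
    """Check the side conditions of the definition G = (V, T, P, S)."""
    if g.variables & g.terminals:       # assumption: V and T are disjoint
        return False
    if g.start not in g.variables:      # S must be a variable
        return False
    symbols = g.variables | g.terminals
    # every rule is A -> alpha with A in V and alpha in (V ∪ T)*
    return all(
        head in g.variables and all(s in symbols for s in body)
        for head, body in g.productions
    )


# The noun-phrase fragment of the English grammar from the first slide.
G = CFG(
    variables=frozenset({"noun-phrase", "article", "noun"}),
    terminals=frozenset({"a", "the", "girl", "dog"}),
    productions=(
        ("noun-phrase", ("article", "noun")),
        ("article", ("a",)), ("article", ("the",)),
        ("noun", ("girl",)), ("noun", ("dog",)),
    ),
    start="noun-phrase",
)
print(is_well_formed(G))  # True
```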


What does “Context-Free” mean?
• The left-hand side of a production is always one single nonterminal:
  – The nonterminal is replaced by the corresponding right-hand side, no matter where the nonterminal appears (i.e., the replacement/derivation does not depend on context).
• Context-sensitive grammar (context-sensitive languages)
• Why context-free?
CFG Example 1
E → ID
  | NUM
  | E * E
  | E / E
  | E + E
  | E - E
  | ( E )
ID → a | b | … | z
NUM → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Grammars Produce Languages
Does the above grammar produce the sentence (a*b)+c? Start with the start symbol of the CFG.

Leftmost derivation:
E ⇒ E + E
  ⇒ ( E ) + E
  ⇒ ( E * E ) + E
  ⇒ ( ID * E ) + E
  ⇒ ( a * E ) + E
  ⇒ ( a * ID ) + E
  ⇒ ( a * b ) + E
  ⇒ ( a * b ) + ID
  ⇒ ( a * b ) + c

Notations:
$E \Rightarrow_{lm} E + E$ (single-step leftmost derivation)
$E \Rightarrow^{*}_{lm} E + E$ (zero-or-more-step leftmost derivation)
Example: $E \Rightarrow^{*}_{G} (a*b)+c$
Context-free languages (CFLs)
The languages described by context-free grammars are known as CFLs.

Formal notation:
The language generated by G, denoted L(G), is
$L(G) = \{\, w \mid w \in T^{*} \text{ and } S \Rightarrow^{*}_{G} w \,\}$
That is, a string is in L(G) if:
1) the string consists solely of terminals
2) the string can be derived from S
A string α of terminals and variables is called a sentential form if $S \Rightarrow^{*} \alpha$.
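
The two membership conditions can be tested by brute force for small inputs. The sketch below searches breadth-first over leftmost sentential forms of CFG Example 1. It is not an efficient parsing algorithm, and the names P and in_language are illustrative. The length-based pruning is only sound because no production of this grammar has an empty right-hand side, so sentential forms never shrink.

```python
from collections import deque

# CFG Example 1, with the letter and digit alternatives written out.
P = {
    "E": [["ID"], ["NUM"], ["E", "*", "E"], ["E", "/", "E"],
          ["E", "+", "E"], ["E", "-", "E"], ["(", "E", ")"]],
    "ID": [[c] for c in "abcdefghijklmnopqrstuvwxyz"],
    "NUM": [[d] for d in "0123456789"],
}


def in_language(target, start="E"):
    """Return True if `target` (a string of one-character terminals) is in L(G)."""
    goal = tuple(target)
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        if form == goal:            # only terminals, and derived from the start symbol
            return True
        # position of the leftmost nonterminal, if any
        i = next((k for k, s in enumerate(form) if s in P), None)
        if i is None or form[:i] != goal[:i]:
            continue                # wrong terminal string, or terminal prefix mismatch
        for body in P[form[i]]:
            new = form[:i] + tuple(body) + form[i + 1:]
            if len(new) <= len(goal) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False


print(in_language("(a*b)+c"))  # True
print(in_language("a+"))       # False
```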
CFG Example 2
S → if E then S else S
  | begin S L
  | print E
L → end
  | ; S L
E → NUM = NUM
NUM → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree
• Represents the derivation steps from the start symbol to the string
• Given the derivations used in the parsing of an input sequence, a parse tree has
  – the start symbol as the root
  – the terminals of the input sequence as leaves
  – for each production A → X1 X2 ... Xn used in a derivation, a node A with children X1 X2 ... Xn
Parse Tree Example 1
CFG:
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Input sequence: 3+4*5

Parse tree:
expr
├─ expr
│  └─ number
│     └─ digit
│        └─ 3
├─ +
└─ expr
   ├─ expr
   │  └─ number
   │     └─ digit
   │        └─ 4
   ├─ *
   └─ expr
      └─ number
         └─ digit
            └─ 5
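
The tree for 3+4*5 can be written down as nested data to check the three properties of a parse tree: start symbol at the root, input terminals at the leaves, and one production per interior node. The representation here (a node is a pair of label and children, a leaf is a plain string) and the helper names are assumptions chosen for this sketch.

```python
def node(label, *children):
    """An interior node: a nonterminal label plus its ordered children."""
    return (label, list(children))


def num(d):
    """Shorthand for the chain expr -> number -> digit -> d."""
    return node("expr", node("number", node("digit", d)))


# Parse tree for the input sequence 3+4*5.
TREE = node("expr",
            num("3"), "+",
            node("expr", num("4"), "*", num("5")))


def leaves(t):
    """Terminals at the leaves, read left to right."""
    if isinstance(t, str):
        return [t]
    _, children = t
    return [leaf for c in children for leaf in leaves(c)]


def productions_used(t):
    """Each production A -> X1 ... Xn that appears as a node with its children."""
    if isinstance(t, str):
        return []
    label, children = t
    body = tuple(c if isinstance(c, str) else c[0] for c in children)
    return [(label, body)] + [p for c in children for p in productions_used(c)]


print(TREE[0])                    # expr
print("".join(leaves(TREE)))      # 3+4*5
print(productions_used(TREE)[0])  # ('expr', ('expr', '+', 'expr'))
```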
What is Parsing?

• Given a grammar and a token string:
  – determine whether the grammar can generate the token string
  – i.e., is the string a legal program in the language?
• In other words, construct a parse tree for the token string.
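
To make this concrete, here is a minimal sketch of a parser for CFG Example 2. The slides do not name a parsing method; recursive descent is used here as an assumption because each alternative of that grammar starts with a different token. The function names, the tuple-based tree and the ParseError class are all illustrative.

```python
DIGITS = set("0123456789")


class ParseError(Exception):
    """Raised when the token string is not generated by the grammar."""


def parse(tokens):
    """Return a parse tree (nested tuples) for CFG Example 2, or raise ParseError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise ParseError(f"expected {expected!r}, found {peek()!r}")
        pos += 1
        return expected

    def S():
        if peek() == "if":
            return ("S", eat("if"), E(), eat("then"), S(), eat("else"), S())
        if peek() == "begin":
            return ("S", eat("begin"), S(), L())
        return ("S", eat("print"), E())

    def L():
        if peek() == "end":
            return ("L", eat("end"))
        return ("L", eat(";"), S(), L())

    def E():
        return ("E", NUM(), eat("="), NUM())

    def NUM():
        nonlocal pos
        tok = peek()
        if tok not in DIGITS:
            raise ParseError(f"expected a digit, found {tok!r}")
        pos += 1
        return ("NUM", tok)

    tree = S()
    if pos != len(tokens):
        raise ParseError("unexpected trailing tokens")
    return tree


TOKENS = "begin print 1 = 1 ; print 2 = 3 end".split()
print(parse(TOKENS))
```

A token string is legal exactly when parse returns a tree instead of raising ParseError.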
What’s significant about a parse tree?

• A parse tree gives a unique syntactic structure
• Leftmost, rightmost derivation
  – There is only one leftmost derivation for a parse tree, and symmetrically only one rightmost derivation for a parse tree.
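
The correspondence can be sketched in code: given a parse tree, repeatedly expanding the leftmost (or rightmost) unexpanded node leaves no choice at any step, which is why each tree yields exactly one derivation of each kind. The tree encoding and the names node, num, render and derivation are assumptions made for this example.

```python
def node(label, *children):
    """An interior node: a nonterminal label plus its ordered children."""
    return (label, list(children))


def num(d):
    """Shorthand for the chain expr -> number -> digit -> d."""
    return node("expr", node("number", node("digit", d)))


# Parse tree for 3+4*5 in which * is grouped below +.
TREE = node("expr", num("3"), "+", node("expr", num("4"), "*", num("5")))


def render(form):
    """Show a sentential form: subtrees by their label, terminals as-is."""
    return " ".join(x if isinstance(x, str) else x[0] for x in form)


def derivation(tree, leftmost=True):
    """Sentential forms obtained by expanding one node of `tree` per step."""
    form = [tree]                  # unexpanded subtrees and terminal strings
    steps = [render(form)]
    while True:
        pending = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]  # replace the node by its children
        steps.append(render(form))


for step in derivation(TREE):
    print(step)
```

Calling derivation(TREE, leftmost=False) gives the rightmost derivation of the same tree; both end in the same terminal string.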
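Example
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse tree:
expr
├─ expr
│  └─ number
│     └─ digit
│        └─ 3
├─ +
└─ expr
   ├─ expr
   │  └─ number
   │     └─ digit
   │        └─ 4
   ├─ *
   └─ expr
      └─ number
         └─ digit
            └─ 5

Leftmost derivation (expands the same tree from left to right):
expr ⇒ expr + expr
     ⇒ number + expr
     ⇒ digit + expr
     ⇒ 3 + expr
     ⇒ 3 + expr * expr
     ⇒ 3 + number * expr
     ⇒ 3 + digit * expr
     ⇒ 3 + 4 * expr
     ⇒ 3 + 4 * number
     ⇒ 3 + 4 * digit
     ⇒ 3 + 4 * 5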
