
Specification of Tokens

This document defines key concepts related to strings, languages, and regular expressions. It introduces strings as sequences of symbols from an alphabet. Languages are defined as sets of strings over a fixed alphabet. Regular expressions provide a notation for specifying patterns in strings using operations like concatenation, union, and Kleene closure. The document also outlines properties and limitations of regular expressions.

Uploaded by

SMARTELLIGENT

Strings and Languages
• Regular Expressions are an important notation for specifying patterns.

• Alphabet – any finite set of symbols, e.g. ASCII, the binary alphabet, Unicode, EBCDIC, Latin-1

• String – a finite sequence of symbols drawn from an alphabet
– banana (ASCII alphabet)
– Length of a string s: |s|, e.g. |banana| = 6
– Empty string: ε

• Other terms relating to strings: prefix; suffix; substring; proper prefix, suffix, or substring (non-empty, not the entire string); subsequence

• Language – A set of strings over a fixed alphabet

Languages
• A language, L, is simply any set of strings over a fixed alphabet.

Alphabet                                         Languages
{0, 1}                                           {0, 10, 100, 1000, 10000, …}
                                                 {0, 1, 00, 11, 000, 111, …}
{a, b, c}                                        {abc, aabbcc, aaabbbccc, …}
{A, …, Z}                                        {FOR, WHILE, GOTO, …}
{A, …, Z, a, …, z, 0, …, 9, +, -, …, <, >, …}    {all legal PASCAL programs}

Special languages:
∅ – the empty language (contains no strings)
{ε} – the language containing only the empty string ε

String operations
• Given String: banana
• Prefix : ban, banana
• Suffix : ana, banana
• Substring : nan, ban, ana, banana
• Subsequence: bnan, nn
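• Proper prefix, suffix, or substring: non-empty and not the entire string, e.g. ban is a proper prefix and ana a proper suffix of banana, while banana itself is neither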
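The terms on this slide can be checked mechanically. The following is a minimal Python sketch; the function names (`prefixes`, `suffixes`, `is_substring`, `is_subsequence`, `is_proper`) are illustrative assumptions, not part of the original slides.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def is_substring(t, s):
    """True if t is obtained by deleting a prefix and a suffix of s."""
    return t in s


def is_subsequence(t, s):
    """True if t is obtained by deleting zero or more symbols of s."""
    remaining = iter(s)
    # `ch in remaining` consumes the iterator, so symbol order is respected.
    return all(ch in remaining for ch in t)


def is_proper(t, s):
    """Proper means non-empty and not the entire string."""
    return t != "" and t != s


print("ban" in prefixes("banana"))       # True
print(is_subsequence("bnan", "banana"))  # True
print(is_proper("banana", "banana"))     # False
```

Every substring is also a subsequence, but not the other way round: bnan is a subsequence of banana without being a substring.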

String Operations
• Concatenation
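– xy is the concatenation of x and y; $s\varepsilon = \varepsilon s = s$, so ε is the identity for concatenation
– Exponentiation: $s^0 = \varepsilon$, and for $i > 0$, $s^i = s^{i-1}s$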
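As a small illustration, concatenation and string exponentiation map directly onto Python strings, with the empty string `""` playing the role of ε. The helper name `power` and the sample strings are assumptions made for this sketch.

```python
def power(s, i):
    """Return s^i, using s^0 = ε and s^i = s^(i-1) s for i > 0."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    result = ""              # s^0 = ε (the empty string)
    for _ in range(i):
        result = result + s  # s^i = s^(i-1) s
    return result


x, y = "dog", "house"
print(x + y)           # doghouse
print(power("ba", 3))  # bababa
print("" + x == x == x + "")  # True: ε is the identity for concatenation
```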

Operations on Languages

OPERATION                               DEFINITION
union of L and M, written L ∪ M         L ∪ M = {s | s is in L or s is in M}
concatenation of L and M, written LM    LM = {st | s is in L and t is in M}
Kleene closure of L, written L*         $L^* = \bigcup_{i=0}^{\infty} L^i$; L* denotes "zero or more concatenations of" L
positive closure of L, written L+       $L^+ = \bigcup_{i=1}^{\infty} L^i$; L+ denotes "one or more concatenations of" L
exponentiation of L                     $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL$
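The operations in this table can be tried out on finite languages represented as Python sets of strings. This is only a sketch: the function names are assumptions, and because L* and L+ are infinite in general, `closure` returns only the strings up to a chosen length `max_len`.

```python
def union(L, M):
    """L ∪ M = {s | s is in L or s is in M}."""
    return L | M


def concat(L, M):
    """LM = {st | s is in L and t is in M}."""
    return {s + t for s in L for t in M}


def power(L, i):
    """L^i, with L^0 = {ε}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def closure(L, max_len, positive=False):
    """Strings of L* (or L+ if positive=True) of length <= max_len."""
    start = {s for s in L if len(s) <= max_len} if positive else {""}
    result, frontier = set(start), set(start)
    while frontier:
        # Extend the newest strings by one more string from L.
        frontier = {s + t for s in frontier for t in L
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result


L = {"a", "b"}
print(sorted(power(L, 2)))                      # ['aa', 'ab', 'ba', 'bb']
print(sorted(closure(L, 2), key=len))
print(sorted(closure(L, 2, positive=True), key=len))
```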
Operations on Languages
• Let L be the set of letters and D the set of digits
• L ∪ D is the set of letters and digits
• LD is the set of strings consisting of a letter followed by a digit
• $L^4$ is the set of all four-letter strings
• L* is the set of all strings of letters, including ε
• D+ is the set of strings of one or more digits

Say What?
L = {A, B, C, D}   D = {1, 2, 3}
• L ∪ D
{A, B, C, D, 1, 2, 3}
• LD
{A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3}
• $L^2$
{AA, AB, AC, AD, BA, BB, BC, BD, CA, …, DD}
• L*
{all strings over L, including ε}
• L+
L* − {ε}
• L(L ∪ D)
Valid: {A1, AA, B3, CD}   Invalid: {321, 4A2}
• L(L ∪ D)*
Valid: {A, A1, A23, D3, DA3, …}   Invalid: {31}
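The answers on this slide can be reproduced with Python sets and, for the two infinite or larger cases, with Python's `re` module. The variable names and the translation of L(L ∪ D) into character classes are assumptions made for this sketch.

```python
import re

L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

L_union_D = L | D                         # L ∪ D
LD = {s + t for s in L for t in D}        # LD
L2 = {s + t for s in L for t in L}        # L^2

# L(L ∪ D) and L(L ∪ D)* written as Python regular expressions.
one_more = re.compile(r"[A-D][A-D1-3]")
any_more = re.compile(r"[A-D][A-D1-3]*")


def in_language(pattern, s):
    """True if the whole string s matches the compiled pattern."""
    return pattern.fullmatch(s) is not None


print(len(LD), len(L2))                                         # 12 16
print([in_language(one_more, s) for s in ["A1", "CD", "321"]])  # [True, True, False]
print([in_language(any_more, s) for s in ["A", "DA3", "31"]])   # [True, True, False]
```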
Regular Expressions
• A regular expression is a set of rules (techniques) for constructing sequences of symbols (strings) from an alphabet.

• Let Σ be an alphabet and r a regular expression. Then L(r) is the language characterized by the rules of r.

Regular Expressions
• Defined over an alphabet Σ

• ε represents {ε}, the set containing the empty string

• If a is a symbol in Σ, then a is a regular expression denoting {a}, the set containing the string a

• If r and s are regular expressions denoting the languages L(r) and L(s), then:
– (r)|(s) is a regular expression denoting L(r) ∪ L(s)
– (r)(s) is a regular expression denoting L(r)L(s)
– (r)* is a regular expression denoting (L(r))*
– (r) is a regular expression denoting L(r)

• Precedence: * (left associative), then concatenation (left associative), then | (left associative)
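The inductive definition above can be read as a recursive procedure for computing L(r). The sketch below assumes a hypothetical tuple encoding of regular expressions (`"eps"`, `"sym"`, `"alt"`, `"cat"`, `"star"`) and, because L(r) may be infinite, returns only the strings of length at most `max_len`.

```python
def lang(r, max_len):
    """Strings of L(r) with length <= max_len, for r given as a tuple tree."""
    kind = r[0]
    if kind == "eps":    # ε denotes {ε}
        return {""}
    if kind == "sym":    # a denotes {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":    # (r)|(s) denotes L(r) ∪ L(s)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":    # (r)(s) denotes L(r)L(s)
        return {x + y
                for x in lang(r[1], max_len)
                for y in lang(r[2], max_len)
                if len(x + y) <= max_len}
    if kind == "star":   # (r)* denotes (L(r))*
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind!r}")


a, b = ("sym", "a"), ("sym", "b")
# a|a*b parses as a | ((a*) b): * binds tightest, then concatenation, then |.
expr = ("alt", a, ("cat", ("star", a), b))
print(sorted(lang(expr, 3), key=len))  # ['a', 'b', 'ab', 'aab']
```

The explicit tree makes the precedence rule visible: the grouping that the parentheses-free notation leaves implicit has to be spelled out when the expression is built.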
Regular Expressions
Alphabet = {a, b}
1. a|b denotes {a, b}
2. (a|b)(a|b) denotes {ab, aa, ba, bb}
3. a* denotes {ε, a, aa, …}
4. (a|b)* denotes the set of all strings of a's and b's, including ε
5. a|a*b denotes the string a together with all strings of zero or more a's followed by a single b
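These five expressions happen to be valid patterns for Python's `re` module as written, so the languages can be listed by brute force. A minimal sketch, assuming a hypothetical helper `language` that tries every string over {a, b} up to a small length:

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=3):
    """Strings over `alphabet` of length <= max_len that match `pattern` in full."""
    compiled = re.compile(pattern)
    words = ("".join(p)
             for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return [w for w in words if compiled.fullmatch(w)]


for pattern in ["a|b", "(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(pattern, language(pattern))
```

`fullmatch` is used rather than `search` or `match` because membership in L(r) is a statement about the whole string, not about some part of it.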

Algebraic Properties of Regular Expressions

AXIOM                             DESCRIPTION
r|s = s|r                         | is commutative
r|(s|t) = (r|s)|t                 | is associative
(rs)t = r(st)                     concatenation is associative
r(s|t) = rs|rt, (s|t)r = sr|tr    concatenation distributes over |
εr = r, rε = r                    ε is the identity element for concatenation
r* = (r|ε)*                       relation between * and ε
r** = r*                          * is idempotent
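The axioms can be spot-checked on concrete languages. The sketch below works on sets of strings truncated to a maximum length; the sample languages `r`, `s`, `t` and the bound `MAX_LEN` are arbitrary assumptions. A finite check like this illustrates the laws but does not prove them.

```python
MAX_LEN = 4  # languages are compared only on strings up to this length


def alt(L, M):
    """Language of r|s."""
    return L | M


def cat(L, M):
    """Language of rs, truncated to MAX_LEN."""
    return {x + y for x in L for y in M if len(x + y) <= MAX_LEN}


def star(L):
    """Language of r*, truncated to MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, L) - result
        result |= frontier
    return result


EPS = {""}                             # language of ε
r, s, t = {"a"}, {"b", "ab"}, {"ba"}   # arbitrary sample languages

laws = {
    "| is commutative": alt(r, s) == alt(s, r),
    "| is associative": alt(r, alt(s, t)) == alt(alt(r, s), t),
    "concatenation is associative": cat(cat(r, s), t) == cat(r, cat(s, t)),
    "left distributivity": cat(r, alt(s, t)) == alt(cat(r, s), cat(r, t)),
    "right distributivity": cat(alt(s, t), r) == alt(cat(s, r), cat(t, r)),
    "ε is the identity": cat(EPS, r) == r == cat(r, EPS),
    "r* = (r|ε)*": star(r) == star(alt(r, EPS)),
    "* is idempotent": star(star(r)) == star(r),
}
for name, holds in laws.items():
    print(f"{name}: {holds}")
```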

Regular Definitions
• Names may be given to regular expressions; these names can then be used like symbols
• Let Σ be an alphabet of basic symbols. A regular definition is a sequence of definitions of the form
$d_1 \to r_1$
$d_2 \to r_2$
...
$d_n \to r_n$
where each $d_i$ is a distinct name and each $r_i$ is a regular expression over the symbols in $\Sigma \cup \{d_1, d_2, \ldots, d_{i-1}\}$

Regular Definitions
• Example 1:
– letter → A | B | … | Z | a | b | … | z
– digit → 0 | 1 | … | 9
– id → letter ( letter | digit )*
• Example 2:
– digit → 0 | 1 | 2 | … | 9
– digits → digit digit*
– optional_fraction → . digits | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent

Regular Definitions
• Shorthand
– One or more instances: r+ denotes rr*
– Zero or one Instance: r? denotes r|ε
– Character classes: [a-z] denotes a|b|…|z

Example
• digit → 0 | 1 | 2 | … | 9
• digits → digit+
• optional_fraction → ( . digits )?
• optional_exponent → ( E ( + | - )? digits )?
• num → digits optional_fraction optional_exponent
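A regular definition translates naturally into Python by building each pattern from the previously defined names. The sketch below mirrors the definitions of num and id from these slides; the variable names and the `matches` helper are assumptions, and the letter class is written as `[A-Za-z]`.

```python
import re

# Each name is defined only in terms of names defined before it.
digit = r"[0-9]"
digits = rf"(?:{digit})+"
optional_fraction = rf"(?:\.{digits})?"
optional_exponent = rf"(?:E[+-]?{digits})?"
num = digits + optional_fraction + optional_exponent

letter = r"[A-Za-z]"
identifier = rf"{letter}(?:{letter}|{digit})*"   # id → letter ( letter | digit )*


def matches(pattern, text):
    """True if the whole of `text` is in the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None


print(matches(num, "6.02E23"))      # True
print(matches(num, "3."))           # False: a fraction needs at least one digit
print(matches(identifier, "x1"))    # True
print(matches(identifier, "1x"))    # False: must start with a letter
```

The non-capturing groups `(?:…)` keep each substituted definition intact, so the postfix operators apply to the whole definition rather than to its last symbol.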

Limitations of Regular Expressions
• Some languages cannot be described by any regular expression
• Regular expressions cannot describe balanced or nested constructs
– Example: the set of all strings of balanced parentheses
– This can be described by a context-free grammar (CFG)
• Regular expressions cannot describe repeated strings
– Example: {wcw | w is a string of a's and b's}
– This language cannot be described by a CFG either
• Regular expressions can denote only a fixed number of repetitions or an unspecified number of repetitions of a given construct
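Both example languages are easy to recognize with an ordinary program, which shows what regular expressions lack: an unbounded counter in the first case and the ability to remember an arbitrarily long string in the second. The function names below are assumptions made for this sketch.

```python
def is_balanced(s):
    """True if s consists only of parentheses and they are balanced."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1          # unbounded counter: not available to a regular expression
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ')' with no matching '('
                return False
        else:
            return False        # only parentheses are allowed
    return depth == 0


def is_wcw(s):
    """True if s has the form wcw, where w is a string of a's and b's."""
    w, sep, rest = s.partition("c")
    return sep == "c" and w == rest and set(w) <= {"a", "b"}


print(is_balanced("(()())"))  # True
print(is_balanced("())("))    # False
print(is_wcw("abcab"))        # True
print(is_wcw("abcba"))        # False
```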
