2 Regular Expression

University of Sulaimani
College of Science
Department of Computer Science
Computation
Regular Expression
Lecture four
Mzhda Hiwa Hama

2023-2024
Regular Expression
• Regular expressions (shortened as "regex") are used in
many programming languages and tools. They can be
used to find and extract patterns in texts and
programs.
• Regular expressions are a way to search for substrings
("matches") in strings. This is done by searching through
the string with "patterns".
• Regular expressions are useful tools in the design of
compilers for programming languages. Elemental objects
in a programming language, called tokens, such as
variable names and constants, may be described with
regular expressions.
• Using regular expressions, we can also specify and
validate forms of data such as passwords, e-mail
addresses, user IDs, etc.
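As a minimal sketch of the two uses named above (extracting patterns and validating data), the following Python snippet uses the standard `re` module. The sample text and the user-ID rule (a letter followed by 2 to 11 letters, digits, or underscores) are assumptions made up for illustration; they do not come from the lecture.

```python
import re

# Extracting: pull every run of digits out of a piece of text.
text = "Order 4821 shipped, order 77 is pending."
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['4821', '77']

# Validating: a hypothetical user-ID rule (an assumption for this sketch):
# one letter, then 2 to 11 letters, digits, or underscores.
USER_ID = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,11}")

def is_valid_user_id(candidate):
    """Return True when the whole candidate string fits the rule."""
    return USER_ID.fullmatch(candidate) is not None

print(is_valid_user_id("alice_01"))  # True
print(is_valid_user_id("1alice"))    # False (starts with a digit)
```

`re.findall` searches for matches anywhere in the text, while `fullmatch` requires the entire string to fit the pattern, which is usually what validation needs.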
Regular Expression's metacharacters
[ ]    A bracket expression. Matches a single character that is
       contained within the brackets. For example, [abc] matches "a",
       "b", or "c". [a-z] specifies a range which matches any lowercase
       letter from "a" to "z". These forms can be mixed: [abcx-z]
       matches "a", "b", "c", "x", "y", or "z".
.      Matches any single character. Within bracket expressions,
       the dot character matches a literal dot. For example, a.c
       matches "abc", etc., but [a.c] matches only "a", ".", or "c".
[^ ]   Matches a single character that is not contained within the
       brackets. For example, [^abc] matches any character other than
       "a", "b", or "c". [^a-z] matches any single character that is not a
       lowercase letter from "a" to "z".
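The bracket, dot, and negated-bracket rules can be tried out with Python's `re` module, as in the sketch below. The helper name `matches` is hypothetical, introduced only for this example. Note that in Python the dot does not match a newline unless the `re.DOTALL` flag is given.

```python
import re

def matches(pattern, text):
    """True when the whole of text matches pattern (not just a part of it)."""
    return re.fullmatch(pattern, text) is not None

# Bracket expression with a range mixed in: a, b, c, x, y, or z.
print([ch for ch in "abcdwxyz" if matches(r"[abcx-z]", ch)])
# ['a', 'b', 'c', 'x', 'y', 'z']

# Outside brackets, the dot stands for any single character.
print([s for s in ["abc", "a.c", "a-c", "ac"] if matches(r"a.c", s)])
# ['abc', 'a.c', 'a-c']

# Inside brackets, the dot is a literal dot.
print([s for s in ["a", ".", "c", "b"] if matches(r"[a.c]", s)])
# ['a', '.', 'c']

# A leading ^ inside the brackets negates the set.
print([ch for ch in "abcd1" if matches(r"[^abc]", ch)])
# ['d', '1']
print([ch for ch in "azAZ9" if matches(r"[^a-z]", ch)])
# ['A', 'Z', '9']
```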
metacharacters
( )    Defines a marked subexpression. The string matched within the
       parentheses can be recalled later. A marked subexpression is also
       called a block or capturing group. For example, (abc).
*      Matches the preceding element zero or more times. For example,
       ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y",
       "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab",
       "ababab", and so on.
?      Matches the preceding element zero or one time. For example, ab?c
       matches only "ac" or "abc".
+      Matches the preceding element one or more times. For example,
       ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
|      The choice (also known as alternation or set union) operator
       matches either the expression before or the expression after the
       operator. For example, abc|def matches "abc" or "def".
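The quantifiers, alternation, and group recall described above can be checked against a few candidate strings. This is a small sketch in Python; the helper `full` and the candidate lists are assumptions chosen for illustration. The backreference `\1` is how Python's `re` recalls what the first capturing group matched.

```python
import re

def full(pattern, text):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# * : zero or more b's between a and c.
star = [s for s in ["ac", "abc", "abbbc", "adc"] if full(r"ab*c", s)]
# ? : zero or one b.
optional = [s for s in ["ac", "abc", "abbc"] if full(r"ab?c", s)]
# + : one or more b's, so "ac" is excluded.
plus = [s for s in ["ac", "abc", "abbc"] if full(r"ab+c", s)]
# | : either "abc" or "def", but not both glued together.
either = [s for s in ["abc", "def", "abcdef"] if full(r"abc|def", s)]

print(star)      # ['ac', 'abc', 'abbbc']
print(optional)  # ['ac', 'abc']
print(plus)      # ['abc', 'abbc']
print(either)    # ['abc', 'def']

# A capturing group can be recalled later: \1 repeats what group 1 matched.
recalled = full(r"(abc)\1", "abcabc")
print(recalled)  # True
```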
metacharacters
^      Matches the starting position within the string.
$      Matches the ending position of the string or the position just
       before a string-ending newline.
{n}    Matches the preceding element exactly n times. For example,
       a{3} matches only "aaa", exactly three a's.
{m,n}  Matches the preceding element at least m and not more than n
       times. For example, a{3,5} matches only "aaa", "aaaa", and
       "aaaaa".
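The anchors and counted repetitions can be illustrated in Python as follows. This is only a sketch: the sample strings are made up, and `re.search` is used for the anchor examples because it looks for a match anywhere in the string unless an anchor restricts it.

```python
import re

# Without anchors, search finds "abc" anywhere in the string.
anywhere = re.search(r"abc", "xxabcxx") is not None
# ^ requires the match to begin at the start of the string.
at_start = re.search(r"^abc", "xxabcxx") is not None
# $ requires the match to finish at the end of the string ...
at_end = re.search(r"abc$", "xxabc") is not None
# ... or just before a string-ending newline.
before_newline = re.search(r"abc$", "xxabc\n") is not None

print(anywhere, at_start, at_end, before_newline)  # True False True True

# {n}: exactly n repetitions.
exactly_three = [s for s in ["aa", "aaa", "aaaa"] if re.fullmatch(r"a{3}", s)]
print(exactly_three)  # ['aaa']

# {m,n}: between m and n repetitions; list the lengths that are accepted.
three_to_five = [n for n in range(1, 8) if re.fullmatch(r"a{3,5}", "a" * n)]
print(three_to_five)  # [3, 4, 5]
```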
Formal Definition of Regular Expression
Say that R is a regular expression if R is
1. a, for some a in the alphabet Σ; it represents the language {a}.
2. ε, which represents the language {ε}.
3. ∅, which represents the empty language { }.
Note: Don't confuse the regular expressions ε and ∅. The
expression ε represents the language containing a single string,
namely the empty string, whereas ∅ represents the
language that doesn't contain any strings.
Regular Expression Operation
1. Union (OR): if R1 and R2 are regular expressions, then
(R1 ∪ R2), also written as R1 | R2 or R1 + R2, is also a
regular expression. L(R1 | R2) = L(R1) ∪ L(R2).
2. Concatenation: if R1 and R2 are regular expressions, then
(R1 ◦ R2), also written as R1R2 or R1.R2, is also a
regular expression. L(R1R2) = L(R1) concatenated with
L(R2).
3. Kleene closure (star): if R1 is a regular expression, then
(R1*), the Kleene closure of R1, is also a
regular expression. L(R1*) = {ε} ∪ L(R1) ∪ L(R1R1) ∪
L(R1R1R1) ∪ …
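The three operations can be sketched on finite languages represented as Python sets of strings. The function names `union`, `concat`, and `star` are hypothetical, and because L(R1*) is usually infinite, `star` only builds the first few terms of the union above, up to a repeat limit that is an assumption of this sketch.

```python
def union(l1, l2):
    """L(R1 | R2): every string that is in either language."""
    return l1 | l2

def concat(l1, l2):
    """L(R1R2): every string xy with x from l1 and y from l2."""
    return {x + y for x in l1 for y in l2}

def star(language, max_repeats):
    """First terms of L(R1*): zero up to max_repeats concatenated copies."""
    result = {""}   # zero copies gives the empty string ε
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, language)  # one more copy of the language
        result |= layer
    return result

print(sorted(union({"a"}, {"b"})))             # ['a', 'b']
print(sorted(concat({"a", "b"}, {"c", "d"})))  # ['ac', 'ad', 'bc', 'bd']
print(sorted(star({"ab"}, 3)))                 # ['', 'ab', 'abab', 'ababab']
```

With these definitions, concatenating with the empty language gives the empty language, while the star of the empty language still contains ε.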
Regular Expression and
languages
• The origins of regular expressions lie in Automata
Theory and Formal Language Theory.
• We can use REs to identify regular languages.
• So the value of a regular expression is a language.
• A regular language is one accepted by some FA or
described by an RE.
Note
• In arithmetic, we can use the operations + and × to build up
expressions such as (5 + 3) × 4. Similarly, we can use the
regular operations to build up expressions describing
languages, which are called regular expressions. An example
is (0 ∪ 1)0*. The value of the arithmetic expression is the
number 32. The value of a regular expression is a language.
• In arithmetic, we say that × has precedence over + to mean
that when there is a choice, we do the × operation first. Thus
in 2 + 3 × 4, the 3 × 4 is done before the addition. To have the
addition done first, we must add parentheses to obtain
(2 + 3) × 4. In regular expressions, the star operation is done first,
followed by concatenation, and finally union, unless
parentheses change the usual order.
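The effect of precedence can be seen by listing the short strings each expression describes. The sketch below uses Python's `re` syntax, which follows the same order (star, then concatenation, then union written as `|`); the helper `language` and the length limit of 3 are assumptions made for illustration.

```python
import re
from itertools import product

def language(pattern, alphabet="01", max_len=3):
    """All strings over alphabet, up to max_len long, that fully match pattern."""
    found = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if re.fullmatch(pattern, word):
                found.append(word)
    return found

# With parentheses: (0 ∪ 1) first, then any number of 0s.
print(language(r"(0|1)0*"))  # ['0', '1', '00', '10', '000', '100']

# Without parentheses the union is applied last: 0 ∪ (1 followed by 0*).
print(language(r"0|10*"))    # ['0', '1', '10', '100']

# Star binds tighter than concatenation: ab* is a(b*), not (ab)*.
print(language(r"ab*", alphabet="ab"))    # ['a', 'ab', 'abb']
print(language(r"(ab)*", alphabet="ab"))  # ['', 'ab']
```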
Examples
• In the following instances, we assume that the alphabet
Σ is {0,1}.
1. 0*10* = {w | w contains a single 1}.
2. Σ*1Σ* = {w | w has at least one 1}. Here Σ* = (0+1)*.
3. Σ*001Σ* = {w | w contains the string 001 as a substring}.
4. (ΣΣ)* = {w | w is a string of even length}.
5. (ΣΣΣ)* = {w | the length of w is a multiple of 3}.
6. 01 ∪ 10 = {01, 10}.
7. (0 ∪ ε)(1 ∪ ε) = {ε, 0, 1, 01}.
8. 1*∅ = ∅. Concatenating the empty set to any set
yields the empty set.
9. (0 ∪ 1)* consists of all possible strings of 0s and 1s.
10. (0Σ*) ∪ (Σ*1) consists of all strings that start with
0 or end with 1.
11. The set of strings over {0,1} that end in 3 consecutive
1's: (0 | 1)*111
12. The set of strings over {0,1} that have at most one 1:
0* | 0*10*
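One way to gain confidence in such examples is to compare each expression with its set description on every short string. The sketch below does this in Python for strings up to length 8. The translation into Python syntax is an assumption of this sketch: Σ is written `[01]`, ∪ is written `|`, and ε is written as an empty alternative. Example 8 is left out because Python's `re` has no literal for ∅. A brute-force check like this is evidence, not a proof.

```python
import re
from itertools import product

SIGMA = "[01]"  # Σ written as a Python character class

# (pattern in Python syntax, predicate stating the set description)
CASES = [
    ("0*10*", lambda w: w.count("1") == 1),
    (f"{SIGMA}*1{SIGMA}*", lambda w: "1" in w),
    (f"{SIGMA}*001{SIGMA}*", lambda w: "001" in w),
    (f"({SIGMA}{SIGMA})*", lambda w: len(w) % 2 == 0),
    (f"({SIGMA}{SIGMA}{SIGMA})*", lambda w: len(w) % 3 == 0),
    ("01|10", lambda w: w in {"01", "10"}),
    ("(0|)(1|)", lambda w: w in {"", "0", "1", "01"}),
    ("(0|1)*", lambda w: True),
    (f"(0{SIGMA}*)|({SIGMA}*1)", lambda w: w.startswith("0") or w.endswith("1")),
    ("(0|1)*111", lambda w: w.endswith("111")),
    ("0*|0*10*", lambda w: w.count("1") <= 1),
]

def all_strings(max_len):
    """Yield every string over {0,1} of length 0 up to max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, max_len=8):
    """True when pattern and predicate accept exactly the same short strings."""
    return all(
        (re.fullmatch(pattern, w) is not None) == predicate(w)
        for w in all_strings(max_len)
    )

for pattern, predicate in CASES:
    print(pattern, agrees(pattern, predicate))
```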
Homework
• Write a regular expression for each of the
following languages:
1. {w | w starts with a 0 or a 1, followed by any
number of 0s}
2. {w | w contains the string 101 as a substring}
3. {w | w starts with the string 11 and ends with
10}
4. {w | w starts and ends with the same symbol}
5. {w | w contains at least three 1s}
Equivalence with Finite Automata
• Every regular language is FA-recognizable, i.e., any RE
can be converted into a finite automaton that
recognizes the language it describes, and vice versa.
Recall that a regular language is one that is
recognized by some finite automaton.
• Note: A language is regular if and only if some
regular expression describes it.
Example 1
• We convert the regular expression (ab ∪ a)* to an NFA in a
sequence of stages. We build up from the smallest
subexpressions to larger subexpressions until we have an
NFA for the original expression, as shown in the following
diagram.
Example 2
• (a ∪ b)*aba
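Since the diagrams are not reproduced in this text, here is a small sketch of what an NFA for (a ∪ b)*aba could look like when simulated in Python. The four-state machine below is a hand-built assumption, not the staged construction from the slides: q0 loops on both symbols and guesses when the final "aba" begins. The state names and the `accepts` function are hypothetical.

```python
# Transition relation: (state, symbol) -> set of possible next states.
DELTA = {
    ("q0", "a"): {"q0", "q1"},  # stay in the loop, or guess that "aba" starts here
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q3"},
}
START = "q0"
ACCEPTING = {"q3"}

def accepts(word):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {START}
    for symbol in word:
        current = {
            nxt
            for state in current
            for nxt in DELTA.get((state, symbol), set())
        }
    # Accept when at least one possible state is accepting.
    return bool(current & ACCEPTING)

print(accepts("aba"))    # True
print(accepts("bbaba"))  # True
print(accepts("abab"))   # False
```

Tracking a set of states instead of a single state is what lets a deterministic program follow all of the NFA's nondeterministic choices at once.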
Look Ahead and Look Behind
Lookahead and lookbehind are collectively called "lookaround".
You can put assertions such as lookahead or lookbehind in your
pattern to ensure that a substring does or does not occur at a
given position. These "lookaround" assertions are written as a
parenthesized group that contains the substring to check for and
whose leading characters are:
• ?= (positive lookahead),
• ?! (negative lookahead),
• ?<= (positive lookbehind),
• ?<! (negative lookbehind).
Look Ahead and Look Behind…
cont'd
• Use ?! (negative lookahead) when a specific substring must
not appear at the current position, for example at
the beginning of the string.
• Ex: ^(?!101)[01]* // Doesn't have 101 at the beginning
of the string.
cont'd
• Use ?= (positive lookahead) when a specific substring
must appear at the current position, for example
at the beginning of the string.
Ex: ^(?=101)[01]* // Must have 101 at the
beginning of the string.
cont'd
• Use ?<! (negative lookbehind) when a specific substring must
not appear just before the current position, for example at
the end of the string.
Ex: ^[01]*(?<!101)$ // Doesn't end with 101.
• Use ?<= (positive lookbehind) when a specific substring
must appear just before the current position, for example at
the end of the string.
Ex: ^[01]*(?<=101)$ // Must end with 101.
• Note: in these patterns, always specify the end position with $ so
that the lookbehind is checked at the end of the string.
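The four example patterns can be tried in Python, whose `re` module supports all four lookaround forms (lookbehind requires a fixed-width substring, which "101" is). The dictionary keys and the `check` helper are hypothetical names for this sketch, and `fullmatch` is an added assumption that makes sure the whole string consists of 0s and 1s.

```python
import re

# The four patterns from the examples above, keyed by what they enforce.
PATTERNS = {
    "not_starting_with_101": r"^(?!101)[01]*",
    "starting_with_101": r"^(?=101)[01]*",
    "not_ending_with_101": r"^[01]*(?<!101)$",
    "ending_with_101": r"^[01]*(?<=101)$",
}

def check(name, text):
    """True when the whole text satisfies the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None

print(check("not_starting_with_101", "1001"))  # True
print(check("not_starting_with_101", "1010"))  # False
print(check("starting_with_101", "1010"))      # True
print(check("not_ending_with_101", "0110"))    # True
print(check("ending_with_101", "0101"))        # True
```

Lookaround assertions only test what is next to the current position and consume no characters, which is why `[01]*` is still needed to match the string itself.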
