2 Regular Expression


University of Sulaimani

College of Science
Department of Computer Science

Computation
Regular Expression
Lecture four

Mzhda Hiwa Hama


2023-2024
Regular Expression

• Regular expressions (shortened as "regex") are used in
many programming languages and tools. They can be
used to find and extract patterns in texts and programs.

• Regular expressions are a way to search for substrings
("matches") in strings. This is done by searching through
the string with "patterns".

• Regular expressions are useful tools in the design of
compilers for programming languages. Elemental objects
in a programming language, called tokens, such as
variable names and constants, may be described with
regular expressions.

• Using regular expressions, we can also specify and
validate forms of data such as passwords, e-mail
addresses, user IDs, etc.
Regular Expression’s
metacharacters

[ ]   A bracket expression. Matches a single character that is
      contained within the brackets. For example, [abc] matches "a",
      "b", or "c". [a-z] specifies a range which matches any lowercase
      letter from "a" to "z". These forms can be mixed: [abcx-z]
      matches "a", "b", "c", "x", "y", or "z".

.     Matches any single character. Within bracket expressions,
      the dot character matches a literal dot. For example, a.c
      matches "abc", etc., but [a.c] matches only "a", ".", or "c".

[^ ]  Matches a single character that is not contained within the
      brackets. For example, [^abc] matches any character other than
      "a", "b", or "c". [^a-z] matches any single character that is not a
      lowercase letter from "a" to "z".
Regular Expression’s
metacharacters

( )   Defines a marked subexpression. The string matched within the
      parentheses can be recalled later. A marked subexpression is also
      called a block or capturing group. For example, (abc).

*     Matches the preceding element zero or more times. For example,
      ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y",
      "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab",
      "ababab", and so on.

?     Matches the preceding element zero or one time. For example, ab?c
      matches only "ac" or "abc".

+     Matches the preceding element one or more times. For example,
      ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".

|     The choice (also known as alternation or set union) operator
      matches either the expression before or the expression after the
      operator. For example, abc|def matches "abc" or "def".
Regular Expression’s
metacharacters

^      Matches the starting position within the string.

$      Matches the ending position of the string or the position just
       before a string-ending newline.

{n}    Matches the preceding element exactly n times. For example,
       a{3} matches only "aaa", that is, exactly three a's.

{m,n}  Matches the preceding element at least m and not more than n
       times. For example, a{3,5} matches only "aaa", "aaaa", and
       "aaaaa".
Formal Definition of Regular
Expression
Say that R is a regular expression if R is

1. a, for some a in the alphabet Σ; it represents the language {a}.
2. ε, which represents the language {ε}.
3. ∅, which represents the empty language { }.

Note: Don't confuse the regular expressions ε and ∅. The
expression ε represents the language containing a single string,
namely the empty string, whereas ∅ represents the
language that doesn't contain any strings.
Regular Expression Operation

1. Union (OR): if R1 and R2 are regular expressions, then
(R1 ∪ R2), also written as R1 | R2 or R1 + R2, is also a
regular expression. L(R1 | R2) = L(R1) ∪ L(R2).

2. Concatenation: if R1 and R2 are regular expressions, then
(R1 ◦ R2), also written as R1R2 or R1.R2, is also a
regular expression. L(R1R2) = L(R1) concatenated with
L(R2).

3. Kleene closure (star): if R1 is a regular expression, then
(R1*), the Kleene closure of R1, is also a
regular expression. L(R1*) = {ε} ∪ L(R1) ∪ L(R1R1) ∪
L(R1R1R1) ∪ …
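The three operations can be illustrated on small finite languages. This is only a sketch: languages are modelled as Python sets of strings, ε is the empty string "", and because L(R1*) is usually infinite the star helper is cut off at an assumed max_len parameter. The function names union, concat, and star are hypothetical.

```python
def union(l1, l2):
    """L(R1 | R2): every string that is in either language."""
    return l1 | l2


def concat(l1, l2):
    """L(R1R2): every string xy with x from l1 and y from l2."""
    return {x + y for x in l1 for y in l2}


def star(lang, max_len):
    """L(R1*), truncated to strings of length at most max_len."""
    result = {""}          # epsilon is always in the closure
    frontier = {""}
    while frontier:
        # Append one more string from lang to everything found last round.
        frontier = {
            x + y
            for x in frontier
            for y in lang
            if len(x + y) <= max_len
        } - result
        result |= frontier
    return result


if __name__ == "__main__":
    a, b = {"a"}, {"b"}
    print(sorted(union(a, b)))               # ['a', 'b']
    print(sorted(concat(union(a, b), a)))    # ['aa', 'ba']
    print(sorted(star({"ab"}, 6)))           # ['', 'ab', 'abab', 'ababab']
    print(star(set(), 3))                    # {''}
    print(concat(star({"1"}, 3), set()))     # set()
```

The last two lines show the difference between ε and ∅: the star of the empty language still contains the empty string, while concatenating anything with the empty language gives the empty language.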
Regular Expression and
languages
• The origins of regular expressions lie in automata
theory and formal language theory.

• We can use regular expressions (REs) to describe regular languages.

• So, the value of a regular expression is a language.

• A regular language is one accepted by some finite automaton (FA) or
described by an RE.
Note

• In arithmetic, we can use the operations + and × to build up
expressions such as (5 + 3) × 4. Similarly, we can use the
regular operations to build up expressions describing
languages, which are called regular expressions. An example
is: (0 ∪ 1)0*. The value of the arithmetic expression is the
number 32. The value of a regular expression is a language.

• In arithmetic, we say that × has precedence over + to mean
that when there is a choice, we do the × operation first. Thus
in 2 + 3 × 4, the 3 × 4 is done before the addition. To have the
addition done first, we must add parentheses to obtain
(2 + 3) × 4. In regular expressions, the star operation is done first,
followed by concatenation, and finally union, unless
parentheses change the usual order.
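The precedence rule can be checked with a short experiment. The sketch below assumes Python's re module, where union is written | and concatenation is plain juxtaposition; the helper name matches and the test strings are illustrative.

```python
import re


def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# Star is applied before concatenation: ab* means a(b*), not (ab)*.
STAR_VS_CONCAT = {
    ("ab*", "abbb"): matches(r"ab*", "abbb"),      # True
    ("ab*", "abab"): matches(r"ab*", "abab"),      # False
    ("(ab)*", "abab"): matches(r"(ab)*", "abab"),  # True
}

# Concatenation is applied before union: 0|10* means 0 | (1(0*)).
# Parentheses are needed to get (0 ∪ 1)0*.
CONCAT_VS_UNION = {
    ("0|10*", "00"): matches(r"0|10*", "00"),        # False
    ("0|10*", "100"): matches(r"0|10*", "100"),      # True
    ("(0|1)0*", "00"): matches(r"(0|1)0*", "00"),    # True
    ("(0|1)0*", "100"): matches(r"(0|1)0*", "100"),  # True
}

if __name__ == "__main__":
    for table in (STAR_VS_CONCAT, CONCAT_VS_UNION):
        for (pattern, text), result in table.items():
            print(f"{pattern!r} on {text!r}: {result}")
```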
Examples

• In the following instances, we assume that the alphabet
Σ is {0,1}.

1. 0*10* = {w | w contains a single 1}.
2. Σ*1Σ* = {w | w has at least one 1}, where Σ* = (0 + 1)*.
3. Σ*001Σ* = {w | w contains the string 001 as a substring}.
4. (ΣΣ)* = {w | w is a string of even length}.
5. (ΣΣΣ)* = {w | the length of w is a multiple of 3}.
6. 01 ∪ 10 = {01, 10}.
7. (0 ∪ ε)(1 ∪ ε) = {ε, 0, 1, 01}.
8. 1*∅ = ∅. Concatenating the empty set to any set
yields the empty set.
9. (0 ∪ 1)* consists of all possible strings of 0s and 1s.
10. (0Σ*) ∪ (Σ*1) consists of all strings that start with
0 or end with 1.
11. The set of strings over {0,1} that end in 3 consecutive
1's: (0 | 1)*111
12. The set of strings over {0,1} that have at most one 1:
0* | 0*10*
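One way to gain confidence in examples like these is to compare each expression against its set description on every short string. The sketch below is an assumption-laden check, not a proof: it uses Python's re module, writes Σ as [01], ∪ as |, and ε as an empty alternative, and only tests strings up to an arbitrary length bound. Example 8 is left out because Python's syntax has no direct symbol for ∅. The names all_strings, EXAMPLES, and disagreements are hypothetical.

```python
import re
from itertools import product


def all_strings(max_len, alphabet="01"):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


# name -> (Python pattern, predicate stating the set description)
EXAMPLES = {
    "1: a single 1": (r"0*10*", lambda w: w.count("1") == 1),
    "2: at least one 1": (r"[01]*1[01]*", lambda w: "1" in w),
    "3: contains 001": (r"[01]*001[01]*", lambda w: "001" in w),
    "4: even length": (r"([01][01])*", lambda w: len(w) % 2 == 0),
    "5: length multiple of 3": (r"([01][01][01])*", lambda w: len(w) % 3 == 0),
    "6: 01 or 10": (r"01|10", lambda w: w in {"01", "10"}),
    "7: (0 or eps)(1 or eps)": (r"(0|)(1|)", lambda w: w in {"", "0", "1", "01"}),
    "9: all strings": (r"(0|1)*", lambda w: True),
    "10: starts with 0 or ends with 1": (
        r"(0[01]*)|([01]*1)",
        lambda w: w.startswith("0") or w.endswith("1"),
    ),
    "11: ends in 111": (r"(0|1)*111", lambda w: w.endswith("111")),
    "12: at most one 1": (r"0*|0*10*", lambda w: w.count("1") <= 1),
}


def disagreements(max_len=6):
    """Return (name, string) pairs where pattern and description differ."""
    bad = []
    for name, (pattern, predicate) in EXAMPLES.items():
        for w in all_strings(max_len):
            in_language = re.fullmatch(pattern, w) is not None
            if in_language != predicate(w):
                bad.append((name, w))
    return bad


if __name__ == "__main__":
    print(disagreements())
```

An empty result means the expression and its description agree on all strings up to the chosen length; the same harness can be pointed at a candidate expression to look for counterexamples.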
Homework

• Write a regular expression for each of the
following languages:

1. {w | w starts with a 0 or a 1, followed by any
number of 0s}
2. {w | w contains the string 101 as a substring}
3. {w | w starts with the string 11 and ends with
10}
4. {w | w starts and ends with the same symbol}
5. {w | w contains at least three 1s}
Equivalence with Finite
Automata
• Every regular language is recognizable by a finite automaton (FA),
i.e., any RE can be converted into a finite automaton that
recognizes the language it describes, and vice versa.
Recall that a regular language is one that is
recognized by some finite automaton.

• Note: A language is regular if and only if some
regular expression describes it.
Example 1

• We convert the regular expression (ab ∪ a)* to an NFA in a
sequence of stages. We build up from the smallest
subexpressions to larger subexpressions until we have an
NFA for the original expression, as shown in the following
diagram.
Example 2

• (a ∪ b)*aba
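As a rough illustration of the equivalence, the sketch below simulates one possible NFA for (a ∪ b)*aba and compares it with the regular expression on all short strings. The four-state machine is a hand-built assumption (no ε-transitions), not the staged construction from the slides' diagram, and the state names q0 to q3 are hypothetical.

```python
import re
from itertools import product

# Assumed NFA: q0 loops on a and b, then guesses where the final "aba" starts.
TRANSITIONS = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q3"},
}
START = "q0"
ACCEPT = {"q3"}


def nfa_accepts(w):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {START}
    for symbol in w:
        next_states = set()
        for state in current:
            next_states |= TRANSITIONS.get((state, symbol), set())
        current = next_states
    return bool(current & ACCEPT)


def regex_accepts(w):
    """Membership test using the regular expression itself."""
    return re.fullmatch(r"(a|b)*aba", w) is not None


def agree_up_to(max_len):
    """True if NFA and regex agree on every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if nfa_accepts(w) != regex_accepts(w):
                return False
    return True


if __name__ == "__main__":
    print(nfa_accepts("aba"), nfa_accepts("ababa"), nfa_accepts("abab"))
    print(agree_up_to(8))
```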
Look Ahead and Look Behind
collectively called "lookaround"

You can put assertions such as lookahead or lookbehind in
your pattern to ensure that a substring does or does not occur.
These "lookaround" assertions are written by placing
the substring to check for inside parentheses, preceded by
one of the following leading characters:

• ?= (positive lookahead),
• ?! (negative lookahead),
• ?<= (positive lookbehind),
• ?<! (negative lookbehind).
Look Ahead and Look Behind…
cont’d
• Use ?! (negative lookahead) if the requirement is that a
specific substring must not appear at the beginning of
the string.

• Ex: ^(?!101)[01]* // Does not have 101 at the beginning
of the string.
Look Ahead and Look Behind…
cont’d

• Use ?= (positive lookahead) if the requirement is that a
specific substring must appear at the beginning of the
string.

• Ex: ^(?=101)[01]* // String must have 101 at the
beginning of the string.
Look Ahead and Look Behind…
cont’d
• Use ?<! (negative lookbehind) if the requirement is that a
specific substring must not appear at the end of
the string.

• Ex: ^[01]*(?<!101)$ // Does not end with 101

• Use ?<= (positive lookbehind) if the requirement is that a
specific substring must appear at the end of the
string.

• Ex: ^[01]*(?<=101)$ // Must end with 101

• Note: when a lookbehind is used to test the end of the string,
always specify the end position with $, so that the assertion is
checked at the end.
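The four example patterns can be run side by side. This is a minimal sketch assuming Python's re module, which supports all four lookaround forms (lookbehind there must be fixed-width, which 101 is); the dictionary labels and the function name classify are illustrative.

```python
import re

# The four patterns from the examples, keyed by what they assert.
PATTERNS = {
    "does not start with 101": r"^(?!101)[01]*",
    "starts with 101": r"^(?=101)[01]*",
    "does not end with 101": r"^[01]*(?<!101)$",
    "ends with 101": r"^[01]*(?<=101)$",
}


def classify(w):
    """Return the labels of all patterns that match the whole string w."""
    return {
        label
        for label, pattern in PATTERNS.items()
        if re.fullmatch(pattern, w) is not None
    }


if __name__ == "__main__":
    for w in ["1010", "0101", "101", "", "102"]:
        print(repr(w), sorted(classify(w)))
```

A lookaround checks a condition at a position without consuming characters, which is why [01]* is still needed to match the string itself. A string such as "102" matches none of the patterns, because [01]* cannot consume the 2.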
