
Re Expression 19 and 20

The document provides an overview of Python's regular expressions (RegEx) using the 're' module, detailing its functions such as findall, search, split, and sub. It explains how to use these functions with examples, including pattern matching and the use of metacharacters and special sequences. Additionally, it covers the Match Object, which contains information about the search results.

Uploaded by

rdeykabiraj

Session 19,20,21 ,22,23,25

Python RegEx
A Regular Expression (RegEx) is a sequence of characters that defines a search
pattern, which is used to find a string or a set of strings.
It can detect the presence or absence of text by matching it against a particular pattern,
and it can also split a pattern into one or more sub-patterns.
Regex Module in Python
Python has a built-in module named “re” that is used for regular expressions in Python. We
can import this module by using the import statement.
Example: Importing re module in Python
# importing re module
import re
# Check if the string starts with "The" and ends with "India":
txt = "The rain in India"
x = re.search("^The.*India$", txt)
if x:
    print("YES! We have a match!")
else:
    print("No match")
RegEx Functions
The re module offers a set of functions that allows us to search a
string for a match:
Function   Description
findall    Returns a list containing all matches
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or many matches with a string

The findall() Function


import re
#Return a list containing every occurrence of "ai":
txt = "The rain in India"
x = re.findall("ai", txt)
print(x)
output→ ['ai']
Microsoft Certified Power BI Data Analyst
Arindam Ghosh,9433547743
SalesForce Certified Administrator
txt = "The rain in India"
x = re.findall("in", txt)
print(x)
output→ ['in', 'in']
import re
#Return a list containing every occurrence of "dxi" (there is none, so the list is empty):
txt = "The rain in India"
x = re.findall("dxi", txt)
print(x)
output→ []
import re
# Create a pattern object that matches any single character in
# the range of 'a' to 'e', inclusive (lower case only).
p = re.compile('[a-e]')
print(p.findall("Aye, said Mr. R D Sharma"))
output→ ['e', 'a', 'd', 'a', 'a']

import re
p = re.compile(r'\w')
print(p.findall("G@@d Morn!n_ ."))
Output→ ['G', 'd', 'M', 'o', 'r', 'n', 'n', '_']

We are creating a pattern object that matches any single word
character. In regular expression terms, a word character is
defined as any alphanumeric character (letters and digits) plus
the underscore (_).

import re
# Compile the pattern
p = re.compile(r'\w+')
# Test the pattern on a sample string
test_string = "Hello, World! 123 _test_"
matches = p.findall(test_string)

print(matches) # Output: ['Hello', 'World', '123', '_test_']

We are creating a pattern object that matches one or more word
characters consecutively. In regular expression terms, a word
character is defined as any alphanumeric character (letters and
digits) plus the underscore (_). The + quantifier means "one or
more" of the preceding element.
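When the pattern contains capturing groups, findall() returns the groups instead of the whole match. The following is a minimal sketch of that behaviour; the sample string `inventory` is a made-up example and is not part of the session notes.

```python
import re

# Hypothetical sample text (not from the session notes)
inventory = "apple=3, banana=12, cherry=7"

# No groups: every list item is the whole match
whole = re.findall(r"\w+=\d+", inventory)

# Two groups: every list item is a (name, count) tuple
pairs = re.findall(r"(\w+)=(\d+)", inventory)

print(whole)  # ['apple=3', 'banana=12', 'cherry=7']
print(pairs)  # [('apple', '3'), ('banana', '12'), ('cherry', '7')]
```

Note that the counts come back as strings, so they still need int() if we want to do arithmetic with them.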

The search() Function


The search() function searches the string for a match, and returns
a Match object if there is a match. If there is more than one match, only
the first occurrence of the match will be returned. (The \s pattern
matches any whitespace character, including spaces, tabs, and newline
characters.)
import re
# Sample text
txt = "Hello World!"
# Search for the first occurrence of a whitespace character
x = re.search(r"\s", txt)

if x:
    print("Whitespace character found at position:", x.start())
else:
    print("No whitespace character found.")

Output→
Whitespace character found at position: 5
import re
txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
Output→None
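The two points above (only the first occurrence is reported, and None is returned when nothing matches) can be checked side by side. This is a small sketch that reuses the sample sentence; the variable names are only illustrative.

```python
import re

txt = "The rain in Spain"

# search() reports only the first "ai", although the text contains two
first = re.search("ai", txt)
every = re.findall("ai", txt)
print(first.span())  # (5, 7)
print(every)         # ['ai', 'ai']

# Calling .group() or .start() on None raises AttributeError,
# so test the result before using it
missing = re.search("Portugal", txt)
if missing is None:
    result = "no match"
else:
    result = missing.group()
print(result)        # no match
```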

The split() Function
The split() function returns a list where the string has been split
at each match:
import re
#Split the string at every white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
output→ ['The', 'rain', 'in', 'Spain']
import re
#Split the string at the first white-space character only:
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
Output→ ['The', 'rain in Spain']
from re import split
# Split on one or more non-word characters
print(split(r'\W+', 'WORDS, words , Words'))
print(split(r'\W+', 'W@rds, wor!s , Wo&ds'))
# Split on one or more word characters
print(split(r'\w+', "W@r$s"))
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
# Split the string based on one or more digit characters
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))
Output
['WORDS', 'words', 'Words']
['W', 'rds', 'wor', 's', 'Wo', 'ds']
['', '@', '$', '']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']
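In the examples above the text that matches the separator is thrown away. If the separator pattern is wrapped in a capturing group, split() keeps the matched text in the result list. A short sketch of the difference, reusing the date sentence:

```python
import re

txt = "On 12th Jan 2016, at 11:02 AM"

# Without a group the digits are dropped from the result
without_group = re.split(r"\d+", txt)

# With a capturing group the matched digits are kept as separate items
with_group = re.split(r"(\d+)", txt)

print(without_group)  # ['On ', 'th Jan ', ', at ', ':', ' AM']
print(with_group)     # ['On ', '12', 'th Jan ', '2016', ', at ', '11', ':', '02', ' AM']
```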

The sub() Function
The sub() function replaces the matches with the text of your
choice:
import re
#Replace all white-space characters with the character "*":
txt = "The rain in Spain"
x = re.sub(r"\s", "*", txt)
print(x)
output→ The*rain*in*Spain
import re
#Replace only the first two white-space characters with the character "X":
txt = "The rain in Spain"
x = re.sub(r"\s", "X", txt, count=2)
print(x)
Output→ TheXrainXin Spain
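The replacement passed to sub() does not have to be a fixed string. It can refer back to captured groups (\1, \2, ...), or it can be a function that receives each Match object and returns the replacement text. The sketch below shows both forms; the helper name `double` and the sample strings are made up for illustration.

```python
import re

# Backreferences: \1, \2 and \3 stand for the three captured groups
date = "03/04/2025"
reordered = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", date)
print(reordered)  # 2025-03-04

# A function replacement: called once per match, returns the new text
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)    # 6 apples and 20 pears
```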

Match Object
A Match Object is an object containing information about the
search and the result.
Note: If there is no match, the value None will be returned,
instead of the Match Object.
import re
#The search() function returns a Match object:
txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)
Output→ <re.Match object; span=(5, 7), match='ai'>
The Match object has properties and methods used to retrieve
information about the search, and the result:
.span() returns a tuple containing the start and end positions of the
match.
.string returns the string passed into the function
.group() returns the part of the string where there was a match

import re
# Search for an upper case "S" character at the beginning of a word,
# and print its position:

# Sample text
txt = "The rain in Spain stays mainly in the plain."

# Search for the first word that starts with 'S'
x = re.search(r"\bS\w+", txt)

if x:
    print("Found word:", x.group())
    print("Starting position:", x.start())
    print("Span:", x.span())
    print("Original string:", x.string)
else:
    print("No matching word found.")

Output→
Found word: Spain
Starting position: 12
Span: (12, 17)
Original string: The rain in Spain stays mainly in the plain.

import re
# Group 1 captures a word, group 2 captures the number that follows it
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "I was born on June 24")
if match is not None:
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
else:
    print("The regex pattern does not match.")
output
Match at index 14, 21
Full match: June 24
Month: June
Day: 24

import re

# Sample text with a date
txt = "Today is 03/04/2025."

# Search for a pattern with groups for month, day and year
match = re.search(r"(\d{2})/(\d{2})/(\d{4})", txt)

if match:
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))  # Entire match
    print("Month: %s" % (match.group(1)))       # First group (month)
    print("Day: %s" % (match.group(2)))         # Second group (day)
    print("Year: %s" % (match.group(3)))        # Third group (year)
else:
    print("No match found.")
Output
Match at index 9, 19
Full match: 03/04/2025
Month: 03
Day: 04
Year: 2025
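Numbered groups are easy to mix up once a pattern has more than two of them. As an alternative, the re module lets us name a group with the (?P<name>...) syntax and read it back by name. This is a sketch of the same date example with named groups; the group names month, day and year are our own choice.

```python
import re

txt = "Today is 03/04/2025."

# Same pattern as before, but every group has a name
pattern = r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"
match = re.search(pattern, txt)

if match:
    year = match.group("year")   # read one group by name
    parts = match.groupdict()    # all named groups as a dictionary
    print(year)                  # 2025
    print(parts)                 # {'month': '03', 'day': '04', 'year': '2025'}
```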

re.escape()
Returns the string with every character that can have a special meaning
in a regular expression backslashed (before Python 3.7, all
non-alphanumeric characters were backslashed). This is useful if you
want to match an arbitrary literal string that may have regular
expression metacharacters in it: the escaped string is safe to use as a
pattern, because those characters are treated as literal characters.
import re
print(re.escape("This is Awesome even 1 AM"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))
Output (Python 3.7 or later; <tab> stands for the literal tab character produced by \t)
This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \<tab>\ \^WoW
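To see why escaping matters, we can search for a literal string that happens to contain a metacharacter. In this sketch the sample sentence and the variable names are made up; without re.escape() the "+" is read as a quantifier and the wrong text is found.

```python
import re

text = "We know 1+1=2, but 11=2 is wrong"
literal = "1+1=2"

# Unescaped: "1+" means "one or more 1", so the pattern matches "11=2"
unescaped = re.findall(literal, text)

# Escaped: the "+" is treated as an ordinary character
escaped = re.findall(re.escape(literal), text)

print(unescaped)  # ['11=2']
print(escaped)    # ['1+1=2']
```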

Meta-characters
Metacharacters are characters with a special meaning in a pattern.
Understanding them is essential for working with regular expressions,
because they are used in the functions of the re module. Below is the
list of metacharacters.
Metacharacter  Description
\              Used to drop the special meaning of the character following it
[]             Represents a character class
^              Matches the beginning
$              Matches the end
.              Matches any character except newline
|              Means OR (matches any of the alternatives separated by it)
?              Matches zero or one occurrence
*              Any number of occurrences (including 0 occurrences)
+              One or more occurrences
{}             Indicates the number of occurrences of the preceding regex to match
()             Encloses a group of regex
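A few of the metacharacters from the list can be tried out directly with findall() and search(). The strings below are made-up samples chosen so that each line shows one metacharacter at work.

```python
import re

optional = re.findall(r"colou?r", "color colour colr")   # ? : zero or one "u"
either = re.findall(r"cat|dog", "cat dog bird")          # | : OR
any_char = re.findall(r"a.c", "abc a-c ac")              # . : any one character
three_digits = re.findall(r"\d{3}", "12 345 6789")       # {} : exactly three digits
literal_dot = re.findall(r"3\.14", "3.14 3x14")          # \ : drop the special meaning of "."
repeated = re.search(r"(ha)+!", "say hahaha!").group()   # () and + : repeat a whole group
starts = re.search(r"^The", "The end") is not None       # ^ : beginning of the string
ends = re.search(r"end$", "The end") is not None         # $ : end of the string

print(optional)      # ['color', 'colour']
print(either)        # ['cat', 'dog']
print(any_char)      # ['abc', 'a-c']
print(three_digits)  # ['345', '678']
print(literal_dot)   # ['3.14']
print(repeated)      # hahaha!
print(starts, ends)  # True True
```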

Special Sequences
A special sequence is a backslash followed by a letter. Some special
sequences (\A, \b, \B, \Z) do not match an actual character in the
string; instead they specify the location in the search string where
the match must occur. The others (\d, \D, \s, \S, \w, \W) match a whole
class of characters. Special sequences make it easier to write commonly
used patterns.
List of special sequences
Special   Description                                   Examples
Sequence
\A        Matches if the string begins with the         \Afor    for seeks
          given characters.                                      for the world
\b        Matches if the word begins or ends with       \bse     seeks
          the given characters. \b(string) will                  set
          check for the beginning of the word and
          (string)\b will check for the ending of
          the word.
\B        It is the opposite of \b, i.e. the            \Bge     together
          string should not start or end with the                forge
          given regex.
\d        Matches any decimal digit; this is            \d       123
          equivalent to the set class [0-9].                     see1
\D        Matches any non-digit character; this is      \D       seeks
          equivalent to the set class [^0-9].                    seek1
\s        Matches any whitespace character.             \s       see ks
                                                                 a bc a
\S        Matches any non-whitespace character.         \S       a bd
                                                                 abcd
\w        Matches any alphanumeric character or         \w       123
          the underscore; this is equivalent to the              seeKs4
          class [a-zA-Z0-9_].
\W        Matches any character that is not             \W       >$
          alphanumeric or the underscore.                        see<>
\Z        Matches if the string ends with the           ab\Z     abcdab
          given regex.                                           abababab

\A: Matches if the string begins with the given character


import re
txt = "for seeks"
match = re.search(r"\Afor", txt)
print(match)
# Output: <re.Match object; span=(0, 3), match='for'>
txt = "the world for the people"
match = re.search(r"\Afor", txt)
print(match)
# Output: None
\b: Matches if the word begins or ends with the given character
import re
txt = "seeks"
match = re.search(r"\bse", txt)
print(match)
# Output: <re.Match object; span=(0, 2), match='se'>

txt = "set"
# \b after the pattern checks that the word ends with "et"
match = re.search(r"et\b", txt)
print(match)
# Output: <re.Match object; span=(1, 3), match='et'>
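Putting \b on both sides of a pattern is a common way to match a whole word only. The sketch below reuses the sentence from the findall() examples and compares a plain search for "in" with a whole-word search.

```python
import re

txt = "The rain in India"

# Plain pattern: also finds the "in" inside "rain"
anywhere = re.findall(r"in", txt)

# \b on both sides: only the standalone word "in" qualifies
whole_word = re.findall(r"\bin\b", txt)
position = re.search(r"\bin\b", txt).span()

print(anywhere)    # ['in', 'in']
print(whole_word)  # ['in']
print(position)    # (9, 11)
```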

\B: It is the opposite of \b
import re

txt = "together"
match = re.search(r"\Bge", txt)
print(match) # Output: <re.Match object; span=(2, 4), match='ge'>

txt = "forge"
match = re.search(r"ge\B", txt)
print(match) # Output: None

\d: Matches any decimal digit


import re

txt = "123"
match = re.search(r"\d", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='1'>

txt = "see1"
match = re.search(r"\d", txt)
print(match) # Output: <re.Match object; span=(3, 4), match='1'>

\D: Matches any non-digit character


import re

txt = "seeks"
match = re.search(r"\D", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='s'>

txt = "seek1"
match = re.search(r"\D", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='s'>

\s: Matches any whitespace character
import re

txt = "see ks"


match = re.search(r"\s", txt)
print(match) # Output: <re.Match object; span=(3, 4), match=' '>

txt = "a bc a"


match = re.search(r"\s", txt)
print(match) # Output: <re.Match object; span=(1, 2), match=' '>

\S: Matches any non-whitespace character


import re

txt = "a bd"


match = re.search(r"\S", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='a'>

txt = "abcd"
match = re.search(r"\S", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='a'>

\w: Matches any alphanumeric character


import re

txt = "123"
match = re.search(r"\w", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='1'>

txt = "seeKs4"
match = re.search(r"\w", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='s'>

\W: Matches any non-alphanumeric character


import re
txt = ">$"
match = re.search(r"\W", txt)
print(match) # Output: <re.Match object; span=(0, 1), match='>'>

txt = "see<>"
match = re.search(r"\W", txt)
print(match) # Output: <re.Match object; span=(3, 4), match='<'>

\Z: Matches if the string ends with the given regex


import re

txt = "abcdab"
match = re.search(r"ab\Z", txt)
print(match) # Output: <re.Match object; span=(4, 6), match='ab'>

txt = "abababab"
match = re.search(r"ab\Z", txt)
print(match) # Output: <re.Match object; span=(6, 8), match='ab'>

Sets for character matching
A Set is a set of characters enclosed in ‘[]’ brackets. Sets are used to
match a single character in the set of characters specified between
brackets. The list below covers Sets together with the quantifiers and
special sequences that are commonly combined with them:
Pattern      Description
{n,}         Quantifies the preceding character or group and matches at least n occurrences.
*            Quantifies the preceding character or group and matches zero or more occurrences.
[0123]       Matches any one of the specified digits (0, 1, 2, or 3).
[^arn]       Matches any character EXCEPT a, r, and n.
\d           Matches any digit (0-9).
[0-5][0-9]   Matches any two-digit number from 00 to 59.
\w           Matches any alphanumeric character (a-z, A-Z, 0-9, or _).
[a-n]        Matches any lower case letter between a and n.
\D           Matches any non-digit character.
[arn]        Matches where one of the specified characters (a, r, or n) is present.
[a-zA-Z]     Matches any letter between a and z, lower case OR upper case.
[0-9]        Matches any digit between 0 and 9.

{n,}: Matches at least n occurrences


import re

txt = "Hellooo"
pattern = r'o{2,}'
matches = re.findall(pattern, txt)
print(matches) # Output: ['ooo']
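The curly-brace quantifier has two more forms besides {n,}: {n} for exactly n occurrences and {n,m} for between n and m occurrences. A short sketch comparing the three on the same string:

```python
import re

txt = "Hellooo"

exactly_two = re.findall(r"o{2}", txt)     # exactly two: the third "o" is left over
one_to_two = re.findall(r"o{1,2}", txt)    # one or two, taking as many as possible
at_least_two = re.findall(r"o{2,}", txt)   # two or more

print(exactly_two)   # ['oo']
print(one_to_two)    # ['oo', 'o']
print(at_least_two)  # ['ooo']
```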

*: Matches zero or more occurrences


import re
txt = "Hellooo"
pattern = r'o*'
matches = re.findall(pattern, txt)
# An empty string is reported at every position where zero 'o' characters match
print(matches) # Output: ['', '', '', '', 'ooo', '']
[0123]: Matches the specified digits
import re

txt = "1234"
pattern = r'[0123]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['1', '2', '3']

[^arn]: Matches any character EXCEPT a, r, and n


import re

txt = "garden"
pattern = r'[^arn]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['g', 'd', 'e']

\d: Matches any digit (0-9)


import re

txt = "a1b2c3"
pattern = r'\d'
matches = re.findall(pattern, txt)
print(matches) # Output: ['1', '2', '3']

[0-5][0-9]: Matches any two-digit numbers from 00 to 59


import re

txt = "My score is 45 and his score is 62"


pattern = r'[0-5][0-9]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['45']

\w: Matches any alphanumeric character (a-z, A-Z, 0-9, or _)


import re

txt = "123 abc_DEF"


pattern = r'\w'
matches = re.findall(pattern, txt)
print(matches) # Output: ['1', '2', '3', 'a', 'b', 'c', '_', 'D', 'E', 'F']
[a-n]: Matches any lower case alphabet between a and n
import re

txt = "hello world"


pattern = r'[a-n]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['h', 'e', 'l', 'l', 'l', 'd']

\D: Matches any non-digit character


import re

txt = "a1b2c3"
pattern = r'\D'
matches = re.findall(pattern, txt)
print(matches) # Output: ['a', 'b', 'c']

[arn]: Matches where one of the specified characters (a, r, or n) is present
import re

txt = "garden"
pattern = r'[arn]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['a', 'r', 'n']

[a-zA-Z]: Matches any character between a and z, lower case OR upper case
import re

txt = "Hello123"
pattern = r'[a-zA-Z]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['H', 'e', 'l', 'l', 'o']

[0-9]: Matches any digit between 0 and 9


import re

txt = "My phone number is 1234567890"


pattern = r'[0-9]'
matches = re.findall(pattern, txt)
print(matches) # Output: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
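Sets become more useful when several of them are combined into one pattern. The sketch below builds on the [0-5][0-9] example to check whether a string is a 24-hour time such as 09:45. The helper name `is_valid_time` and the pattern are our own illustration, and re.fullmatch() (which is not covered in these notes) is used so that the whole string has to match.

```python
import re

# Hypothetical helper, not part of the session notes.
# Hours: 00-19 via [01][0-9] or 20-23 via 2[0-3]; minutes: 00-59 via [0-5][0-9]
TIME_PATTERN = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")

def is_valid_time(value):
    # fullmatch() succeeds only if the entire string fits the pattern
    return TIME_PATTERN.fullmatch(value) is not None

samples = ["09:45", "23:59", "24:00", "12:60", "7:30"]
results = {t: is_valid_time(t) for t in samples}
print(results)
# {'09:45': True, '23:59': True, '24:00': False, '12:60': False, '7:30': False}
```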
