Regular Expression

The document provides an overview of regular expressions (RegEx) in Python, explaining their purpose for matching text patterns and the use of the re module for various operations like searching, splitting, and substituting strings. It details special characters, metacharacters, and flags that can modify regex behavior, along with methods such as re.findall(), re.search(), and re.sub(). Additionally, it covers how to utilize these regex functionalities effectively in Python programming.

Python - by Sanjay Mate , IICMR-MCA 1

 A regular expression is a powerful tool for matching
text, based on a pre-defined pattern. It can detect the
presence or absence of a text by matching with a
particular pattern, and also can split a pattern into one
or more sub-patterns.
 Regular expressions use a sequence of characters and
symbols to define a pattern of text.
 Regular expressions are useful for finding phone
numbers, email addresses, dates, and any other data
that has a consistent format.

 The Python standard library provides the re module for
regular expressions. Its primary function is searching:
it takes a regular expression and a string, and looks for
the pattern in that string.

 A Regular Expression (RegEx) is a sequence of
characters that defines a search pattern. For example,

^a...s$

matches any five-character string starting with
a and ending with s.
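
A minimal sketch of this pattern in use. The test words are made up for illustration and are not from the original slides.

```python
import re

pattern = r"^a...s$"  # 'a', then any three characters, then 's'
words = ["abyss", "alias", "abs", "Alias", "an abacus"]  # hypothetical samples

# re.match() returns a match object on success and None otherwise
results = [re.match(pattern, word) is not None for word in words]

for word, ok in zip(words, results):
    print(word, "->", "match" if ok else "no match")
```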

 Raw string
 A raw string (written with an r prefix) is slightly
different from a regular string: it does not interpret
the \ character as an escape character.
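
A short sketch of the difference, assuming a sample string chosen only for illustration:

```python
import re

regular = "a\tb"  # \t is interpreted as one tab character
raw = r"a\tb"     # the backslash and the 't' stay as two characters

print(len(regular))  # 3
print(len(raw))      # 4

# Raw strings keep regex backslashes readable: r"\d+" instead of "\\d+"
numbers = re.findall(r"\d+", "12 apples and 7 pears")
print(numbers)  # ['12', '7']
```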

 Some characters are metacharacters, also called
special characters. They do not match themselves;
instead they have a special meaning in a pattern.

 Metacharacter [ ]
 Used for specifying a character class. Characters can be
listed individually, or a range of characters can be
indicated by giving two characters and separating them
by a '-'.
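
A brief sketch of listed characters and ranges inside [ ]. The sample text is an assumption made for this example.

```python
import re

text = "Order 66 shipped to bay C4"  # hypothetical sample

vowels = re.findall(r"[aeiou]", text)  # characters listed individually
digits = re.findall(r"[0-9]", text)    # a range of characters
capitals = re.findall(r"[A-Z]", text)  # another range

print(vowels)    # ['e', 'i', 'e', 'o', 'a']
print(digits)    # ['6', '6', '4']
print(capitals)  # ['O', 'C']
```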

 ^ symbol in [ ]
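
A minimal sketch, assuming made-up sample strings: a caret placed first inside [ ] inverts the character class, while a caret anywhere else in the class is an ordinary character.

```python
import re

# ^ as the first character negates the class: match anything except digits
non_digits = re.findall(r"[^0-9]", "a1-b2_c3")
print(non_digits)  # ['a', '-', 'b', '_', 'c']

# ^ in any other position is a literal caret
with_caret = re.findall(r"[0-9^]", "2^8")
print(with_caret)  # ['2', '^', '8']
```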

 . dot

 ^ Caret

 $ dollar
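
A small sketch of the dot, caret and dollar metacharacters together. The sample strings are assumptions chosen for illustration.

```python
import re

# . matches any single character except a newline
dot_hits = re.findall(r"c.t", "cat cot c\nt cut")
print(dot_hits)  # ['cat', 'cot', 'cut']

# ^ anchors the pattern to the start of the string
starts_with_hello = re.search(r"^Hello", "Hello world") is not None
starts_with_world = re.search(r"^world", "Hello world") is not None
print(starts_with_hello, starts_with_world)  # True False

# $ anchors the pattern to the end of the string
ends_with_world = re.search(r"world$", "Hello world") is not None
ends_with_hello = re.search(r"Hello$", "Hello world") is not None
print(ends_with_world, ends_with_hello)  # True False
```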

 * star

 + plus
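
A short sketch contrasting star (zero or more repetitions) with plus (one or more). The sample words are made up.

```python
import re

samples = ["mn", "man", "maaan", "main"]  # hypothetical samples

# a* allows zero or more 'a' characters between 'm' and 'n'
star = [s for s in samples if re.search(r"ma*n", s)]
# a+ requires at least one 'a' between 'm' and 'n'
plus = [s for s in samples if re.search(r"ma+n", s)]

print(star)  # ['mn', 'man', 'maaan']
print(plus)  # ['man', 'maaan']
```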

 { } Braces
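
A minimal sketch of the three brace forms {n}, {n,m} and {n,}, assuming a sample string of digit runs. The \b word boundaries are used here only so that each run is matched as a whole.

```python
import re

text = "1 22 333 4444 55555"  # hypothetical sample

exactly_three = re.findall(r"\b\d{3}\b", text)    # exactly 3 digits
two_to_four = re.findall(r"\b\d{2,4}\b", text)    # between 2 and 4 digits
at_least_four = re.findall(r"\b\d{4,}\b", text)   # 4 or more digits

print(exactly_three)  # ['333']
print(two_to_four)    # ['22', '333', '4444']
print(at_least_four)  # ['4444', '55555']
```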

 | Alternation
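
A brief sketch of alternation, where | matches either the expression on its left or the one on its right. The sample sentences are assumptions.

```python
import re

pets = re.findall(r"cat|dog", "I have a cat, a dog and a bird")
print(pets)  # ['cat', 'dog']

# Inside a (non-capturing) group, | applies only to part of the pattern
spellings = re.findall(r"gr(?:a|e)y", "gray grey griy")
print(spellings)  # ['gray', 'grey']
```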

 ( ) Group

 Grouping constructs break up a regex in
Python into subexpressions or groups. This
serves two purposes:
◦ Grouping: A group represents a single syntactic
entity. Additional metacharacters apply to the entire
group as a unit.
◦ Capturing: Some grouping constructs also capture
the portion of the search string that matches the
subexpression in the group. You can retrieve
captured matches later through several different
mechanisms.
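
Both purposes can be sketched as follows, assuming sample strings invented for this example:

```python
import re

# Grouping: the + quantifier applies to the whole group (ab), not just to 'b'
repeated = re.search(r"(ab)+", "xxababab")
print(repeated.group())  # ababab

# Capturing: each parenthesised part of the match can be retrieved later
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15")
print(date.group(0))  # 2024-03-15 (the whole match)
print(date.group(1))  # 2024
print(date.groups())  # ('2024', '03', '15')
```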

 \ Backslash

 The backslash \ is used to escape various
characters, including all metacharacters.
 \$a matches if a string contains $ followed
by a. Here, $ is not interpreted by the RegEx
engine in a special way.
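
A minimal sketch of escaping, using a made-up sample string. re.escape() is shown as one way to escape metacharacters automatically.

```python
import re

text = "price: $a or a$"  # hypothetical sample

escaped = re.findall(r"\$a", text)  # \$ matches a literal dollar sign
print(escaped)  # ['$a']

# Unescaped, $ means "end of string", so "$a" cannot match here
unescaped = re.search(r"$a", text)
print(unescaped)  # None

# re.escape() adds the backslashes for you
print(re.escape("$a"))  # \$a
```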

 The following is a list of special sequences.

 \A - Matches if the specified characters are at the
start of a string.

 \b - Matches if the specified characters are at the
beginning or end of a word.

 \B - Opposite of \b. Matches if the specified
characters are not at the beginning or end of a word.
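
A short sketch of \b and \B, assuming a sample string in which "cat" appears as a whole word, at the end of a word, and at the start of a word:

```python
import re

text = "cat concat catalog"  # hypothetical sample

whole_word = re.findall(r"\bcat\b", text)
print(whole_word)  # ['cat'] (only the standalone word)

# Positions where "cat" begins a word
at_word_start = [m.start() for m in re.finditer(r"\bcat", text)]
print(at_word_start)  # [0, 11]

# Positions where "cat" does not begin a word
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]
print(inside_word)  # [7]
```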

 \d - Matches any decimal digit. Equivalent to [0-9] for ASCII text.

 \D - Matches any character that is not a decimal
digit. Equivalent to [^0-9]
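
A minimal sketch of \d and \D on a made-up sample string:

```python
import re

text = "Call 555-0199 now"  # hypothetical sample

digit_runs = re.findall(r"\d+", text)  # runs of decimal digits
print(digit_runs)  # ['555', '0199']

digits_only = re.sub(r"\D", "", text)  # delete every non-digit character
print(digits_only)  # 5550199
```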

 \s - Matches where a string contains any whitespace
character. Equivalent to [ \t\n\r\f\v]

 \S - Matches where a string contains any non-
whitespace character. Equivalent to [^ \t\n\r\f\v].
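
A brief sketch of \s and \S, assuming a sample string that mixes spaces, a tab and a newline:

```python
import re

text = "one two\tthree\nfour"  # hypothetical sample

split_on_space = re.split(r"\s+", text)  # split on runs of whitespace
non_space_runs = re.findall(r"\S+", text)  # collect runs of non-whitespace

print(split_on_space)  # ['one', 'two', 'three', 'four']
print(non_space_runs)  # ['one', 'two', 'three', 'four']
```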

 \w - Matches any alphanumeric character (digits and
letters). Equivalent to [a-zA-Z0-9_].
 The underscore _ is also considered an alphanumeric
character.

 \W - Matches any non-alphanumeric character.
Equivalent to [^a-zA-Z0-9_]
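
A minimal sketch of \w and \W on a made-up sample string. Note that the underscore counts as a word character.

```python
import re

text = "user_name-42 @home!"  # hypothetical sample

words = re.findall(r"\w+", text)
print(words)  # ['user_name', '42', 'home']

non_words = re.findall(r"\W", text)
print(non_words)  # ['-', ' ', '@', '!']
```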

 \Z - Matches if the specified characters are at
the end of a string.
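
A short sketch of \A and \Z, assuming two-line sample strings. It also contrasts them with ^ and $, which behave differently around newlines.

```python
import re

text = "first line\nsecond line"  # hypothetical sample

a_first = re.search(r"\Afirst", text) is not None
# \A only ever matches at the very start, even with re.MULTILINE
a_second = re.search(r"\Asecond", text, re.MULTILINE) is not None
caret_second = re.search(r"^second", text, re.MULTILINE) is not None
print(a_first, a_second, caret_second)  # True False True

z_end = re.search(r"line\Z", text) is not None
# \Z matches only at the very end; $ also matches just before a final newline
z_before_newline = re.search(r"line\Z", "first line\n") is not None
dollar_before_newline = re.search(r"line$", "first line\n") is not None
print(z_end, z_before_newline, dollar_before_newline)  # True False True
```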

 Python has a module named re to work with
regular expressions. To use it, we need to
import the module.

 re.findall()
 The re.findall() method returns a list of strings
containing all matches.
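
A minimal sketch of re.findall(), using a sample string invented for this example:

```python
import re

text = "hello 12 hi 89. Howdy 34"  # hypothetical sample

numbers = re.findall(r"\d+", text)
print(numbers)  # ['12', '89', '34']

# When nothing matches, the result is an empty list
nothing = re.findall(r"\d+", "no digits here")
print(nothing)  # []
```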

 re.split()
 The re.split() method splits the string wherever
the pattern matches and returns a list of the
resulting substrings.
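
A brief sketch of re.split(), assuming a made-up sample string split on runs of digits:

```python
import re

text = "Twelve:12 Eighty nine:89."  # hypothetical sample

parts = re.split(r"\d+", text)
print(parts)  # ['Twelve:', ' Eighty nine:', '.']

# If the pattern never matches, the list holds the original string
unsplit = re.split(r"\d+", "no digits")
print(unsplit)  # ['no digits']
```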

 re.split()
 You can pass maxsplit argument to
the re.split() method. It's the maximum number
of splits that will occur.
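
A minimal sketch of maxsplit, on a sample string chosen for illustration. It is passed by keyword here, which keeps the call unambiguous.

```python
import re

text = "Twelve:12 Eighty nine:89 Nine:9."  # hypothetical sample

# Only the first match is used as a split point
limited = re.split(r"\d+", text, maxsplit=1)
print(limited)  # ['Twelve:', ' Eighty nine:89 Nine:9.']
```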

 re.sub()
 The method re.sub(pattern, replace, string) returns a
string in which matched occurrences are replaced
with the content of the replace variable.
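
A short sketch of re.sub(). The variable names and the sample string are assumptions made for this example.

```python
import re

text = "red green  blue"  # hypothetical sample
replace = "-"

# Every run of whitespace is replaced with the content of `replace`
joined = re.sub(r"\s+", replace, text)
print(joined)  # red-green-blue
```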

 re.sub()
 You can pass count as a fourth parameter to
the re.sub() method. If omitted, it defaults to 0,
which replaces all occurrences.
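
A minimal sketch of the count parameter, using a made-up sample string. It is passed by keyword here for readability.

```python
import re

text = "red green  blue"  # hypothetical sample

first_only = re.sub(r"\s+", "-", text, count=1)  # replace one occurrence
every_one = re.sub(r"\s+", "-", text, count=0)   # 0 means replace all

print(first_only)  # red-green  blue
print(every_one)   # red-green-blue
```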

 re.subn()
 The re.subn() is similar to re.sub() except it
returns a tuple of 2 items containing the new
string and the number of substitutions made.
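
A brief sketch of re.subn(), assuming a sample string invented for this example:

```python
import re

text = "red green  blue"  # hypothetical sample

result = re.subn(r"\s+", "-", text)
print(result)  # ('red-green-blue', 2)

new_text, substitutions = result  # unpack the 2-item tuple
```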

 re.search()
 The re.search() method takes two arguments: a
pattern and a string. The method looks for the
first location where the RegEx pattern produces
a match with the string.
 If the search is successful, re.search() returns a
match object; if not, it returns None.

 match = re.search(pattern, str)
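
A minimal sketch of re.search() and its two possible results. The sample string is an assumption.

```python
import re

text = "Python is fun"  # hypothetical sample

match = re.search(r"fun", text)
if match:
    print("found at index", match.start())  # found at index 10

# Only the first location is reported, even if the pattern occurs again
first_n = re.search(r"n", text)
print(first_n.start())  # 5

missing = re.search(r"boring", text)
print(missing)  # None
```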

 re.fullmatch()
 Unlike the match() method, which performs the
pattern matching only at the beginning of the
string, the re.fullmatch method returns a match
object if and only if the entire target string from
the first to the last character matches the
regular expression pattern.
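
A short sketch contrasting re.match() with re.fullmatch(), using made-up sample strings:

```python
import re

pattern = r"\d{4}"

# match() succeeds because the string merely starts with four digits
prefix = re.match(pattern, "2024-03")
print(prefix.group())  # 2024

# fullmatch() fails because "-03" is left over
partial = re.fullmatch(pattern, "2024-03")
print(partial)  # None

# fullmatch() succeeds only when the whole string fits the pattern
whole = re.fullmatch(pattern, "2024")
print(whole.group())  # 2024
```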

 Whenever a match to the regex pattern is found,
Python returns a Match object.

 match.group()
 The group() method returns the part of the
string where there is a match.
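
A minimal sketch of group(), assuming a sample string and a pattern with two capturing groups:

```python
import re

text = "39801 356, 2102 1111"  # hypothetical sample
match = re.search(r"(\d{3}) (\d{2})", text)

print(match.group())      # 801 35 (the whole matched part)
print(match.group(1))     # 801
print(match.group(2))     # 35
print(match.group(1, 2))  # ('801', '35')
```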

 match.start(), match.end() and match.span()

 The start() function returns the index of the start
of the matched substring.
 end() returns the index just past the end of the
matched substring.
 span() returns both as a tuple, (start, end).
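
A brief sketch of start(), end() and span() on a made-up sample string:

```python
import re

text = "39801 356, 2102 1111"  # hypothetical sample
match = re.search(r"(\d{3}) (\d{2})", text)

print(match.start())  # 2
print(match.end())    # 8
print(match.span())   # (2, 8)

# Slicing with these indices gives back the matched substring
print(text[match.start():match.end()])  # 801 35
```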

 match.re and match.string
 The re attribute of a match object returns the
regular expression object that produced the match.
Similarly, the string attribute returns the passed
string.
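
A minimal sketch of the two attributes, assuming a sample string invented for this example:

```python
import re

text = "39801 356, 2102 1111"  # hypothetical sample
match = re.search(r"(\d{3}) (\d{2})", text)

print(match.re)          # the compiled pattern object
print(match.re.pattern)  # (\d{3}) (\d{2})
print(match.string)      # 39801 356, 2102 1111
```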

 Python regex functions accept optional flags that
modify how a regular expression pattern is matched. They
can be used with match(), search(), and split(), among others.

 IGNORECASE flag
◦ Stands for ignoring case. Specify this flag as an
argument to the regex method to perform case-insensitive
matching.

 IGNORECASE flag
◦ re.I
◦ re.IGNORECASE
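
A short sketch of case-insensitive matching, using a made-up sample string:

```python
import re

text = "Python PYTHON python"  # hypothetical sample

case_sensitive = re.findall(r"python", text)
case_insensitive = re.findall(r"python", text, re.IGNORECASE)

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']

# re.I is simply a short alias for re.IGNORECASE
print(re.I == re.IGNORECASE)  # True
```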

 DOTALL flag
◦ By default, the dot (.) metacharacter inside the regular
expression pattern represents any character, be it a
letter, digit, symbol, or punctuation mark, except the
newline character, which is \n. With the DOTALL flag,
the dot matches the newline character as well.

 DOTALL flag
◦ re.S
◦ re.DOTALL
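
A minimal sketch of the dot with and without DOTALL, assuming a two-line sample string:

```python
import re

text = "line one\nline two"  # hypothetical sample

# Without the flag, the dot cannot cross the newline
default = re.search(r"one.line", text)
print(default)  # None

# With re.DOTALL (or re.S), the dot matches the newline as well
dotall = re.search(r"one.line", text, re.DOTALL)
print(repr(dotall.group()))  # 'one\nline'
```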

 VERBOSE flag
◦ The re.X flag stands for verbose. This flag allows more
flexibility and better formatting when writing more
complex regex patterns for match(), search(), or other
regex methods: most whitespace in the pattern is ignored,
and comments can be added after a # character.

 VERBOSE flag
◦ re.X
◦ re.VERBOSE
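
A brief sketch of a verbose pattern. The phone-number layout and the sample text are assumptions chosen only to show the formatting.

```python
import re

# Whitespace and # comments inside the pattern are ignored with re.VERBOSE
phone = re.compile(r"""
    (\d{3})   # first block of three digits
    [-.\s]    # a separator: dash, dot or whitespace
    (\d{4})   # second block of four digits
""", re.VERBOSE)

# The same pattern written compactly, without the flag
compact = re.compile(r"(\d{3})[-.\s](\d{4})")

verbose_groups = phone.search("call 555-0199").groups()
compact_groups = compact.search("call 555-0199").groups()
print(verbose_groups)  # ('555', '0199')
print(compact_groups)  # ('555', '0199')
```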

 MULTILINE flag
◦ The re.M flag is used as an argument inside the regex
method to perform a match inside a multiline block of
text.

◦ re.M
◦ re.MULTILINE

 MULTILINE flag
◦ This flag is used with the metacharacters ^ and $.
 By default, the caret (^) matches a pattern only at the
beginning of the string.
 By default, the dollar ($) matches the regular expression
pattern at the end of the string.

When this flag is specified, ^ matches at the beginning of
the string and at the start of each line (immediately after
each \n), and $ matches at the end of the string and at the
end of each line (immediately before each \n).
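
A minimal sketch of ^ and $ with and without MULTILINE, assuming a three-line sample string:

```python
import re

text = "apple pie\nbanana split\napple tart"  # hypothetical sample

# Default: ^ matches only at the start of the whole string
start_default = re.findall(r"^apple", text)
start_multi = re.findall(r"^apple", text, re.MULTILINE)
print(start_default)  # ['apple']
print(start_multi)    # ['apple', 'apple']

# Default: $ matches only at the end of the whole string
end_default = re.findall(r"\w+$", text)
end_multi = re.findall(r"\w+$", text, re.MULTILINE)
print(end_default)  # ['tart']
print(end_multi)    # ['pie', 'split', 'tart']
```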

 ASCII flag

 Makes \w, \W, \b, \B, \d, \D, \s and \S perform
ASCII-only matching instead of full Unicode
matching. This is only meaningful for Unicode
patterns and is ignored for byte patterns.

 ASCII flag
◦ re.A
◦ re.ASCII
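
A short sketch of Unicode versus ASCII-only matching. The sample text is an assumption: it contains an accented letter (é) and three Arabic-Indic digits, written as escapes so the source stays plain ASCII.

```python
import re

# "café", then the Arabic-Indic digits one-two-three, then "123"
text = "caf\u00e9 \u0661\u0662\u0663 123"  # hypothetical sample

unicode_digits = re.findall(r"\d+", text)
ascii_digits = re.findall(r"\d+", text, re.ASCII)
print(len(unicode_digits))  # 2 (both digit runs)
print(ascii_digits)         # ['123']

unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(len(unicode_words))  # 3
print(ascii_words)         # ['caf', '123']
```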
