Regular Expressions


Regular expressions are patterns that allow you to describe, match, or parse text. With regular expressions, you can
do things like find and replace text, verify that input data follows a required format, and perform similar text-processing tasks.
Here's a scenario: you want to verify that the telephone number entered by a user on a form matches a format,
say, ###-###-#### (where # represents a number). One way to solve this could be:
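A minimal sketch of such a manual check, assuming a hypothetical helper named isPatternManual (the name and the logic are illustrative, not the original listing):

```javascript
// Hypothetical helper: checks the ###-###-#### format one character
// at a time, without using a regular expression.
function isPatternManual(userInput) {
  if (typeof userInput !== 'string' || userInput.length !== 12) {
    return false;
  }
  for (let i = 0; i < userInput.length; i++) {
    const ch = userInput[i];
    if (i === 3 || i === 7) {
      // Positions 3 and 7 must hold the hyphens.
      if (ch !== '-') return false;
    } else if (ch < '0' || ch > '9') {
      // Every other position must be an ASCII digit.
      return false;
    }
  }
  return true;
}

console.log(isPatternManual('555-123-4567')); // true
console.log(isPatternManual('5551234567'));   // false
```

Every rule of the format has to be spelled out by hand here, which is the bookkeeping a pattern can replace.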
Alternatively, we can use a regular expression here like this:
function isPattern(userInput) {
  return /^\d{3}-\d{3}-\d{4}$/.test(userInput);
}
Notice how we’ve refactored the code using regex. Amazing right? That is the power of regular expressions.
How to Create A Regular Expression
In JavaScript, you can create a regular expression in either of two ways:
• Method #1: using a regular expression literal. This consists of a pattern enclosed in forward slashes. You
can write this with or without a flag (we will see what flag means shortly). The syntax is as follows:
const regExpLiteral = /pattern/; // Without flags

const regExpLiteralWithFlags = /pattern/gi; // With flags (here, g and i)


The forward slashes /…/ indicate that we are creating a regular expression pattern, just the same way you use
quotes “ ” to create a string.
• Method #2: using the RegExp constructor function. The syntax is as follows:
new RegExp(pattern [, flags])
Here, the pattern is passed as a string, as is the optional flags parameter.
So when do you use each of these methods?
You should use a regex literal when you know the regular expression pattern at the time of writing the code.
On the other hand, use the RegExp constructor if the regex pattern is to be created dynamically. Also, the RegExp
constructor lets you write a pattern using a template literal, but this is not possible with the regex literal syntax.
What are Regular Expression Flags?
Flags or modifiers are characters that enable advanced search features including case-insensitive and global
searching. You can use them individually or collectively. Some commonly used ones are:
• g is used for global search, which means the search does not stop after the first match.
• i is used for case-insensitive search, meaning that a match can occur regardless of the casing.
• m is used for multiline search, which makes ^ and $ match at the start and end of each line rather than only of the whole string.
• u is used for Unicode search, which treats the pattern and the text as sequences of Unicode code points.
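The g and i flags appear in the examples that follow. As a small sketch of the other two, here is how m and u change a match (the sample strings are made up for illustration):

```javascript
const text = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the input.
console.log(/^second/.test(text));  // false
// With m, ^ also matches right after each line break.
console.log(/^second/m.test(text)); // true

// '\u{1F600}' is an emoji stored as two UTF-16 code units.
// Without u, . matches a single code unit, so ^.$ cannot cover it.
console.log(/^.$/.test('\u{1F600}'));  // false
// With u, . matches one whole Unicode code point.
console.log(/^.$/u.test('\u{1F600}')); // true
```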
Let’s look at some regular expression patterns using both syntaxes.
How to use a regular expression literal:
// Syntax: /pattern/flags

const regExpStr = 'Hello world! hello there';

const regExpLiteral = /Hello/gi;

console.log(regExpStr.match(regExpLiteral));

// Output: ['Hello', 'hello']


Note that if we did not flag the pattern with i, only Hello will be returned.
The pattern /Hello/ is an example of a simple pattern. A simple pattern consists of characters that must appear
literally in the target text. For a match to occur, the target text must follow the same sequence as the pattern.
For example, if you re-write the text in the previous example and try to match it:
const regExpLiteral = /Hello/gi;

const regExpStr = 'oHell world, ohell there!';

console.log(regExpStr.match(regExpLiteral));

// Output: null
We get null because the characters in the string do not appear as specified in the pattern. So a literal pattern such
as /hello/, means h followed by e followed by l followed by l followed by o, exactly like that.
How to use a regex constructor:
// Syntax: new RegExp(pattern [, flags])

const regExpConstructor = new RegExp('xyz', 'g'); // With the g flag

const str = 'xyz xyz';

console.log(str.match(regExpConstructor));

// Output: ['xyz', 'xyz']


Here, the pattern xyz is passed in as a string, as is the flag. Both occurrences of xyz got matched because we
passed in the g flag. Without it, only the first match would be returned.
We can also pass in dynamically created patterns as template literals using the constructor function. For example:
const pattern = prompt('Enter a pattern');
// Suppose the user enters 'xyz'

const regExpConst = new RegExp(`${pattern}`, 'gi');

const str = 'xyz XYZ';

console.log(str.match(regExpConst)); // Output: ['xyz', 'XYZ']
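One caveat with dynamically built patterns: any metacharacters in the input keep their special meaning. If the input should be matched literally, it can be escaped first. The sketch below assumes a hypothetical helper named escapeRegExp; it is not part of the RegExp API used in this article.

```javascript
// Hypothetical helper: puts a backslash in front of every regex
// metacharacter so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2'; // stands in for what a user might type
const raw = new RegExp(userInput);
const escaped = new RegExp(escapeRegExp(userInput));

console.log(raw.test('1+1=2'));     // false: + is read as a quantifier
console.log(escaped.test('1+1=2')); // true: + is matched literally
```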


How to Use Regular Expression Special Characters
A special character in a regular expression is a character with a reserved meaning. Using special characters, you can
do more than just find a direct match.
For example, if you want to match a character in a string that may or may not appear once or multiple times, you
can do this with special characters. These characters fit into different subgroups that perform similar functions.
Let's take a look at each subgroup and the characters that go with them.
Anchors and Boundaries:
Anchors are metacharacters that match the start and end of a line of text they are examining. You use them to
assert where a boundary should be.
The two characters used are ^ and $.
• ^ matches the start of a line and anchors a literal at the beginning of that line. For example:
const regexPattern1 = /^cat/;

console.log(regexPattern1.test('cat and mouse')); // Output: true

console.log(regexPattern1.test('The cat and mouse')); // Output: false because the line does not start with cat

// Without the ^ in the pattern, the output will return true
// because we did not assert a boundary.

const regexPattern2 = /cat/;

console.log(regexPattern2.test('The cat and mouse')); // Output: true


• $ matches the end of a line and anchors a literal at the end of that line. For example:
const regexPattern = /cat$/;

console.log(regexPattern.test('The mouse and the cat')); // Output: true

console.log(regexPattern.test('The cat and mouse')); // Output: false


Note that the anchor characters ^ and $ match only a position in the text, not any actual characters.
Word boundaries are metacharacters that match the start and end position of a word, that is, a sequence of
word characters (letters, digits, and underscores). You can think of them as a word-based version of ^ and $. You use
the metacharacters \b and \B to assert a word boundary.
• \b matches the start or end of a word. The word is matched according to the position of the metacharacter.
Here's an example:
// Syntax 1: /\b.../ where .... represents a word.

// Search for a word that begins with the pattern ward


const regexPattern1 = /\bward/gi;

const text1 = 'backward Wardrobe Ward';

console.log(text1.match(regexPattern1)); // Output: ['Ward', 'Ward']

// Syntax 2: /...\b/

// Search for a word that ends with the pattern ward


const regexPattern2 = /ward\b/gi;

const text2 = 'backward Wardrobe Ward';

console.log(text2.match(regexPattern2)); // Output: ['ward', 'Ward']

// Syntax 3: /\b....\b/

// Search for a stand-alone word that begins and ends with the pattern ward
const regexPattern3 = /\bward\b/gi;

const text3 = 'backward Wardrobe Ward';

console.log(text3.match(regexPattern3)); // Output: ['Ward']


• \B is the opposite of \b. It matches every position that \b does not.
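As a quick sketch, here is \B applied to the same sample text used for \b above:

```javascript
const text = 'backward Wardrobe Ward';

// ward that does NOT start a word: only the one inside "backward".
console.log(text.match(/\Bward/gi)); // ['ward']

// ward that does NOT end a word: only the one inside "Wardrobe".
console.log(text.match(/ward\B/gi)); // ['Ward']
```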
Shortcodes for Other Metacharacters:
In addition to the metacharacters we have looked at, here are some of the most commonly used ones:
• \d – matches any decimal digit and is shorthand for [0-9].
• \w – matches any word character, which could be a letter, a digit, or an underscore. \w is
shorthand for [A-Za-z0-9_].
• \s – matches any white space character.
• \D – matches any non-digit and is the same as [^0-9].
• \W – matches any non-word character and is shorthand for [^A-Za-z0-9_].
• \S – matches a non-white space character.
• . – matches any single character except line terminators such as \n.
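To see these shorthands side by side, the following sketch runs them against one made-up sample string:

```javascript
const sample = 'Order 42: 3 apples_x';

console.log(sample.match(/\d+/g)); // ['42', '3']
console.log(sample.match(/\w+/g)); // ['Order', '42', '3', 'apples_x']
console.log(sample.match(/\W/g));  // [' ', ':', ' ', ' ']
console.log(sample.match(/\s/g).length); // 3

// \D removes everything that is not a digit.
console.log(sample.replace(/\D/g, '')); // '423'

// The dot does not match a line break.
console.log(/^.$/.test('\n')); // false
```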
What is a Character Class?
A character class is used to match any one of several characters in a particular position. To denote a character class,
you use square brackets [] and then list the characters you want to match inside the brackets.
Let's look at an example:
// Find and match a word with two alternative spellings

const regexPattern = /ambi[ea]nce/;

console.log(regexPattern.test('ambience')); // Output: true

console.log(regexPattern.test('ambiance')); // Output: true

// The regex pattern reads as: find a followed by m, then b,
// then i, then either e or a, then n, then c, and then e.
What is a Negated Character Class?
If you add a caret symbol inside a character class like this [^...], it will match any character that is not listed inside
the square brackets. For example:
const regexPattern = /[^bc]at/;

console.log(regexPattern.test('bat')); // Output: false


console.log(regexPattern.test('cat')); // Output: false

console.log(regexPattern.test('mat')); // Output: true


What is a Range?
A hyphen - indicates a range when used inside a character class. Suppose you want to match a set of numbers, say
[0123456789], or a set of characters, say [abcdefg]. You can write them as ranges like this: [0-9] and [a-g], respectively.
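Several ranges can share one character class. A short sketch, using a hex colour format chosen purely for illustration:

```javascript
// Three ranges in one class: digits, a-f, and A-F.
const hexColor = /^#[0-9a-fA-F]{6}$/;

console.log(hexColor.test('#1a2B3c')); // true
console.log(hexColor.test('#12345g')); // false: g is outside a-f

// A hyphen placed last in the class is a literal hyphen, not a range.
console.log('a-b_c'.match(/[a-c-]/g)); // ['a', '-', 'b', 'c']
```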
What is Alternation?
Alternation is yet another way you can specify a set of options. Here, you use the pipe character | to match any of
several subexpressions. Each of the subexpressions is called an alternative.
The pipe symbol means ‘or’, so it matches one of a series of options. It allows you to combine subexpressions as alternatives.
For example, (x|y|z)a will match xa, ya, or za. To limit the reach of the alternation, you can use
parentheses to group the alternatives together.
Without the parentheses, x|y|za would mean x or y or za. For example:
const regexPattern = /(Bob|George)\sClan/;

console.log(regexPattern.test('Bob Clan')); // Output: true

console.log(regexPattern.test('George Clan')); // Output: true
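The effect of the parentheses on the reach of | is easiest to see with anchors. A small sketch comparing the grouped and ungrouped forms (the variable names are illustrative):

```javascript
const grouped = /^(x|y|z)a$/;  // x, y, or z, then a, and nothing else
const ungrouped = /^x|y|za$/;  // "starts with x" OR "contains y" OR "ends with za"

console.log(grouped.test('ya'));      // true
console.log(grouped.test('y'));       // false: the a is required
console.log(ungrouped.test('y'));     // true: the middle alternative is just y
console.log(ungrouped.test('pizza')); // true: ends with za
console.log(grouped.test('pizza'));   // false
```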


What are Quantifiers and Greediness?
Quantifiers denote how many times a character, a character class, or group should appear in the target text for a
match to occur. Here are some common ones:
• + matches the character it is appended to if that character appears at least once. For example:
const regexPattern = /hel+o/;

console.log(regexPattern.test('helo')); // Output: true

console.log(regexPattern.test('hellllllllllo')); // Output: true

console.log(regexPattern.test('heo')); // Output: false


• * is similar to the + character but with a slight difference. When you append * to a character, it means you
want to match any number of that character, including none. Here’s an example:
const regexPattern = /hel*o/;

console.log(regexPattern.test('helo')); // Output: true

console.log(regexPattern.test('hellllo')); // Output: true

console.log(regexPattern.test('heo')); // Output: true

// Here the * matches 0 or any number of 'l'


• ? implies "optional". When you append it to a character, it means the character may or may not appear. For
example:
const regexPattern = /colou?r/;

console.log(regexPattern.test('color')); // Output: true

console.log(regexPattern.test('colour')); // Output: true

// The ? after the character u makes u optional


• {N}, when appended to a character or character class, specifies how many of the character we want. For
example /\d{3}/ means match three consecutive digits.
• {N,M} is called the interval quantifier and is used to specify a range for the minimum and maximum
possible match. For example /\d{3,6}/ means match a minimum of 3 and a maximum of 6 consecutive
digits.
• {N,} denotes an open-ended range. For example /\d{3,}/ means match any 3 or more consecutive digits.
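The brace quantifiers can be sketched as follows. The patterns are anchored with ^ and $ so that the counts apply to the whole input. Note the last two lines: in a JavaScript pattern without the u flag, a space inside the braces stops them from acting as a quantifier.

```javascript
console.log(/^\d{3}$/.test('123'));       // true: exactly 3 digits
console.log(/^\d{3}$/.test('1234'));      // false
console.log(/^\d{3,6}$/.test('12345'));   // true: between 3 and 6 digits
console.log(/^\d{3,6}$/.test('1234567')); // false: 7 digits is too many
console.log(/^\d{3,}$/.test('1234567'));  // true: 3 or more digits

// With a space, {3, 6} is read as literal text, not as a quantifier.
console.log(/^\d{3, 6}$/.test('12345'));   // false
console.log(/^\d{3, 6}$/.test('1{3, 6}')); // true
```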
What is Greediness in Regex?
All quantifiers are greedy by default. This means that they will try to match as many characters as possible.
To make a quantifier non-greedy (lazy), append a ? to it, like this: +?, *?, {N}?,
{N,M}?, and so on.
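The difference is easiest to see on a string where the pattern could stop early or late. A minimal sketch with a made-up HTML snippet:

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the last > it can find.
console.log(html.match(/<.+>/)[0]);  // '<b>bold</b> and <i>italic</i>'

// Lazy: .+? stops at the first > it can find.
console.log(html.match(/<.+?>/)[0]); // '<b>'

// With the g flag, the lazy version picks out each tag.
console.log(html.match(/<.+?>/g));   // ['<b>', '</b>', '<i>', '</i>']
```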
What are Grouping and Backreferencing?
We previously looked at how we can limit the scope of alternation using the parentheses.
What if you want to use a quantifier like + or * on more than one character at a time – say a character class or
group? You can group them together as a whole using the parentheses before appending the quantifier, just like in
this example:
const regExp = /abc+(xyz+)+/i;

console.log(regExp.test('abcxyzzzzXYZ')); // Output: true


Here's what the pattern means: The first + matches the c of abc, the second + matches the z of xyz, and the third +
matches the subexpression xyz, which will match if the sequence repeats.
Backreferencing allows you to match a new pattern that is the same as a previously matched pattern in a regular
expression. You also use parentheses for backreferencing because it can remember a previously matched
subexpression it encloses (that is, the captured group).
However, it is possible to have more than one captured group in a regular expression. So, to backreference any of
the captured groups, you use a number to identify the parentheses.
Suppose you have 3 captured groups in a regex and you want to backreference any of them. You use \1, \2, or \3,
to refer to the first, second, or third parentheses. To number the parentheses, you start counting the open
parentheses from the left.
Let's look at some examples:
(x) matches x and remembers the match.
const regExp = /(abc)bar\1/i;

// \1 must match the same text that the first group, (abc), captured


console.log(regExp.test('abcbarAbc')); // Output: true

console.log(regExp.test('abcbar')); // Output: false


(?:x) matches x but does not remember the match (a non-capturing group). Because nothing is captured, \1 has no group
to refer to; in a pattern without the u flag it falls back to a legacy octal escape and matches the character with code 1
instead. Using an example:
const regExp = /(?:abc)bar\1/i;

console.log(regExp.test('abcbarabc')); // Output: false

console.log(regExp.test('abcbar\x01')); // Output: true, \x01 is the character with code 1


The Escape Rule
A metacharacter has to be escaped with a backslash if you want it to appear as a literal in your regular expression.
By escaping a metacharacter in regex, the metacharacter loses its special meaning.
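A short sketch of the rule, using the dot as the metacharacter. It also shows that when the pattern is passed to the RegExp constructor as a string, each backslash has to be doubled, because the string literal consumes one level of escaping:

```javascript
const loose = /^\d+.\d+$/;   // unescaped dot: any character is accepted
const strict = /^\d+\.\d+$/; // escaped dot: only a literal . is accepted

console.log(loose.test('3x14'));  // true
console.log(strict.test('3x14')); // false
console.log(strict.test('3.14')); // true

// Same pattern built from a string: note the doubled backslashes.
const fromString = new RegExp('^\\d+\\.\\d+$');
console.log(fromString.test('3.14'));              // true
console.log(fromString.source === strict.source);  // true
```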
Regular Expression Methods
The test() method
We have used this method a number of times in this article. The test() method compares the target text with the
regex pattern and returns a boolean value accordingly. If there is a match, it returns true, otherwise it returns false.
const regExp = /abc/i;

console.log(regExp.test('abcdef')); // Output: true

console.log(regExp.test('bcadef')); // Output: false


The exec() method
The exec() method compares the target text with the regex pattern. If there's a match, it returns an array with the
match – otherwise it returns null. For example:
const regExp = /abc/i;

console.log(regExp.exec('abcdef'));
// Output: ['abc', index: 0, input: 'abcdef', groups: undefined]

console.log(regExp.exec('bcadef'));
// Output: null
Also, there are string methods that accept regular expressions as a parameter like match(), replace(), replaceAll(),
matchAll(), search(), and split().
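As a brief sketch of some of these string methods working with a pattern (the sample text and the date format are made up for illustration):

```javascript
const text = 'Released 2023-12-31, patched 2024-01-15.';
const datePattern = /(\d{4})-(\d{2})-(\d{2})/g;

// matchAll() requires the g flag and yields one match object per hit;
// index 1 of each match holds the first captured group (the year).
const years = [...text.matchAll(datePattern)].map((m) => m[1]);
console.log(years); // ['2023', '2024']

// replace() can refer to captured groups as $1, $2, $3.
console.log(text.replace(datePattern, '$3/$2/$1'));
// 'Released 31/12/2023, patched 15/01/2024.'

// search() returns the index of the first match.
console.log(text.search(/\d/)); // 9

// split() accepts a pattern as the separator.
console.log('a, b;c'.split(/[,;]\s*/)); // ['a', 'b', 'c']
```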
Regex Examples
Here are some examples to reinforce some of the concepts we've learned in this article.
First example: How to use a regex pattern to match an email address:
const regexPattern = /^[(\w\d\W)+]+@[\w+]+\.[\w+]+$/i;

console.log(regexPattern.test('abcdef123@gmailcom'));
// Output: false, missing dot

console.log(regexPattern.test('abcdef123gmail.'));
// Output: false, missing @ and the characters after the dot

console.log(regexPattern.test('abcdef123@gmail.com'));
// Output: true, the input matches the pattern correctly
Let's interpret the pattern. Here's what's happening:
• / represents the start of the regular expression pattern.
• ^ asserts the start of the line, so the character class that follows must match from the first character.
• [(\w\d\W)+]+ matches one or more characters from the class. Inside square brackets, the parentheses and the + are
literal characters, not a group and a quantifier; and because \w and \W together cover every character, this class
accepts any character.
• @ matches the literal @ in the email format.
• [\w+]+ matches one or more word characters or literal + signs.
• \. escapes the dot so it appears as a literal character.
• [\w+]+$ matches one or more word characters or literal + signs, anchored at the end of the line.
• / ends the pattern.
Alright, next example: how to match a URL with format http://example.com or https://www.example.com:
const pattern = /^[https?]+:\/\/((w{3}\.)?[\w+]+)\.[\w+]+$/i;

console.log(pattern.test('https://www.example.com'));
// Output: true

console.log(pattern.test('http://example.com'));
// Output: true

console.log(pattern.test('https://example'));
// Output: false
Let's also interpret this pattern. Here's what's happening:
• /...../ represents the start and end of the regex pattern.
• ^ asserts the start of the line.
• [https?]+ matches one or more of the characters h, t, p, s, or ?. Inside square brackets the ? is a literal character,
so this class does not make the s optional; it also accepts strings such as ppp.
• : matches a literal colon.
• \/\/ escapes the two forward slashes.
• (w{3}\.) matches the character w three times and the dot that follows immediately. The ? after the group makes it
optional.
• [\w+]+ matches one or more word characters or literal + signs.
• \. escapes the dot.
• [\w+]+$ matches one or more word characters or literal + signs, anchored at the end of the line.
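Because square brackets define a set of characters rather than a sequence, the pattern above is looser than it looks. As a sketch only, a tighter variant can spell out http and the optional s without brackets. The name strictUrl is hypothetical, and this is still a teaching pattern, not a full URL validator:

```javascript
const loosePattern = /^[https?]+:\/\/((w{3}\.)?[\w+]+)\.[\w+]+$/i; // pattern from above
const strictUrl = /^https?:\/\/(www\.)?\w+\.\w+$/i;                // hypothetical variant

console.log(strictUrl.test('https://www.example.com')); // true
console.log(strictUrl.test('http://example.com'));      // true
console.log(strictUrl.test('https://example'));         // false

// The character class accepts any mix of h, t, p, s, and ?.
console.log(loosePattern.test('ppp://example.com')); // true
console.log(strictUrl.test('ppp://example.com'));    // false
```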
JavaScript Regex
In JavaScript, a Regular Expression (RegEx) is an object that describes a sequence of characters used for defining a
search pattern. For example,
/^a...s$/
The above code defines a RegEx pattern. The pattern is: any five-character string starting with a and ending with s.
A pattern defined using RegEx can be used to match against a string.
Expression   String      Matched?
/^a...s$/    abs         No match
             alias       Match
             abyss       Match
             Alias       No match
             An abacus   No match
Create a RegEx
There are two ways you can create a regular expression in JavaScript.
1. Using a regular expression literal:
The regular expression consists of a pattern enclosed between slashes /. For example,
const regularExp = /abc/;
Here, /abc/ is a regular expression.
2. Using the RegExp() constructor function:
You can also create a regular expression by calling the RegExp() constructor function. For example,
const regularExp = new RegExp('abc');
Here is a complete example:
const regex = new RegExp(/^a...s$/);
console.log(regex.test('alias')); // true
In the above example, the string alias matches with the RegEx pattern /^a...s$/. Here, the test() method is used to
check if the string matches the pattern.
There are several other methods available to use with JavaScript RegEx. Before we explore them, let's learn about
regular expressions themselves.
If you already know the basics of RegEx, jump to JavaScript RegEx Methods.
Specify Pattern Using RegEx
To specify regular expressions, metacharacters are used. In the above example (/^a...s$/), ^ and $ are
metacharacters.
MetaCharacters
Metacharacters are characters that are interpreted in a special way by a RegEx engine. Here's a list of
metacharacters:
[] . ^ $ * + ? {} () \ |
[] - Square brackets
Square brackets specify a set of characters you wish to match.
Expression   String      Matched?
[abc]        a           1 match
             ac          2 matches
             Hey Jude    No match
             abc de ca   5 matches
Here, [abc] will match if the string you are trying to match contains any of a, b, or c.
You can also specify a range of characters using - inside square brackets.
[a-e] is the same as [abcde].
[1-4] is the same as [1234].
[0-39] is the same as [01239].
You can complement (invert) the character set by using caret ^ symbol at the start of a square-bracket.
[^abc] means any character except a or b or c.
[^0-9] means any non-digit character.
. - Period
A period matches any single character (except newline '\n').
Expression   String   Matched?
..           a        No match
             ac       1 match
             acd      1 match
             acde     2 matches (contains 4 characters)
^ - Caret
The caret symbol ^ is used to check if a string starts with a certain character.
Expression   String   Matched?
^a           a        1 match
             abc      1 match
             bac      No match
^ab          abc      1 match
             acb      No match (starts with a but not followed by b)
$ - Dollar
The dollar symbol $ is used to check if a string ends with a certain character.
Expression   String    Matched?
a$           a         1 match
             formula   1 match
             cab       No match
* - Star
The star symbol * matches zero or more occurrences of the pattern left to it.
Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             mann     1 match
             main     No match (a is not followed by n)
             woman    1 match
+ - Plus
The plus symbol + matches one or more occurrences of the pattern left to it.
Expression   String   Matched?
ma+n         mn       No match (no a character)
             man      1 match
             mann     1 match
             main     No match (a is not followed by n)
             woman    1 match
? - Question Mark
The question mark symbol ? matches zero or one occurrence of the pattern left to it.
Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             maan     No match (more than one a character)
             main     No match (a is not followed by n)
             woman    1 match
{} - Braces
Consider this code: {n,m}. This means at least n, and at most m repetitions of the pattern left to it.
Expression   String        Matched?
a{2,3}       abc dat       No match
             abc daat      1 match (aa in daat)
             aabc daaat    2 matches (aa in aabc and aaa in daaat)
             aabc daaaat   2 matches (aa in aabc and aaa in daaaat)
Let's try one more example. This RegEx [0-9]{2,4} matches at least 2 digits but not more than 4 digits.
Expression   String          Matched?
[0-9]{2,4}   ab123csde       1 match (123)
             12 and 345673   3 matches (12, 3456, 73)
             1 and 2         No match
| - Alternation
Vertical bar | is used for alternation (or operator).
Expression   String   Matched?
a|b          cde      No match
             ade      1 match (a)
             acdbea   3 matches (a, b, and a)
Here, a|b matches any string that contains either a or b.
() - Group
Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains a, b, or
c followed by xz.
Expression   String      Matched?
(a|b|c)xz    ab xz       No match
             abxz        1 match (bxz)
             axz cabxz   2 matches (axz and bxz)
\ - Backslash
Backslash \ is used to escape various characters including all metacharacters. For example,
\$a matches if a string contains $ followed by a. Here, $ is not interpreted by a RegEx engine in a special way.
If you are unsure if a character has special meaning or not, you can put \ in front of it. This makes sure the
character is not treated in a special way.
Special Sequences
Special sequences make commonly used patterns easier to write. Here's a list of special sequences:
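\A - In some regex flavors, matches if the specified characters are at the start of a string. JavaScript does not support
\A (without the u flag it is read as a literal A), so use ^ instead; without the m flag, ^ matches only at the start of the
string.
Expression   String       Matched?
^the         the sun      Match
             In the sun   No match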
\b - Matches if the specified characters are at the beginning or end of a word.
Expression   String          Matched?
\bfoo        football        Match
             a football      Match
             afootball       No match
foo\b        the foo         Match
             the afoo test   Match
             the afootest    No match
\B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.
Expression   String          Matched?
\Bfoo        football        No match
             a football      No match
             afootball       Match
foo\B        the foo         No match
             the afoo test   No match
             the afootest    Match
\d - Matches any decimal digit. Equivalent to [0-9]
Expression   String       Matched?
\d           12abc3       3 matches (1, 2, and 3)
             JavaScript   No match
\D - Matches any character that is not a decimal digit. Equivalent to [^0-9]
Expression   String     Matched?
\D           1ab34"50   3 matches (a, b, and ")
             1345       No match
\s - Matches any whitespace character. For ASCII input, equivalent to [ \t\n\r\f\v].
Expression   String             Matched?
\s           JavaScript RegEx   1 match
             JavaScriptRegEx    No match
\S - Matches any non-whitespace character. For ASCII input, equivalent to [^ \t\n\r\f\v].
Expression   String          Matched?
\S           a b             2 matches (a and b)
             (only spaces)   No match
\w - Matches any word character: a letter, a digit, or the underscore _. Equivalent to [a-zA-Z0-9_].
Expression   String     Matched?
\w           12&": ;c   3 matches (1, 2, and c)
             %"> !      No match
\W - Matches any non-word character. Equivalent to [^a-zA-Z0-9_]
Expression   String       Matched?
\W           1a2%c        1 match (%)
             JavaScript   No match

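\Z - In some regex flavors, matches if the specified characters are at the end of a string. JavaScript does not support
\Z (without the u flag it is read as a literal Z), so use $ instead; without the m flag, $ matches only at the end of the
string.

Expression    String                          Matched?
JavaScript$   I like JavaScript               1 match
              I like JavaScript Programming   No match
              JavaScript is fun               No match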
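Since JavaScript's RegExp has no \A or \Z anchors, start-of-string and end-of-string checks are written with ^ and $. A small sketch of the difference:

```javascript
// Without the u flag, \A is just an escaped letter A.
console.log(/\Athe/.test('the sun')); // false
console.log(/\Athe/.test('Athe'));    // true

// ^ (without the m flag) anchors to the start of the string.
console.log(/^the/.test('the sun'));    // true
console.log(/^the/.test('In the sun')); // false

// $ (without the m flag) anchors to the end of the string.
console.log(/JavaScript$/.test('I like JavaScript')); // true
console.log(/JavaScript$/.test('JavaScript is fun')); // false
```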
Tip: To build and test regular expressions, you can use RegEx tester tools such as regex101. This tool not only helps
you create regular expressions, but it also helps you learn them.
Now that you understand the basics of RegEx, let's discuss how to use RegEx in your JavaScript code.
JavaScript Regular Expression Methods
As mentioned above, you can use either RegExp() or a regular expression literal to create a RegEx in JavaScript.
const regex1 = /^ab/;
const regex2 = new RegExp('^ab');
In JavaScript, you can use regular expressions with the RegExp methods test() and exec().
There are also some string methods that accept a RegEx as a parameter. They are: match(), matchAll(), replace(),
search(), and split().
Method       Description
exec()       Executes a search for a match in a string and returns an array of information. It returns null on a mismatch.
test()       Tests for a match in a string and returns true or false.
match()      Returns an array containing all the matches. It returns null on a mismatch.
matchAll()   Returns an iterator containing all of the matches.
search()     Tests for a match in a string and returns the index of the match. It returns -1 if the search fails.
replace()    Searches for a match in a string and replaces the matched substring with a replacement substring.
split()      Breaks a string into an array of substrings.
Example 1: Regular Expressions
const string = 'Find me';
const pattern = /me/;

// search if the pattern is in string variable
const result1 = string.search(pattern);
console.log(result1); // 5

// replace the matched text with other text
const string1 = 'Find me';
console.log(string1.replace(pattern, 'found you')); // Find found you

// splitting strings into array elements
const regex1 = /[\s,]+/;
const result2 = 'Hello world! '.split(regex1);
console.log(result2); // ['Hello', 'world!', '']

// searching the phone number pattern
const regex2 = /(\d{3})\D(\d{3})-(\d{4})/g;
const result3 = regex2.exec('My phone number is: 555 123-4567.');
console.log(result3); // ["555 123-4567", "555", "123", "4567"]
Regular Expression Flags
Flags are used with regular expressions that allow various options such as global search, case-insensitive search,
etc. They can be used separately or together.
Flags   Description
g       Performs a global match (find all matches)
m       Performs multiline match
i       Performs case-insensitive matching
Example 2: Regular Expression Modifier
const string = 'Hello hello hello';
// performing a replacement
const result1 = string.replace(/hello/, 'world');
console.log(result1); // Hello world hello

// performing global replacement


const result2 = string.replace(/hello/g, 'world');
console.log(result2); // Hello world world

// performing case-insensitive replacement


const result3 = string.replace(/hello/i, 'world');
console.log(result3); // world hello hello

// performing global case-insensitive replacement


const result4 = string.replace(/hello/gi, 'world');
console.log(result4); // world world world
Example 3: Validating the Phone Number
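// program to validate the phone number
function validatePhone(num) {
    // regex pattern for phone number (no g flag is needed for a single check)
    const re = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;
    // check the input and ask again until it is valid
    if (re.test(num)) {
        console.log('The number is valid');
    } else {
        validatePhone(prompt('Enter number in XXX-XXX-XXXX format:'));
    }
}

// take input
validatePhone(prompt('Enter a number XXX-XXX-XXXX'));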

Output
Enter a number XXX-XXX-XXXX: 23432234
Enter number in XXX-XXX-XXXX format: 234-322-3432
The number is valid
Example 4: Validating the Email Address
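// program to validate the email address
// regex pattern for email (no g flag, so test() keeps no state between calls)
const re = /\S+@\S+\.\S+/;
// take input and keep asking until it matches the pattern
let email = prompt('Enter an email: ');
while (!re.test(email)) email = prompt('Enter a valid email: ');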
Output
Enter an email: hellohello
Enter a valid email: learningJS@gmail.com
