Guide to Python’s re Module

December 22, 2023 4 minute read

Regular expressions, often shortened to “regex”, are a powerful tool for text processing and manipulation. They allow you to search for, extract, and replace patterns in strings. In Python, the re module provides this functionality, making it an essential tool for any Python developer.

Getting Started with the re Module:

To use the re module, simply import it into your Python code:

import re

Once imported, you can utilize various functions like match, search, findall, sub, and compile to work with regular expressions.

print(re.match('a', 'apple'))
# output: <re.Match object; span=(0, 1), match='a'>

print(re.match('b', 'apple'))
# output: None

In the following sections, we will look more closely at the re module’s building blocks and functions, with practical examples of each.

Metacharacters

Metacharacters . ^ $ * + ? { } [ ] \ | ( ) are special characters in regular expressions that have a meaning beyond their literal representation. They are the building blocks for complex patterns. In the examples below, MATCHES means the pattern is found somewhere in the string, as with re.search.

.: Matches any single character (except newline)
a.a MATCHES “aaa”, “aba”, “a a”, …
a.a does NOT MATCH “abba”, “a\na”, …
^: Matches the beginning of the string
^abc MATCHES “abcd”, “abc_”, …
^abc does NOT MATCH “aabc”, “_abc”, …
$: Matches the end of the string
abc$ MATCHES “aabc”, “_abc”, …
abc$ does NOT MATCH “abcd”, “abc_”, …
*: Matches 0 or more repetitions of the preceding expression
ab*c MATCHES “abbc”, “ac”, …
ab*c does NOT MATCH “abdc”, “a c”, …
+: Matches 1 or more repetitions of the preceding expression
ab+c MATCHES “abc”, “abbc”, …
ab+c does NOT MATCH “ac”, “abdc”, …
?: Matches 0 or 1 repetitions of the preceding expression
ab?c MATCHES “ac”, “abc”, …
ab?c does NOT MATCH “abbc”, “abdc”, …
{m}: where m is a number
Matches exactly m repetitions
ab{3}c MATCHES “abbbc”, “abbbcde”, …
ab{3}c does NOT MATCH “abbc”, “abb_c”, …
{m,n}: where m and n are numbers
Matches from m to n repetitions
ab{3,5}c MATCHES “abbbc”, “abbbbbc”, …
ab{3,5}c does NOT MATCH “abbc”, “abbbbbbc”, …
[ ]: Matches any single character in the specified set or range
[abc] MATCHES “a”, “b”, “c”, “az”, …
[a-c] MATCHES “a”, “b”, “c”, “zb”, …
[^a-c] MATCHES any character EXCEPT “a”, “b”, or “c”
\: Escapes special characters or signals a special sequence
a\^b MATCHES “a^b”, “a^bc”, …
a\^b does NOT MATCH “a\^b”, …
A|B: where A and B are expressions
Matches either A or B
ab|cd MATCHES “abc”, “bcd”, …
ab|cd does NOT MATCH “aacc”, “bc”, …
( ): Matches and captures the expression inside the parentheses
(bcd) MATCHES “abcd”, “bcde”, …
(bcd) does NOT MATCH “abc”, “cde”, …
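
The claims in this list can be checked mechanically. The sketch below is one way to do it, assuming that MATCHES means "re.search finds the pattern somewhere in the string". The cases table and the check helper are illustrative names, not part of the re module.

```python
import re

# Each entry: (pattern, strings expected to match, strings expected not to match).
# The cases are taken from the metacharacter list.
cases = [
    (r'a.a', ['aaa', 'aba', 'a a'], ['abba', 'a\na']),
    (r'^abc', ['abcd', 'abc_'], ['aabc', '_abc']),
    (r'abc$', ['aabc', '_abc'], ['abcd', 'abc_']),
    (r'ab*c', ['abbc', 'ac'], ['abdc', 'a c']),
    (r'ab+c', ['abc', 'abbc'], ['ac', 'abdc']),
    (r'ab?c', ['ac', 'abc'], ['abbc', 'abdc']),
    (r'ab{3}c', ['abbbc', 'abbbcde'], ['abbc', 'abb_c']),
    (r'ab{3,5}c', ['abbbc', 'abbbbbc'], ['abbc', 'abbbbbbc']),
    (r'a\^b', ['a^b', 'a^bc'], ['a\\^b']),
    (r'ab|cd', ['abc', 'bcd'], ['aacc', 'bc']),
]

def check(pattern, should_match, should_not_match):
    """Return True if re.search agrees with every expectation."""
    hits = all(re.search(pattern, s) for s in should_match)
    misses = all(re.search(pattern, s) is None for s in should_not_match)
    return hits and misses

results = {pattern: check(pattern, yes, no) for pattern, yes, no in cases}
print(all(results.values()))  # True
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regex engine sees them.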

Special Sequences

Beyond basic characters, Python’s re module offers special sequences like \d (digits), \s (whitespace), and \w (word characters). These let you build patterns to match specific text formats, extract data, and manipulate strings with precision.

\d: Matches any digit (0-9)
\d MATCHES “1234”, “12 eggs”, …
\d does NOT MATCH “abcd”, “dozen”, …
\D: Matches any non-digit
\D MATCHES “abcd”, “dozen”, …
\D does NOT MATCH “1234”, “12”, …
\s: Matches any whitespace character (space, tab, newline)
ab\scd MATCHES “ab cd”, “ab\tcd”, “ab\xa0cd”, …
ab\scd does NOT MATCH “abcd”, “ab_cd”, …
\S: Matches any non-whitespace character
a\Sbcd MATCHES “abcd”, “a_bcd”, “aabcd”, …
a\Sbcd does NOT MATCH “a bcd”, “a\tbcd”, …
\w: Matches any word character (a-z, A-Z, 0-9, and _)
a\wb MATCHES “aab”, “a5b”, “a_b”, …
a\wb does NOT MATCH “a b”, “a+b”, …
\W: Matches any non-word character
a\Wb MATCHES “a@b”, “a b”, “a.b”, …
a\Wb does NOT MATCH “aab”, “a_b”, …
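
To see the special sequences working together, here is a small sketch. The sample sentence is made up for illustration. The functions it uses, re.findall and re.split, are covered in the next section.

```python
import re

# Hypothetical sample text, chosen to contain digits, an underscore and hyphens.
text = 'Order 66 shipped to unit_7 on 2023-12-22'

digits = re.findall(r'\d+', text)   # runs of digits
words = re.findall(r'\w+', text)    # runs of word characters (underscore included)
parts = re.split(r'\s+', text)      # pieces separated by whitespace

print(digits)  # ['66', '7', '2023', '12', '22']
print(words)   # ['Order', '66', 'shipped', 'to', 'unit_7', 'on', '2023', '12', '22']
print(parts)   # ['Order', '66', 'shipped', 'to', 'unit_7', 'on', '2023-12-22']
```

Note that \w keeps unit_7 in one piece because the underscore counts as a word character, while the hyphens in the date do not.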

Functions

Now, let us look at some of the common Python Regex functions.

re.search(pattern, string)

Finds the first occurrence of the pattern anywhere in the string. Returns a re.Match object, or None if nothing matches.

re.search(r'\d', 'I am 12 years old')
# output: <re.Match object; span=(5, 6), match='1'>
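
Two details are easy to miss: re.search stops at the first match, and it returns None when there is none. A brief sketch with made-up sample strings:

```python
import re

first = re.search(r'\d+', 'Room 12, floor 3')   # only the first run of digits
missing = re.search(r'\d+', 'no digits here')   # nothing to find

print(first.group())  # 12
print(first.span())   # (5, 7)
print(missing)        # None
```

Calling .group() on a failed search raises AttributeError, so it is worth checking for None before using the result.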

re.match(pattern, string)

Looks for the pattern only at the beginning of the string. Returns a re.Match object, or None if the string does not start with a match.

re.match(r'\d', '12 years old')
# output: <re.Match object; span=(0, 1), match='1'>
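
The difference between match and search is easiest to see side by side. The sketch below also uses re.fullmatch, which this guide does not otherwise cover; it succeeds only when the whole string fits the pattern.

```python
import re

text = 'I am 12 years old'

at_start = re.match(r'\d', text)     # the string starts with 'I', not a digit
anywhere = re.search(r'\d', text)    # finds the '1' further in

whole_ok = re.fullmatch(r'\d+', '12')         # entire string is digits
whole_bad = re.fullmatch(r'\d+', '12 years')  # trailing text prevents a full match

print(at_start)           # None
print(anywhere.group())   # 1
print(whole_ok.group())   # 12
print(whole_bad)          # None
```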

re.split(pattern, string)

Splits a string at each occurrence of the pattern. Returns a list of strings.

result = re.split(':','Question:Answer')
print(result)
# output: ['Question', 'Answer']
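
Because the separator is a pattern, one call can handle several delimiters at once, and the optional maxsplit argument limits how many splits are made. A short sketch with made-up input strings:

```python
import re

# Split on a comma or semicolon, swallowing any whitespace that follows it.
fields = re.split(r'[;,]\s*', 'apples, pears;plums,  figs')

# Split only at the first colon and keep the rest intact.
key, value = re.split(r':', 'key:value:with:colons', maxsplit=1)

print(fields)  # ['apples', 'pears', 'plums', 'figs']
print(key)     # key
print(value)   # value:with:colons
```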

re.findall(pattern, string)

Extracts all non-overlapping occurrences of the pattern. Returns a list of strings (or a list of tuples if the pattern contains more than one group).

result = re.findall(r'\w+','#pizza, #coke')
print(result)
# output: ['pizza', 'coke']
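
The return value of findall changes shape when the pattern contains capturing groups, which can be surprising. A minimal sketch, using a made-up key=value string:

```python
import re

text = 'alice=30, bob=25'

# Two groups: each result is a tuple with one entry per group.
pairs = re.findall(r'(\w+)=(\d+)', text)

# One group: each result is just that group's text, not the whole match.
names = re.findall(r'(\w+)=\d+', text)

print(pairs)  # [('alice', '30'), ('bob', '25')]
print(names)  # ['alice', 'bob']
```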

re.sub(pattern, replace, string)

Replaces every match of the pattern with new text. Returns the resulting string.

result = re.sub(r'\d+','2020', 'I was born in 2000')
print(result)
# output: I was born in 2020
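
The replacement does not have to be fixed text. It can refer back to captured groups, be computed by a function, or be limited with the count argument. The sketch below shows all three on made-up inputs.

```python
import re

# \1 and \2 in the replacement refer to the first and second captured groups.
swapped = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@example')

# A function receives each Match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), '3 apples and 10 pears')

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', 'N', '1 2 3', count=1)

print(swapped)     # example at user
print(doubled)     # 6 apples and 20 pears
print(first_only)  # N 2 3
```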

re.compile(pattern)

Compiles a regular expression pattern for reuse. Returns a re.Pattern object.

pattern = re.compile(r'\d+')
result = pattern.search('I was born in 2000')
print(result)
# output: <re.Match object; span=(14, 18), match='2000'>
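
Compiling pays off mainly when the same pattern is applied many times, since the pattern object carries its own search, match, findall, and sub methods. A small sketch, assuming a list of made-up lines to scan for four-digit years:

```python
import re

# \b marks a word boundary, so only standalone four-digit numbers match.
year = re.compile(r'\b\d{4}\b')

lines = ['I was born in 2000', 'Nothing here', 'From 1999 to 2023']
years = [year.findall(line) for line in lines]

print(years)  # [['2000'], [], ['1999', '2023']]
```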

re.Match Objects

Match.group()

Returns one or more subgroups of the match. Each subgroup is the part of the string matched by a section of the pattern enclosed in ( ).

pattern = re.compile(r'(.*?)@(\w+\.\w+)')
email = 'username@company.com'
match = pattern.search(email)
print(match)
# output: <re.Match object; span=(0, 20), match='username@company.com'>

In the example above, the pattern captured two strings from the email address. To retrieve them:

match.group(0)        # The entire match
# output: 'username@company.com'
match.group(1)        # The first subgroup
# output: 'username'
match.group(2)       # The second subgroup
# output: 'company.com'  
match.group(1,2)      # Multiple arguments give us a tuple
# output: ('username', 'company.com')
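
Match objects offer a few related methods beyond group(). The sketch below reuses the email pattern from this section to show groups(), which returns all subgroups at once, and span(), start(), and end(), which report positions in the original string.

```python
import re

match = re.search(r'(.*?)@(\w+\.\w+)', 'username@company.com')

all_groups = match.groups()    # every subgroup as a tuple
domain_span = match.span(2)    # (start, end) of the second subgroup
name_start = match.start(1)    # where the first subgroup begins
name_end = match.end(1)        # where the first subgroup ends

print(all_groups)   # ('username', 'company.com')
print(domain_span)  # (9, 20)
print(name_start, name_end)  # 0 8
```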

If the regular expression uses the (?P<name>...) syntax, the captured strings can also be identified by their group names.

pattern = re.compile(r'(?P<name>.*?)@(?P<domain>\w+\.\w+)')
email = 'username@company.com'
match = pattern.search(email)
match.group('name')
# output: 'username'
match.group('domain')
# output: 'company.com' 
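
Named groups can also be collected in one step. A short sketch using the same pattern: groupdict() returns every named group as a dictionary, and on Python 3.6 or later a Match object can be indexed directly.

```python
import re

pattern = re.compile(r'(?P<name>.*?)@(?P<domain>\w+\.\w+)')
match = pattern.search('username@company.com')

as_dict = match.groupdict()   # all named groups at once
domain = match['domain']      # shorthand for match.group('domain'), Python 3.6+

print(as_dict)  # {'name': 'username', 'domain': 'company.com'}
print(domain)   # company.com
```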

Boolean

A Match object always has a boolean value of True. Because match() and search() return None when nothing matches, you can test the result directly in an if statement.

match = re.search(r'\d','I am 12 years old')
if match:
  print('Match found')
else:
  print('No match')
# output: Match found
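
On Python 3.8 or later, the test and the assignment can be combined with the := operator. The helper below is a hypothetical example, not part of the re module; it returns the first integer found in a string, or None.

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    # The walrus operator assigns the result and tests it in one expression.
    if (m := re.search(r'\d+', text)) is not None:
        return int(m.group())
    return None

print(first_number('I am 12 years old'))  # 12
print(first_number('no digits here'))     # None
```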

Joon Hee Jang
