11.1. Regexp Syntax

11.1.1. About

  • Also known as regexp

  • Also known as regex

  • Also known as re

11.1.2. Matching

  • \ - Escapes special characters (allows matching *, ?, etc. literally)
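Here is a minimal sketch of escaping with Python's `re` module. The sample text is made up for illustration. `re.escape()` is the standard-library helper that backslash-escapes every special character in a string. Raw strings (`r'...'`) keep Python from interpreting the backslashes first.

```python
import re

text = 'Is 2*3 = 6? Yes'

# Unescaped, '*' and '?' would be quantifiers; escaped, they match literally
stars = re.findall(r'\*', text)
questions = re.findall(r'\?', text)
print(stars, questions)  # ['*'] ['?']

# re.escape() inserts a backslash before each special character
pattern = re.escape('2*3')
print(pattern)                           # 2\*3
print(re.search(pattern, text).group())  # 2*3
```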

Table 11.1. Regular Expression Pattern Matching

  Pattern       Description
  [a-z]         One small letter from a to z
  [A-Z]         One capital letter from A to Z
  [0-9]         One digit from 0 to 9
  [a-zA-Z0-9]   One of the following: small or capital letter, or digit
  [abc]         One of the following: a, b or c
  A|B           Either the A or the B pattern
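The character classes and alternation above can be tried with `re.findall()`, which returns every non-overlapping match. This is a short sketch and the input strings are arbitrary examples:

```python
import re

text = 'Ab3 cD7'

lower = re.findall(r'[a-z]', text)               # ['b', 'c']
upper = re.findall(r'[A-Z]', text)               # ['A', 'D']
digits = re.findall(r'[0-9]', text)              # ['3', '7']
words = re.findall(r'[a-zA-Z0-9]+', text)        # ['Ab3', 'cD7']
abc = re.findall(r'[abc]', 'cabbage')            # ['c', 'a', 'b', 'b', 'a']
pets = re.findall(r'cat|dog', 'cat, bird, dog')  # ['cat', 'dog']

print(lower, upper, digits, words, abc, pets)
```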

11.1.3. Negation

Table 11.2. Regular Expression Pattern Negation

  Pattern   Description
  [^abc]    None of the following: a, b or c
  …         Not containing word
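The sketch below covers only the negated character class `[^abc]`, using arbitrary sample strings. It also shows two details worth knowing: a negated class matches a newline, and `^` is special only in the first position inside the brackets.

```python
import re

# [^abc] matches any single character except a, b or c
rest = re.findall(r'[^abc]', 'cabbage')        # ['g', 'e']
runs = re.findall(r'[^abc]+', 'abc-123-cab')   # ['-123-']

# Unlike '.', a negated class also matches a newline
newline = re.findall(r'[^abc]', 'a\nb')        # ['\n']

# '^' is literal when it is not the first character in the brackets
caret = re.findall(r'[a^]', 'a^b')             # ['a', '^']

print(rest, runs, newline, caret)
```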

11.1.4. Unicode

  • \w - Includes most characters that can be part of a word in any language, as well as numbers and the underscore

Table 11.3. Regular Expression Patterns

  Pattern   Description
  \w        Unicode word character
  \d        Unicode decimal digit: [0-9] and many other digit characters
  \s        Unicode whitespace characters: [ \t\n\r\f\v] and also non-breaking spaces
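In Python 3, `str` patterns are Unicode-aware by default. This sketch uses made-up Polish and Arabic-Indic sample text to show that, and shows the `re.ASCII` flag for comparison:

```python
import re

words = re.findall(r'\w+', 'Zażółć gęślą jaźń_2')     # ['Zażółć', 'gęślą', 'jaźń_2']
digits = re.findall(r'\d', 'ASCII 7, Arabic-Indic ٣')  # ['7', '٣']
spaces = re.findall(r'\s', 'a\tb\u00a0c')             # tab and no-break space

# re.ASCII restricts \w, \d and \s to their ASCII meaning
ascii_words = re.findall(r'\w+', 'żółw turtle', flags=re.ASCII)  # ['w', 'turtle']

print(words, digits, spaces, ascii_words)
```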

11.1.5. Qualifiers

Table 11.4. Regular Expression Qualifiers

  Qualifier   Description
  .           Any character except a newline
  ^           Start of the string
  $           End of the string (or just before a newline at the end of the string)
  *           Zero or more repetitions of the preceding pattern (as many as possible)
  +           One or more repetitions of the preceding pattern (as many as possible)
  ?           Zero or one repetition of the preceding pattern (as many as possible)
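A short sketch of each qualifier from the table, applied to arbitrary sample strings:

```python
import re

dots = re.findall(r'c.t', 'cat cot c\nt')             # ['cat', 'cot']: '.' skips '\n'
starts = re.search(r'^cat', 'cat food') is not None   # True
ends = re.search(r'food$', 'cat food') is not None    # True
star = re.findall(r'ab*', 'a ab abbb')                # ['a', 'ab', 'abbb']
plus = re.findall(r'ab+', 'a ab abbb')                # ['ab', 'abbb']
optional = re.findall(r'colou?r', 'color colour')     # ['color', 'colour']

print(dots, starts, ends, star, plus, optional)
```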

11.1.6. Quantifiers

Table 11.5. Regular Expression Quantifiers

  Quantifier   Description
  {m}          Exactly m repetitions of the preceding RE
  {m,}         At least m repetitions of the preceding RE (as many as possible)
  {,n}         At most n repetitions of the preceding RE (as many as possible)
  {m,n}        From m to n repetitions of the preceding RE (as many as possible)
  {m,n}?       From m to n repetitions of the preceding RE (as few as possible)
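The quantifiers can be compared on a single run of letters. In this sketch, `re.match()` anchors at the start of the string and `re.fullmatch()` requires the whole string to match. The inputs are arbitrary:

```python
import re

text = 'aaaaa'

exact = re.fullmatch(r'a{5}', text) is not None  # True
triples = re.findall(r'\d{3}', '12345678')       # ['123', '456']
at_least = re.match(r'a{2,}', text).group()      # 'aaaaa'
at_most = re.match(r'a{,3}', text).group()       # 'aaa'
greedy = re.match(r'a{2,4}', text).group()       # 'aaaa'
lazy = re.match(r'a{2,4}?', text).group()        # 'aa'

print(exact, triples, at_least, at_most, greedy, lazy)
```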

11.1.7. Non-Greedy

  • Adding ? after the qualifier makes it non-greedy

  • Non-greedy - as few as possible

  • Greedy - as many as possible

Table 11.6. Regular Expression Greedy and Non-Greedy Qualifiers

  Qualifier   Description
  ?           Zero or one (greedy)
  *           Zero or more (greedy)
  +           One or more (greedy)
  ??          Zero or one (non-greedy)
  *?          Zero or more (non-greedy)
  +?          One or more (non-greedy)
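The difference between greedy and non-greedy qualifiers is easiest to see when several matches are possible. This sketch uses made-up markup only as sample text:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

greedy = re.findall(r'<.*>', html)   # ['<b>bold</b> and <i>italic</i>']
lazy = re.findall(r'<.*?>', html)    # ['<b>', '</b>', '<i>', '</i>']

plus = re.match(r'a+', 'aaa').group()        # 'aaa'
plus_lazy = re.match(r'a+?', 'aaa').group()  # 'a'
opt_lazy = re.match(r'a??', 'aaa').group()   # '' (zero repetitions preferred)

print(greedy, lazy, plus, plus_lazy, repr(opt_lazy))
```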

11.1.8. Flags

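Table 11.7. Regular Expression Flags

  Flag            Description
  re.IGNORECASE   Case-insensitive matching; also works for non-ASCII letters, e.g. Ü and ü
  re.MULTILINE    ^ matches at the beginning of the string and of each line
  re.MULTILINE    $ matches at the end of the string and of each line
  re.DOTALL       . also matches a newline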
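Flags are passed through the `flags` argument of functions such as `re.findall()` and `re.search()`, and they combine with `|`. This sketch uses arbitrary sample strings:

```python
import re

# IGNORECASE follows Unicode case rules for str patterns
words = re.findall(r'über', 'Über über', flags=re.IGNORECASE)  # ['Über', 'über']

# Without DOTALL, '.' does not match a newline
plain = re.search(r'a.b', 'a\nb')                             # None
dotall = re.search(r'a.b', 'a\nb', flags=re.DOTALL).group()   # 'a\nb'

# Combined flags: case-insensitive, per-line ^, and '.' matching '\n'
combined = re.findall(r'^x.', 'X\nx\n',
                      flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)  # ['X\n', 'x\n']

print(words, plain, repr(dotall), combined)
```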

11.1.9. Multiline

  • re.MULTILINE - Flag that makes ^ and $ match at line boundaries, not only at the ends of the whole string

  • ^ - Matches the start of the string, and (with re.MULTILINE) immediately after each newline

  • $ - Matches the end of the string and just before a newline at the end of the string; with re.MULTILINE it also matches just before every newline
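A minimal sketch contrasting `^` and `$` with and without `re.MULTILINE` on an arbitrary three-line string:

```python
import re

text = 'first line\nsecond line\nthird line'

starts = re.findall(r'^\w+', text)                        # ['first']
starts_m = re.findall(r'^\w+', text, flags=re.MULTILINE)  # ['first', 'second', 'third']
ends = re.findall(r'\w+$', text)                          # ['line'] (last line only)
ends_m = re.findall(r'\w+$', text, flags=re.MULTILINE)    # ['line', 'line', 'line']

# Even without MULTILINE, $ matches just before a trailing newline
trailing = re.search(r'line$', 'last line\n') is not None  # True

print(starts, starts_m, ends, ends_m, trailing)
```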

11.1.10. Groups

  • (?P<name>...) - Define a named group

  • (?P=name) - Backreference by group name

  • \number - Backreference by group number

Table 11.8. Regular Expression Groups

  Pattern         Description
  (...)           Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group
  (?P<name>...)   Like (...), but the substring matched by the group is also accessible via the symbolic group name
  (?P=name)       A backreference to a named group: matches the same text the group matched
  \number         Matches the contents of the group with the same number


  • (?P<tag><.*?>)text(?P=tag) matches <p>text<p> (backreference by name)

  • (?P<tag><.*?>)text\1 matches <p>text<p> (backreference by number)

  • (.+) \1 matches the the or 55 55

  • (.+) \1 does not match thethe (note the space after the group)
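The group examples above can be checked directly with `re.search()`. `m.group('tag')` returns the text captured by the named group. This sketch uses the same patterns with made-up input strings:

```python
import re

by_name = r'(?P<tag><.*?>)text(?P=tag)'
by_number = r'(?P<tag><.*?>)text\1'

m = re.search(by_name, '<p>text<p>')
tag = m.group('tag')                                         # '<p>'
same = re.search(by_number, '<p>text<p>') is not None        # True
# The backreference must repeat the exact text, so '</p>' does not match '<p>'
closing = re.search(by_name, '<p>text</p>')                  # None

doubled = r'(.+) \1'
words = re.search(doubled, 'the the').group()                # 'the the'
numbers = re.search(doubled, '55 55').group()                # '55 55'
no_space = re.search(doubled, 'thethe')                      # None

print(tag, same, closing, words, numbers, no_space)
```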

11.1.11. Examples

  • r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$' - Simplified e-mail address pattern
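Here is a sketch that runs the example pattern against a few made-up addresses. It is a simplified check and not a full RFC 5322 validator. The name `EMAIL` is only an illustrative variable.

```python
import re

# Hypothetical constant holding the example pattern from the text
EMAIL = r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'

candidates = [
    'jan.twardowski@example.com',
    'mark+news@mail.example.org',
    '.hidden@example.com',     # must start with a letter or digit
    'no-at-sign.example.com',  # missing @
    'user@localhost',          # no dot after the domain name
]

results = [re.match(EMAIL, address) is not None for address in candidates]

for address, ok in zip(candidates, results):
    print(f'{address:30} {ok}')
```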

11.1.12. Visualization


Figure 11.1. Visualization for pattern r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'