5.1. Regexp Syntax

5.1.1. About

  • Also known as regexp

  • Also known as regex

  • Also known as re

5.1.2. Matching

  • \ - Escapes special characters (allows matching a literal *, ?, etc.)

Table 5.3. Regular Expression Pattern Matching

  • [a-z] - One lowercase letter from a to z

  • [A-Z] - One uppercase letter from A to Z

  • [0-9] - One digit from 0 to 9

  • [a-zA-Z0-9] - One lowercase letter, uppercase letter, or digit

  • [abc] - One of the following: a, b or c

  • A|B - Either pattern A or pattern B
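A minimal sketch of character classes and alternation using Python's re module. The sample sentence and the variable names are made up for illustration.

```python
import re

text = 'Flight QF12 departs at 9am'  # hypothetical sample text

uppercase = re.findall(r'[A-Z]', text)   # one uppercase letter per match
digits = re.findall(r'[0-9]', text)      # one digit per match
abc = re.findall(r'[abc]', text)         # one of a, b or c per match
either = re.findall(r'am|pm', text)      # alternation: 'am' or 'pm'

print(uppercase)  # ['F', 'Q', 'F']
print(digits)     # ['1', '2', '9']
print(abc)        # ['a', 'a', 'a']
print(either)     # ['am']
```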

5.1.3. Negation

Table 5.4. Regular Expression Pattern Negation

  • [^abc] - None of the following: a, b or c


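  • (?!word) - Negative lookahead: matches only where word does not follow; anchored as ^(?!.*word) it accepts only single-line strings not containing word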
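The two forms of negation can be sketched as follows. Using a negative lookahead, ^(?!.*word), for "not containing word" is an assumption about what the table intends; it is a common idiom, and the inputs here are illustrative single-line strings.

```python
import re

# [^abc] matches any single character that is NOT a, b or c.
not_abc = re.findall(r'[^abc]', 'abcdef')
print(not_abc)  # ['d', 'e', 'f']

# Assumed idiom: a negative lookahead anchored at the start succeeds
# only if 'word' does not appear anywhere later on the line.
no_word = re.compile(r'^(?!.*word)')

accepted = bool(no_word.search('some plain text'))
rejected = bool(no_word.search('contains word here'))
print(accepted, rejected)  # True False
```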

5.1.4. Unicode

  • \w - Includes most characters that can be part of a word in any language, as well as numbers and the underscore

Table 5.5. Regular Expression Patterns

  • \w - Unicode word character

  • \d - Unicode decimal digit: [0-9] and many other digit characters

  • \s - Unicode whitespace characters: [ \t\n\r\f\v] and others, such as non-breaking spaces
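A short sketch of how \w, \d and \s behave on non-ASCII text in Python str patterns. The re.ASCII flag is not covered in this section; it is shown here only as a contrast, and the sample strings are illustrative.

```python
import re

text = 'Łódź 2024'

# By default, \w matches letters from any language, digits and underscore.
unicode_words = re.findall(r'\w+', text)
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ascii_words = re.findall(r'\w+', text, re.ASCII)
print(unicode_words)  # ['Łódź', '2024']
print(ascii_words)    # ['d', '2024']

arabic_three = '\u0663'   # ARABIC-INDIC DIGIT THREE
nbsp = '\u00a0'           # NO-BREAK SPACE

digit_unicode = bool(re.fullmatch(r'\d', arabic_three))
digit_ascii = bool(re.fullmatch(r'\d', arabic_three, re.ASCII))
space_unicode = bool(re.fullmatch(r'\s', nbsp))
print(digit_unicode, digit_ascii, space_unicode)  # True False True
```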

5.1.5. Qualifiers

Table 5.6. Regular Expression Qualifiers

  • . - Any character except a newline

  • ^ - Start of the string

  • $ - End of the string

  • * - Zero or more repetitions of the preceding pattern (as many as possible)

  • + - One or more repetitions of the preceding pattern

  • ? - Zero or one repetitions of the preceding pattern
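The qualifiers ., ^, $, *, + and ? can be tried out with a few small checks. This is only a sketch; the sample strings and variable names are invented.

```python
import re

# . matches any character except a newline, so 'c\nt' is skipped.
dot = re.findall(r'c.t', 'cat cot c\nt')
print(dot)  # ['cat', 'cot']

# ^ anchors at the start of the string, $ at the end.
starts = bool(re.search(r'^Hello', 'Hello world'))
ends = bool(re.search(r'world$', 'Hello world'))

# * allows zero repetitions, + requires at least one.
star_zero = bool(re.fullmatch(r'ab*', 'a'))
plus_zero = bool(re.fullmatch(r'ab+', 'a'))
plus_many = bool(re.fullmatch(r'ab+', 'abbb'))

# ? makes the preceding item optional.
optional = [bool(re.fullmatch(r'colou?r', w)) for w in ('color', 'colour')]

print(starts, ends, star_zero, plus_zero, plus_many, optional)
# True True True False True [True, True]
```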

5.1.6. Quantifiers

Table 5.7. Regular Expression Quantifiers

  • {m} - Exactly m repetitions of the preceding RE

  • {m,} - At least m repetitions

  • {,n} - At most n repetitions

  • {m,n} - From m to n repetitions of the preceding RE (as many as possible)

  • {m,n}? - From m to n repetitions of the preceding RE (as few as possible)
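As a rough illustration of the counted quantifiers, the checks below use re.fullmatch to test whole strings and re.match to show how much of a longer string is consumed. The inputs are arbitrary.

```python
import re

exact_ok = bool(re.fullmatch(r'\d{4}', '2024'))    # exactly 4 digits
exact_bad = bool(re.fullmatch(r'\d{4}', '202'))    # only 3 digits

at_least_bad = bool(re.fullmatch(r'a{2,}', 'a'))   # fewer than 2
at_least_ok = bool(re.fullmatch(r'a{2,}', 'aaaa'))

at_most_ok = bool(re.fullmatch(r'a{,2}', ''))      # zero is allowed
at_most_bad = bool(re.fullmatch(r'a{,2}', 'aaa'))  # more than 2

# Greedy {m,n} takes as many as possible; {m,n}? takes as few as possible.
greedy = re.match(r'a{2,4}', 'aaaaaa').group()
lazy = re.match(r'a{2,4}?', 'aaaaaa').group()
print(greedy, lazy)  # aaaa aa
```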

5.1.7. Non-Greedy

  • Adding ? after the qualifier makes it non-greedy

  • Non-greedy - as few as possible

  • Greedy - as many as possible

Table 5.8. Regular Expression Greedy and Non-Greedy Qualifiers

  • ? - zero or one (greedy)

  • * - zero or more (greedy)

  • + - one or more (greedy)

  • ?? - zero or one (non-greedy)

  • *? - zero or more (non-greedy)

  • +? - one or more (non-greedy)
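The difference between greedy and non-greedy matching is easiest to see on text with repeated delimiters. A small sketch, with an invented HTML fragment as input:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ runs to the LAST '>' in the string, giving one long match.
greedy = re.findall(r'<.+>', html)
# Non-greedy: .+? stops at the FIRST '>' it can, giving one match per tag.
lazy = re.findall(r'<.+?>', html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```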

5.1.8. Flags

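Table 5.9. Regular Expression Flags

  • re.IGNORECASE (re.I) - Case-insensitive matching (Unicode-aware, e.g. Ü matches ü)

  • re.MULTILINE (re.M) - ^ matches at the beginning of the string and of each line; $ matches at the end of the string and of each line

  • re.DOTALL (re.S) - . also matches newlines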
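A brief sketch of the case-insensitive and dot-matches-newline flags, assuming they correspond to Python's re.IGNORECASE and re.DOTALL. The sample strings are illustrative.

```python
import re

# re.IGNORECASE is Unicode-aware for str patterns: ü also matches Ü.
case_insensitive = re.findall(r'ü', 'Ü ü', re.IGNORECASE)
print(case_insensitive)  # ['Ü', 'ü']

# Without re.DOTALL the dot refuses to match a newline.
dot_default = bool(re.search(r'a.b', 'a\nb'))
dot_all = bool(re.search(r'a.b', 'a\nb', re.DOTALL))
print(dot_default, dot_all)  # False True
```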

5.1.9. Multiline

  • re.MULTILINE - Flag that turns on multiline matching

  • ^ - Matches the start of the string; in MULTILINE mode it also matches immediately after each newline

  • $ - Matches the end of the string or just before the newline at the end of the string; in MULTILINE mode it also matches just before each newline
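The effect of re.MULTILINE on ^ and $ can be sketched with a three-line string. The text is invented for the example.

```python
import re

text = 'first line\nsecond line\nthird line'

# Default mode: ^ matches only at the very start of the string.
start_default = re.findall(r'^\w+', text)
# MULTILINE mode: ^ also matches right after each newline.
start_multi = re.findall(r'^\w+', text, re.MULTILINE)
print(start_default)  # ['first']
print(start_multi)    # ['first', 'second', 'third']

# Default mode: $ matches only at the end of the string.
end_default = re.findall(r'\w+$', text)
# MULTILINE mode: $ also matches just before each newline.
end_multi = re.findall(r'\w+$', text, re.MULTILINE)
print(end_default)  # ['line']
print(end_multi)    # ['line', 'line', 'line']

# Even without MULTILINE, $ matches just before a trailing newline.
before_trailing_newline = bool(re.search(r'line$', 'line\n'))
print(before_trailing_newline)  # True
```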

5.1.10. Groups

  • (?P<name>...) - Defines a named group

  • (?P=name) - Backreference by group name

  • \number - Backreference by group number

Table 5.10. Regular Expression Groups

  • (...) - Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group

  • (?P<name>...) - The substring matched by the group is accessible via the symbolic group name name

  • (?P=name) - A backreference to a named group

  • \number - Matches the contents of the group of the same number


  • (?P<tag><.*?>)text(?P=tag)

  • (?P<tag><.*?>)text\1

  • (.+) \1 matches "the the" or "55 55"

  • (.+) \1 does not match "thethe" (note the space after the group)
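The group examples above can be checked directly in Python. This is a sketch: the date string and its year and month group names are hypothetical additions, while the tag and \1 patterns come from the list.

```python
import re

# Named groups are accessible by name and by number.
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'Date: 2024-05-17')
year = m.group('year')
month = m.group(2)
print(year, month)  # 2024 05

# (?P=tag) must repeat exactly what the 'tag' group captured.
same_tag = bool(re.search(r'(?P<tag><.*?>)text(?P=tag)', '<b>text<b>'))
other_tag = bool(re.search(r'(?P<tag><.*?>)text(?P=tag)', '<b>text<i>'))
print(same_tag, other_tag)  # True False

# \1 refers to group number 1; the pattern requires a space in between.
repeated = re.search(r'(.+) \1', 'the the').group()
no_space = re.search(r'(.+) \1', 'thethe')
print(repeated, no_space)  # the the None
```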

5.1.11. Examples

  • r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'
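A quick way to explore the pattern above is to run it against a few candidate addresses. The addresses and the EMAIL name are made up for this sketch; the pattern is a simplified check, not a full validator for email syntax.

```python
import re

EMAIL = re.compile(r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$')

candidates = [
    'jane.doe+tag@example.com',   # typical address with a + tag
    'jane@mail.example.co.uk',    # dots are allowed after the first label
    '.jane@example.com',          # must start with a letter or digit
    'jane@example',               # needs a dot after the first domain label
]

results = {address: bool(EMAIL.match(address)) for address in candidates}
for address, ok in results.items():
    print(f'{address}: {ok}')
```

The first two addresses are accepted and the last two are rejected.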

5.1.12. Visualization


Figure 5.1. Visualization for pattern r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'