5.1. Regexp Syntax

5.1.1. About

  • Also known as regexp

  • Also known as regex

  • Also known as re

5.1.2. Matching

  • \ - Escapes a special character so that it matches literally (e.g. \* matches *, \? matches ?)

Table 112. Regular Expression Pattern Matching

Syntax        Description
[a-z]         One lowercase letter from a to z
[A-Z]         One uppercase letter from A to Z
[0-9]         One digit from 0 to 9
[a-zA-Z0-9]   One lowercase letter, uppercase letter or digit
[abc]         One of the following: a, b or c
A|B           Either pattern A or pattern B
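The classes, alternation and escaping described above can be tried with Python's standard re module. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

text = 'Mark Watney 42'

# Character classes match exactly one character from the set
lower = re.findall(r'[a-z]', text)    # lowercase letters only
upper = re.findall(r'[A-Z]', text)    # uppercase letters only
digits = re.findall(r'[0-9]', text)   # digits only

# Alternation: either the left or the right pattern
names = re.findall(r'Mark|Melissa', 'Mark and Melissa')

# A backslash makes the special character * match literally
stars = re.findall(r'\*', '2 * 3 * 4')

print(upper, digits, names, stars)
```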

5.1.3. Negation

Table 113. Regular Expression Pattern Negation

Syntax           Description
[^abc]           Any single character other than a, b or c
^(?!.*word).*$   A string not containing word
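Both forms of negation can be sketched as follows. The sample strings and the names others and no_word are illustrative assumptions, not part of the original text.

```python
import re

# [^abc] matches one character that is not a, b or c
others = re.findall(r'[^abc]', 'aabbccxyz')

# Negative lookahead: the match succeeds only if "word"
# does not appear anywhere in the string
no_word = re.compile(r'^(?!.*word).*$')

accepted = no_word.search('hello there') is not None
rejected = no_word.search('a word here') is None

print(others, accepted, rejected)
```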

5.1.4. Unicode

  • \w - Includes most characters that can be part of a word in any language, as well as numbers and the underscore

Table 114. Regular Expression Patterns

Syntax   Description
\w       Unicode word character
\d       Unicode decimal digit: [0-9] and many other digit characters
\s       Unicode whitespace character: [ \t\n\r\f\v] and others, such as non-breaking spaces
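A short sketch of the Unicode behaviour of \w, \d and \s for str patterns in Python 3. The re.ASCII flag is not covered in this section; it is used here only as a contrast, and the sample strings are assumptions.

```python
import re

name = 'Müller'

# \w matches letters from any alphabet by default
unicode_words = re.findall(r'\w+', name)
# With re.ASCII only [a-zA-Z0-9_] counts as a word character
ascii_words = re.findall(r'\w+', name, flags=re.ASCII)

# \d also matches non-ASCII decimal digits (here: Arabic-Indic 4 and 2)
room = 'room \u0664\u0662'
unicode_digits = re.findall(r'\d+', room)
ascii_digits = re.findall(r'\d+', room, flags=re.ASCII)

# \s matches the non-breaking space U+00A0
parts = re.split(r'\s', 'a\u00a0b')

print(unicode_words, ascii_words, unicode_digits, ascii_digits, parts)
```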

5.1.5. Qualifiers

Table 115. Regular Expression Qualifiers

Syntax   Description
.        Any character except a newline
^        Start of the string
$        End of the string (or just before a newline at the end of the string)
*        Zero or more repetitions of the preceding pattern (as many as possible)
+        One or more repetitions of the preceding pattern (as many as possible)
?        Zero or one repetition of the preceding pattern
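The symbols in this table can be illustrated with a few one-liners. This is a sketch with made-up sample strings and variable names.

```python
import re

# . matches any character except a newline
dot = re.findall(r'c.t', 'cat cot c\nt')

# ^ and $ anchor the pattern to the start and end of the string
starts = re.search(r'^Mark', 'Mark Watney') is not None
ends = re.search(r'Watney$', 'Mark Watney') is not None

# * allows zero or more, + requires at least one, ? makes it optional
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'colou?r', 'color colour')

print(dot, starts, ends, star, plus, optional)
```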

5.1.6. Quantifiers

Table 116. Regular Expression Quantifiers

Syntax    Description
{m}       Exactly m repetitions of the preceding pattern
{m,}      At least m repetitions
{,n}      At most n repetitions
{m,n}     From m to n repetitions of the preceding pattern (as many as possible)
{m,n}?    From m to n repetitions of the preceding pattern (as few as possible)
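A minimal sketch of the counted quantifiers. The sample strings are illustrative, and \b (word boundary) is used only to keep the first example from matching inside longer numbers.

```python
import re

# {m}: exactly four digits, bounded by word boundaries
years = re.findall(r'\b[0-9]{4}\b', '7 42 1969 20000')

# {m,}: at least two repetitions
at_least = re.findall(r'a{2,}', 'a aa aaaa')

# {,n}: at most two repetitions (checked against the whole string)
at_most_ok = re.fullmatch(r'a{,2}', 'aa') is not None
at_most_fail = re.fullmatch(r'a{,2}', 'aaa') is None

# {m,n} takes as many as possible, {m,n}? as few as possible
greedy = re.search(r'a{2,3}', 'aaaa').group()
lazy = re.search(r'a{2,3}?', 'aaaa').group()

print(years, at_least, greedy, lazy)
```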

5.1.7. Non-Greedy

  • Adding ? after a quantifier makes it non-greedy

  • Non-greedy - as few as possible

  • Greedy - as many as possible

Table 117. Regular Expression Greedy and Non-Greedy Quantifiers

Syntax   Description
?        Zero or one (greedy)
*        Zero or more (greedy)
+        One or more (greedy)
??       Zero or one (non-greedy)
*?       Zero or more (non-greedy)
+?       One or more (non-greedy)
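The difference is easiest to see on markup-like text. A small sketch; the html string and variable names are assumptions made for the example.

```python
import re

html = '<p>one</p><p>two</p>'

# Greedy: .* runs to the last possible </p>
greedy = re.findall(r'<p>.*</p>', html)

# Non-greedy: .*? stops at the first possible </p>
lazy = re.findall(r'<p>.*?</p>', html)

# The same applies to +
plus_greedy = re.search(r'a+', 'aaa').group()
plus_lazy = re.search(r'a+?', 'aaa').group()

print(greedy, lazy, plus_greedy, plus_lazy)
```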

5.1.8. Flags

Table 118. Regular Expression Flags

Flag            Description
re.IGNORECASE   Case-insensitive matching (with Unicode support, e.g. Ü matches ü)
re.MULTILINE    ^ and $ match at the start and end of the string and of each line
re.DOTALL       . also matches newlines
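Flags are passed through the flags argument and can be combined with the | operator. A brief sketch with made-up sample strings:

```python
import re

# re.IGNORECASE: both the uppercase and the lowercase umlaut match
umlauts = re.findall(r'ü', 'Ü ü', flags=re.IGNORECASE)

# Without re.DOTALL the dot does not cross a newline
without_dotall = re.search(r'a.b', 'a\nb') is None
with_dotall = re.search(r'a.b', 'a\nb', flags=re.DOTALL) is not None

# Combining flags
combined = re.search(r'A.B', 'a\nb', flags=re.IGNORECASE | re.DOTALL)

print(len(umlauts), without_dotall, with_dotall, combined is not None)
```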

5.1.9. Multiline

  • re.MULTILINE - Flag turns on Multiline search

  • ^ - Matches the start of the string, and immediately after each newline

  • $ - Matches the end of the string or just before the newline at the end of the string; with re.MULTILINE it also matches before each newline
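The effect of re.MULTILINE on ^ and $ can be sketched like this; the three-line sample text is an assumption made for the example.

```python
import re

text = 'first line\nsecond line\nthird line'

# Default: ^ matches only at the very start of the string
first_default = re.findall(r'^\w+', text)
# MULTILINE: ^ also matches right after each newline
first_multi = re.findall(r'^\w+', text, flags=re.MULTILINE)

# Default: $ matches only at the very end of the string
last_default = re.findall(r'\w+$', text)
# MULTILINE: $ also matches just before each newline
last_multi = re.findall(r'\w+$', text, flags=re.MULTILINE)

# Even without MULTILINE, $ matches before a trailing newline
trailing = re.search(r'line$', 'one line\n') is not None

print(first_default, first_multi, last_default, last_multi, trailing)
```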

5.1.10. Groups

  • (?P<name>...) - Define a named group

  • (?P=name) - Backreferencing by group name

  • \number - Backreferencing by group number

Table 119. Regular Expression Groups

Syntax          Description
(...)           Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group
(?P<name>...)   Like (...), but the substring matched by the group is also accessible via the symbolic group name
(?P=name)       A backreference to a named group
\number         Matches the contents of the group of the same number

Example:

  • (?P<tag><.*?>)text(?P=tag)

  • (?P<tag><.*?>)text\1

  • (.+) \1 matches the the or 55 55

  • (.+) \1 does not match thethe (note the space after the group)
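The group syntax and the backreference examples above can be sketched with re.search. The sample strings and the group names first, last and word are illustrative assumptions.

```python
import re

# Named groups are accessible by name and by number
match = re.search(r'(?P<first>\w+) (?P<last>\w+)', 'Mark Watney')
first = match.group('first')
last = match.group(2)

# Backreference by number: the same text must repeat after a space
doubled = re.search(r'(.+) \1', 'the the') is not None
not_doubled = re.search(r'(.+) \1', 'thethe') is None

# Backreference by name
repeated = re.search(r'(?P<word>\w+) (?P=word)', 'say 55 55 twice').group()

# The tag example: the backreference repeats the opening tag verbatim,
# so a closing tag such as </b> does not satisfy it
same_tag = re.search(r'(?P<tag><.*?>)text(?P=tag)', '<b>text<b>') is not None
closing_tag = re.search(r'(?P<tag><.*?>)text(?P=tag)', '<b>text</b>') is None

print(first, last, doubled, not_doubled, repeated, same_tag, closing_tag)
```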

5.1.11. Examples

  • r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'
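A sketch of how the pattern above might be used to check strings. The name EMAIL, the helper is_email and the sample addresses are assumptions for illustration; the pattern is a simplification and is not meant to cover every valid address.

```python
import re

# The pattern from the example above, compiled once for reuse
EMAIL = re.compile(r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$')


def is_email(text):
    """Return True if text matches the simplified email pattern."""
    return EMAIL.search(text) is not None


print(is_email('mark.watney@nasa.gov'))   # user, @, domain, dot, suffix
print(is_email('.mark@nasa.gov'))         # first character must be alphanumeric
print(is_email('mark@nasa'))              # no dot after the domain part
```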

5.1.12. Visualization

../../_images/regexp-vizualization.png

Figure 211. Visualization for pattern r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'