# 5.1. Regexp Syntax

• Also known as regexp

• Also known as regex

• Also known as re

## 5.1.2. Matching

• \ - Escapes special characters (allows matching *, ?, etc)

Table 112. Regular Expression Pattern Matching

| Syntax | Description |
|---|---|
| `[a-z]` | One lowercase letter from a to z |
| `[A-Z]` | One uppercase letter from A to Z |
| `[0-9]` | One digit from 0 to 9 |
| `[a-zA-Z0-9]` | One lowercase letter, uppercase letter, or digit |
| `[abc]` | One of the following: a, b or c |
| `A\|B` | Either pattern A or pattern B |
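The character classes, alternation and escaping described above can be tried out with a minimal sketch. It assumes Python's standard `re` module; the sample strings and variable names are illustrative only.

```python
import re

# [0-9] matches a single digit; findall returns every non-overlapping match
digits = re.findall(r'[0-9]', 'Room 42B')
print(digits)   # ['4', '2']

# [a-zA-Z0-9] matches one letter or digit, so spaces and punctuation are skipped
alnum = re.findall(r'[a-zA-Z0-9]', 'a-B 7!')
print(alnum)    # ['a', 'B', '7']

# A|B matches either alternative
pets = re.findall(r'cat|dog', 'cat, dog, bird')
print(pets)     # ['cat', 'dog']

# A backslash escapes a special character: \? matches a literal question mark
escaped = re.findall(r'\?', 'Really? Yes?')
print(escaped)  # ['?', '?']
```

Raw strings (the `r'...'` prefix) keep the backslash intact, so the pattern reaches the regex engine exactly as written.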

## 5.1.3. Negation

Table 113. Regular Expression Pattern Negation

| Syntax | Description |
|---|---|
| `[^abc]` | Any one character other than a, b or c |
| `^(?!.*word).*$` | A string not containing word |

## 5.1.4. Unicode

• \w - Includes most characters that can be part of a word in any language, as well as numbers and the underscore

Table 114. Regular Expression Patterns

| Syntax | Description |
|---|---|
| `\w` | Unicode word character |
| `\d` | Unicode decimal digit: `[0-9]` and many other digit characters |
| `\s` | Unicode whitespace character: `[ \t\n\r\f\v]` and others, such as non-breaking spaces |

## 5.1.5. Qualifiers

Table 115. Regular Expression Qualifiers

| Syntax | Description |
|---|---|
| `.` | Any character except a newline |
| `^` | Start of the string |
| `$` | End of the string |
| `*` | Zero or more repetitions of the preceding pattern (as many as possible) |
| `+` | One or more repetitions of the preceding pattern |
| `?` | Zero or one repetition of the preceding pattern |
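The negation, Unicode and qualifier patterns from the preceding sections can be checked with a short sketch. It assumes Python's standard `re` module with `str` patterns, for which `\w`, `\d` and `\s` are Unicode-aware by default; the sample strings are illustrative only.

```python
import re

# Negation: [^abc] matches any single character other than a, b or c
others = re.findall(r'[^abc]', 'abcde')
print(others)  # ['d', 'e']

# Negative lookahead: the whole string matches only if it lacks "word"
no_word = re.search(r'^(?!.*word).*$', 'nothing here')   # a match object
has_word = re.search(r'^(?!.*word).*$', 'a word here')   # None

# Unicode: \w covers letters from other alphabets, digits and underscore
words = re.findall(r'\w+', 'zażółć gęślą_1')
print(words)   # ['zażółć', 'gęślą_1']

# Unicode: \d also matches non-ASCII digits (U+0663 is Arabic-Indic three)
uni_digits = re.findall(r'\d', '\u0663a5')

# Qualifiers: . does not match a newline
dot = re.findall(r'c.t', 'cat cut c\nt')
print(dot)     # ['cat', 'cut']

# Qualifiers: * allows zero repetitions, + requires at least one
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
print(star)    # ['a', 'ab', 'abbb']
print(plus)    # ['ab', 'abbb']

# Qualifiers: ? makes the preceding pattern optional
optional = re.findall(r'colou?r', 'color colour')
print(optional)  # ['color', 'colour']

# Qualifiers: ^ anchors at the start, $ at the end of the string
starts = re.search(r'^cat', 'bobcat')   # None
ends = re.search(r'cat$', 'bobcat')     # a match object
```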

## 5.1.6. Quantifiers

Table 116. Regular Expression Quantifiers

| Syntax | Description |
|---|---|
| `{m}` | Exactly m repetitions of the preceding pattern |
| `{m,}` | At least m repetitions |
| `{,n}` | At most n repetitions |
| `{m,n}` | From m to n repetitions of the preceding pattern (as many as possible) |
| `{m,n}?` | From m to n repetitions of the preceding pattern (as few as possible) |
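The quantifiers in the table can be illustrated with a small sketch, assuming Python's standard `re` module. The helper `fully_matches` is a hypothetical name introduced here for readability; it is not part of `re`.

```python
import re


def fully_matches(pattern, text):
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# {m}: exactly m repetitions
print(fully_matches(r'a{2}', 'aa'))     # True
print(fully_matches(r'a{2}', 'aaa'))    # False

# {m,}: at least m repetitions
print(fully_matches(r'a{2,}', 'aaaa'))  # True
print(fully_matches(r'a{2,}', 'a'))     # False

# {,n}: at most n repetitions (zero is allowed)
print(fully_matches(r'a{,2}', ''))      # True
print(fully_matches(r'a{,2}', 'aaa'))   # False

# {m,n} takes as many as possible, {m,n}? as few as possible
greedy = re.search(r'a{2,4}', 'aaaaa').group()
lazy = re.search(r'a{2,4}?', 'aaaaa').group()
print(greedy)  # aaaa
print(lazy)    # aa
```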

## 5.1.7. Non-Greedy

• Adding ? after the qualifier makes it non-greedy

• Non-greedy - as few as possible

• Greedy - as many as possible

Table 117. Regular Expression Greedy and Non-Greedy Qualifiers

| Syntax | Description |
|---|---|
| `?` | Zero or one (greedy) |
| `*` | Zero or more (greedy) |
| `+` | One or more (greedy) |
| `??` | Zero or one (non-greedy) |
| `*?` | Zero or more (non-greedy) |
| `+?` | One or more (non-greedy) |
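The difference between greedy and non-greedy qualifiers shows up most clearly on markup-like text. The following is a minimal sketch, assuming Python's standard `re` module; the sample string is illustrative only.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'
greedy = re.findall(r'<.+>', html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first '>' that completes the match
lazy = re.findall(r'<.+?>', html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# ? prefers one repetition, ?? prefers zero
one = re.match(r'a?', 'a').group()
zero = re.match(r'a??', 'a').group()
print(repr(one))   # 'a'
print(repr(zero))  # ''
```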

## 5.1.8. Flags

Table 118. Regular Expression Flags

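| Flag | Description |
|---|---|
| `re.IGNORECASE` | Case-insensitive matching (with Unicode support, e.g. Ü and ü) |
| `re.MULTILINE` | `^` matches at the beginning of the string and of each line; `$` matches at the end of the string and of each line |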
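The effect of both flags can be seen in a short sketch, assuming Python's standard `re` module and its `flags` keyword argument; the sample strings are illustrative only.

```python
import re

# re.IGNORECASE: case-insensitive matching also works for non-ASCII letters
umlauts = re.findall(r'ü', 'Ü ü', flags=re.IGNORECASE)
print(umlauts)        # ['Ü', 'ü']

text = 'first line\nsecond line'

# Without re.MULTILINE, ^ matches only at the very start of the string
start_default = re.findall(r'^\w+', text)
print(start_default)  # ['first']

# With re.MULTILINE, ^ also matches right after each newline
start_multi = re.findall(r'^\w+', text, flags=re.MULTILINE)
print(start_multi)    # ['first', 'second']

# With re.MULTILINE, $ likewise matches just before each newline
end_multi = re.findall(r'\w+$', text, flags=re.MULTILINE)
print(end_multi)      # ['line', 'line']
```

Several flags can be combined with the `|` operator, for example `re.IGNORECASE | re.MULTILINE`.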

## 5.1.10. Groups

• (?P<name>...) - Define named group

• (?P=name) - Backreferencing by group name

• \number - Backreferencing by group number

Table 119. Regular Expression Groups

| Syntax | Description |
|---|---|
| `(...)` | Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group |
| `(?P<name>...)` | The substring matched by the group is accessible via the symbolic group name `name` |
| `(?P=name)` | A backreference to a named group |
| `\number` | Matches the contents of the group of the same number |

Example:

• (?P<tag><.*?>)text(?P=tag)

• (?P<tag><.*?>)text\1

• (.+) \1 matches the the or 55 55

• (.+) \1 does not match thethe (note the space after the group)
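The group syntax and the examples above can be verified with a minimal sketch, assuming Python's standard `re` module. The date string and the group names `year` and `month` are illustrative and do not come from the text.

```python
import re

# Named groups are accessible by name, by number, or as a dict
date = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'Date: 2024-05')
print(date.group('year'))  # 2024
print(date.group(2))       # 05
print(date.groupdict())    # {'year': '2024', 'month': '05'}

# A backreference must repeat exactly what the group captured,
# so '<b>' has to appear again; a closing '</b>' does not match
by_name = re.search(r'(?P<tag><.*?>)text(?P=tag)', '<b>text<b>')
by_number = re.search(r'(?P<tag><.*?>)text\1', '<b>text<b>')
closing = re.search(r'(?P<tag><.*?>)text(?P=tag)', '<b>text</b>')
print(by_name is not None, by_number is not None, closing is None)

# (.+) \1 needs the same text twice, separated by a space
doubled = re.search(r'(.+) \1', 'the the').group()
numbers = re.search(r'(.+) \1', '55 55').group()
no_space = re.search(r'(.+) \1', 'thethe')
print(doubled)   # the the
print(numbers)   # 55 55
print(no_space)  # None
```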