Note: The raw string notation r'...' is most useful for regular expressions; see raw strings, above.
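As a small illustration of why the raw notation helps, the sketch below writes the same pattern both ways. The variable names and the sample text are made up for this example.

```python
import re

# Without the r prefix, every backslash meant for the regex must be doubled.
plain = '\\d+\\.\\d+'
# With the r prefix, backslashes reach the regex engine unchanged.
raw = r'\d+\.\d+'

# Both spellings produce the same pattern string.
same_pattern = (plain == raw)

# An unescaped '\b' in an ordinary string is a single backspace character,
# while r'\b' is the two characters the regex engine expects.
backspace_length = len('\b')
raw_length = len(r'\b')

found = re.search(raw, 'pi is about 3.14159')
matched_text = found.group(0)
```

Here matched_text holds '3.14159', and the two length values differ, which is the usual source of surprises when the r prefix is forgotten.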
These characters have special meanings in regular expressions:
. | Matches any character except a newline. |
^ | Matches the start of the string. |
$ | Matches the end of the string. |
r*
|
Matches zero or more repetitions of regular
expression r.
|
r+ | Matches one or more repetitions of r. |
r? | Matches zero or one r. |
r*?
|
Non-greedy form of r*; matches as few
characters as possible. The normal *
operator is greedy: it matches as much text as
possible.
|
r+? | Non-greedy form of r+. |
r?? | Non-greedy form of r?. |
r{m,n}
|
Matches from m to n repetitions of r. For example, r'x{3,5}' matches between three and five
copies of the letter 'x'; r'(bl){4}' matches the string 'blblblbl'.
|
r{m,n}? | Non-greedy version of the previous form. |
[...]
|
Matches one character from a set of characters. You
can put all the allowable characters inside the
brackets, or use a-b to mean all
characters from a to b inclusive. For example, regular
expression r'[abc]' will match either
'a', 'b', or 'c'. Pattern r'[0-9a-zA-Z]'
will match any single letter or digit.
|
[^...]
| Matches any character not in the given set. |
rs
|
Matches expression r followed by expression s.
|
r|s
|
Matches either r or s.
|
(r)
|
Matches r and forms it into a group that can be retrieved
separately after a match; see MatchObject, below. Groups are numbered starting from
1.
|
(?:r)
|
Matches r but does not form a group for later retrieval.
|
(?P<n>r)
|
Matches r and forms it into a named group, with name n, for later
retrieval.
|
(?P=n)
|
Matches whatever string matched an earlier
(?P<n>r)
group.
|
(?#...)
|
Comment: the “...”
portion is ignored and may contain a comment.
|
(?=...)
|
The “...” portion
must be matched, but is not consumed by the
match. This is sometimes called a lookahead
match. For example, r'a(?=bcd)'
matches the string 'abcd' but
not the string 'abcxyz'.
Compared to using r'abcd' as the
regular expression, the difference is that in
this case the matched portion would be 'a' and not 'abcd'.
|
(?!...)
|
This is similar to the (?=...) form:
it specifies a regular expression that must
not match, but does not
consume any characters. For example, r'a(?!bcd)' would match 'axyz', and return 'a'
as the matched portion; but it would not match
'abcdef'. You could call it
a negative lookahead match.
|
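The behavior of several entries in the table above can be checked with a short script. This is only a sketch: the sample strings and variable names are invented for illustration, and re.search() is used because it scans the whole string for a match.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r'<.*>', text).group(0)
# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r'<.*?>', text).group(0)

# Counted repetition, using the patterns from the table.
three_to_five = re.search(r'x{3,5}', 'xxxxxxx').group(0)
repeated_group = re.search(r'(bl){4}', 'blblblbl').group(0)

# A named group, a non-capturing group, and a numbered group.
# The named group is also group 1; (?:...) does not take a number.
m = re.search(r'(?P<key>[a-z]+)(?:\s*=\s*)(\d+)', 'width = 80')
key = m.group('key')
value = m.group(2)

# Lookahead and negative lookahead test the following text
# without including it in the match.
ahead = re.search(r'a(?=bcd)', 'abcd').group(0)
no_ahead = re.search(r'a(?=bcd)', 'abcxyz')
negative = re.search(r'a(?!bcd)', 'axyz').group(0)
no_negative = re.search(r'a(?!bcd)', 'abcdef')
```

With these inputs, greedy covers the entire text while lazy is just '<b>'; three_to_five is 'xxxxx' because the repetition is greedy; and both no_ahead and no_negative are None.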
The special sequences in the table below are
recognized. However, many of them function in ways
that depend on the locale; see Section 19.4, “What is the locale?”. For example, the r'\s' sequence
matches characters that are considered whitespace in
the current locale.
\n
|
Matches the same text as a group that matched earlier,
where n is
the number of that group. For example, r'([a-zA-Z]+):\1' matches the string "foo:foo".
|
\A
| Matches only at the start of the string. |
\b
|
Matches the empty string but only at the start or end
of a word (where a word is set off by whitespace or a
non-alphanumeric character). For example, r'foo\b' would match "foo" but
not "foot".
|
\B
| Matches the empty string when not at the start or end of a word. |
\d
| Matches any digit. |
\D
| Matches any non-digit. |
\s
| Matches any whitespace character. |
\S
| Matches any non-whitespace character. |
\w
|
Matches any alphanumeric character plus the
underbar '_'.
|
\W
| Matches any character that is neither alphanumeric nor the underbar '_'. |
\Z
| Matches only at the end of the string. |
\\
|
Matches a backslash (\) character.
|
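The following sketch exercises several of the special sequences above. The inputs are plain ASCII so that locale settings should not change the results; all sample strings and variable names are assumptions made for this example.

```python
import re

# \1 must repeat exactly what group 1 matched.
backref_hit = re.search(r'([a-zA-Z]+):\1', 'foo:foo')
backref_miss = re.search(r'([a-zA-Z]+):\1', 'foo:bar')

# \b requires a word boundary after 'foo'.
word_hit = re.search(r'foo\b', 'foo')
word_miss = re.search(r'foo\b', 'foot')

# \d, \w and \s used as character classes.
digits = re.findall(r'\d+', 'rooms 12 and 305')
words = re.findall(r'\w+', 'snake_case, two words')
fields = re.split(r'\s+', 'a  b\tc')

# \A and \Z anchor the pattern to the whole string.
whole = re.search(r'\A\d+\Z', '2024')
partial = re.search(r'\A\d+\Z', '2024 AD')

# r'\\' matches one literal backslash.
backslash = re.search(r'\\', r'C:\temp').group(0)
```

Here backref_miss, word_miss and partial are all None, digits is ['12', '305'], words is ['snake_case', 'two', 'words'] (the underbar counts as a word character), and fields is ['a', 'b', 'c'].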