Regular Expressions
Regular expressions are a compact notation for describing sets of strings, which formal language theory calls regular languages. They are used to search, match and validate strings and other textual information.
Syntax
| Delimiters | |
|---|---|
| `^` | Start. |
| `$` | End. |
| `/x/` | Start and end of a regex literal. |
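As a minimal sketch of how the anchors behave in JavaScript, the snippet below checks a few made-up sample strings. The variable names and inputs are illustrative assumptions, not part of the original table.

```javascript
// Sample strings are invented for illustration.
const startsWithCat = /^cat/;  // ^ pins the match to the start of the input
const endsWithCat = /cat$/;    // $ pins the match to the end of the input
const exactlyCat = /^cat$/;    // both anchors: the whole input must be "cat"

const anchorChecks = {
  catalogStarts: startsWithCat.test("catalog"), // true
  bobcatStarts: startsWithCat.test("bobcat"),   // false
  bobcatEnds: endsWithCat.test("bobcat"),       // true
  catalogExact: exactlyCat.test("catalog"),     // false
  catExact: exactlyCat.test("cat"),             // true
};
console.log(anchorChecks);
```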
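| Character Classes | |
|---|---|
| `\d` | Digit. |
| `\w` | Word character (letter, digit or underscore). |
| `\s` | Whitespace, tab or line break character. |
| `\D` or `[^\d]` | Not a digit. |
| `\W` or `[^\w]` | Not a word character. |
| `\S` or `[^\s]` | Not a whitespace, tab or line break character. |
| `.` | Any character except a line break. |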
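The character classes can be tried out with a short JavaScript sketch. The sample text and variable names are assumptions chosen for illustration.

```javascript
// Hypothetical sample text.
const sampleText = "Room 42, floor 3";

const digitChars = sampleText.match(/\d/g);        // every single digit
const digitsOnly = sampleText.replace(/\D/g, "");  // drop everything that is not a digit
const wordRuns = sampleText.match(/\w+/g);         // runs of word characters
const spaceSplit = sampleText.split(/\s+/);        // split on runs of whitespace

// The dot matches any character except a line break.
const dotMatchesLetter = /a.c/.test("abc");
const dotMatchesNewline = /a.c/.test("a\nc");

console.log(digitChars, digitsOnly, wordRuns, spaceSplit);
console.log(dotMatchesLetter, dotMatchesNewline);
```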
| Composites | |
|---|---|
| `xy` | x followed by y. |
| `x\|y` | x or y (prefer x). |
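The "prefer x" rule for alternation can be seen in a small JavaScript sketch: when both alternatives could match at the same position, the left one is tried first. The inputs are illustrative assumptions.

```javascript
// Concatenation: "x" must be directly followed by "y".
const concatMatches = /xy/.test("xy");
const concatWrongOrder = /xy/.test("yx");

// Alternation: the left alternative wins when both can match at the same position.
const leftShortFirst = "ab".match(/a|ab/)[0]; // "a" is tried first and succeeds
const leftLongFirst = "ab".match(/ab|a/)[0];  // "ab" is tried first and succeeds

console.log(concatMatches, concatWrongOrder, leftShortFirst, leftLongFirst);
```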
| Quantifiers | |
|---|---|
| `x?` | 0 or 1 (prefer 1). |
| `x+` | 1 or more (prefer more). |
| `x*` | 0 or more (prefer more). |
| `x??` | 0 or 1 (prefer 0). |
| `x+?` | 1 or more (prefer fewer). |
| `x*?` | 0 or more (prefer fewer). |
| `x{n}` | Exactly n. |
| `x{n,m}` | n to m (prefer more). |
| `x{n,}` | n or more (prefer more). |
| `x{n}?` | Exactly n. |
| `x{n,m}?` | n to m (prefer fewer). |
| `x{n,}?` | n or more (prefer fewer). |
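The difference between "prefer more" (greedy) and "prefer fewer" (lazy) quantifiers is easiest to see on a concrete input. This is a minimal JavaScript sketch with made-up sample strings.

```javascript
// Hypothetical input containing two bold tags.
const markup = "<b>one</b><b>two</b>";

// Greedy: .* takes as much as possible, so the match runs to the last </b>.
const greedyMatch = markup.match(/<b>.*<\/b>/)[0];

// Lazy: .*? takes as little as possible, so the match stops at the first </b>.
const lazyMatch = markup.match(/<b>.*?<\/b>/)[0];

// The same preference applies to bounded quantifiers.
const boundedGreedy = "aaaaa".match(/a{2,3}/)[0];
const boundedLazy = "aaaaa".match(/a{2,3}?/)[0];

console.log(greedyMatch, lazyMatch, boundedGreedy, boundedLazy);
```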
| Ranges | |
|---|---|
| `[a-z]` | Letter in range a-z. |
| `[A-Z]` | Letter in range A-Z. |
| `[0-9]` | Digit in range 0-9. |
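Ranges only have their special meaning inside square brackets. The following JavaScript sketch, with assumed sample inputs, contrasts a bracketed range with the same text written outside brackets.

```javascript
// Inside brackets, a-z is a range of characters.
const lowercaseOnly = /^[a-z]+$/;
const lowerOk = lowercaseOnly.test("hello");
const lowerRejectsCapital = lowercaseOnly.test("Hello");

// Several ranges can be combined in one class (hypothetical hex check).
const hexOnly = /^[0-9a-fA-F]+$/;
const hexOk = hexOnly.test("ff00AA");
const hexRejects = hexOnly.test("xyz");

// Outside brackets, a-z is just the three literal characters "a", "-", "z".
const literalDash = /a-z/.test("a-z");
const literalDashOnLetter = /a-z/.test("m");

console.log(lowerOk, lowerRejectsCapital, hexOk, hexRejects, literalDash, literalDashOnLetter);
```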
| Flags | |
|---|---|
| `/x/g` | Global, find all matches rather than stopping after the first match. |
| `/x/m` | Multi line, ^ and $ match the start and end of a line. |
| `/x/i` | Case insensitive. |
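A short JavaScript sketch of how the flags change the result of the same pattern. The three-line sample text is an assumption made for illustration.

```javascript
// Hypothetical three-line input.
const flagText = "Cat\ncat\nCAT";

const firstOnly = flagText.match(/cat/)[0];       // no flags: first case-sensitive match
const allExact = flagText.match(/cat/g);          // g: all case-sensitive matches
const allAnyCase = flagText.match(/cat/gi);       // g + i: all matches, any case
const wholeLines = flagText.match(/^cat$/gim);    // m: ^ and $ apply to each line
const wholeInput = flagText.match(/^cat$/gi);     // without m: ^ and $ apply to the whole input

console.log(firstOnly, allExact, allAnyCase, wholeLines, wholeInput);
```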
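The characters ^.[$()|*+? have a special meaning in a pattern. To match one of them literally, escape it with a backslash \ (for example, \. matches a dot and \\ matches a backslash).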
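When a pattern is built from arbitrary text, the escaping can be automated. The helper below is a sketch: `escapeRegExp` is a hypothetical name, not a built-in, and the sample expression is made up.

```javascript
// Hypothetical helper: prefix every special character with a backslash.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const rawText = "1+1=2?";
const escapedText = escapeRegExp(rawText); // 1\+1=2\?

// Unescaped, + and ? act as quantifiers, so the literal text does not match.
const unescapedMatches = /^1+1=2?$/.test(rawText);

// Escaped, every character is matched literally.
const escapedMatches = new RegExp("^" + escapedText + "$").test(rawText);

console.log(escapedText, unescapedMatches, escapedMatches);
```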
| Unicode | |
|---|---|
| `/x/u` | Unicode mode, required for the \p{...} classes below. |
| `\p{L}` | Unicode letter. |
| `\p{N}` | Unicode number. |
| `\p{M}` | Unicode mark. |
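The Unicode classes can be exercised with a brief JavaScript sketch. The sample strings are assumptions. Non-ASCII digits and the combining mark are written as escapes so the example does not depend on how the file is normalized.

```javascript
const lettersOnly = /^\p{L}+$/u;
const numbersOnly = /^\p{N}+$/u;

const cjkLetters = lettersOnly.test("日本語");
const lettersRejectDigits = lettersOnly.test("abc123");

// Arabic-Indic digits three and four: matched by \p{N}, but not by \d.
const arabicDigits = "\u0663\u0664";
const unicodeNumber = numbersOnly.test(arabicDigits);
const asciiDigitClass = /^\d+$/.test(arabicDigits);

// \p{M} matches combining marks, here the combining acute accent after "e".
const withoutMarks = "e\u0301".replace(/\p{M}/gu, "");

// Without the u flag, \p{L} is not treated as a Unicode class.
const noUnicodeFlag = /\p{L}/.test("a");

console.log(cjkLetters, lettersRejectDigits, unicodeNumber, asciiDigitClass, withoutMarks, noUnicodeFlag);
```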
Examples
Begins with "The".
^The
Ends with "end".
end$
ab followed by zero or one c.
abc?
ab followed by one or more c.
abc+
ab followed by zero or more c.
abc*
ab followed by two c.
abc{2}
ab followed by two up to five c.
abc{2,5}
ab followed by two or more c.
abc{2,}
a followed by b or c.
a(b|c)
a followed by zero or more bc.
a(bc)*
a followed by two up to five bc.
a(bc){2,5}
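The quantifier and grouping examples above can be checked in JavaScript with a small table of cases. As an assumption for this sketch, each pattern is wrapped in ^ and $ so that the whole input has to match, and the inputs are made up.

```javascript
// Each entry: [pattern, input, expected result]. Inputs are illustrative.
const sixCs = "ab" + "c".repeat(6);
const quantifierCases = [
  [/^abc?$/, "ab", true],
  [/^abc?$/, "abcc", false],
  [/^abc+$/, "ab", false],
  [/^abc+$/, "abccc", true],
  [/^abc*$/, "ab", true],
  [/^abc{2}$/, "abcc", true],
  [/^abc{2}$/, "abc", false],
  [/^abc{2,5}$/, sixCs, false],
  [/^abc{2,}$/, sixCs, true],
  [/^a(b|c)$/, "ac", true],
  [/^a(b|c)$/, "abc", false],
  [/^a(bc)*$/, "a", true],
  [/^a(bc)*$/, "abcbc", true],
  [/^a(bc){2,5}$/, "abc", false],
  [/^a(bc){2,5}$/, "abcbc", true],
];

// Collect the cases whose actual result differs from the expected one.
const quantifierMismatches = quantifierCases.filter(
  ([pattern, input, expected]) => pattern.test(input) !== expected
);
console.log(quantifierMismatches.length === 0 ? "all cases behave as listed" : quantifierMismatches);
```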
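Any single character that is not a letter from a to z or from A to Z.
[^a-zA-Z]
Any single character that is not -.
[^-]
Any single character that is neither < nor >.
[^<>]
To require that a whole string contains none of the excluded characters, anchor and repeat the class, e.g. ^[^<>]*$.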
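A negated class on its own matches one character, so it succeeds as soon as any non-excluded character is present. The JavaScript sketch below, with assumed inputs, contrasts that with an anchored and repeated class that checks the whole string.

```javascript
// Unanchored: true if at least one character is not a letter.
const hasNonLetter = /[^a-zA-Z]/.test("abc1");

// Anchored and repeated: true only if no character is a letter.
const noLettersAtAll = /^[^a-zA-Z]*$/;
const mixedInput = noLettersAtAll.test("abc1");
const digitsAndDash = noLettersAtAll.test("123-456");

// Same idea for angle brackets.
const noAngleBrackets = /^[^<>]*$/;
const plainText = noAngleBrackets.test("plain text");
const tagText = noAngleBrackets.test("<b>bold</b>");

console.log(hasNonLetter, mixedInput, digitsAndDash, plainText, tagText);
```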
A string that can be empty (the whole <REGEXP> is optional).
^(<REGEXP>)?$
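Alternation binds more loosely than the anchors, so grouping decides what ^ and $ apply to. The JavaScript sketch below uses \d+ as a stand-in for <REGEXP>, which is an assumption made purely for illustration.

```javascript
// With a top-level |, the pattern splits into "^(^$)" OR "(\d+)$".
// The second branch has no start anchor, so trailing digits are enough.
const looseOptional = /^(^$)|(\d+)$/;
const looseAcceptsPrefix = looseOptional.test("abc123");

// Making the group optional keeps both anchors around the whole pattern.
const strictOptional = /^(\d+)?$/;
const strictEmpty = strictOptional.test("");
const strictDigits = strictOptional.test("123");
const strictRejectsPrefix = strictOptional.test("abc123");

console.log(looseAcceptsPrefix, strictEmpty, strictDigits, strictRejectsPrefix);
```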
Alphanumeric string starting with a letter (e.g. nicknames).
^[a-zA-Z]+[a-zA-Z0-9]*$
Alpha strings separated by spaces (e.g. names or surnames).
^[a-zA-Z]+( [a-zA-Z]+)*$
Alphanumeric strings separated by spaces (e.g. Route 66).
^[a-zA-Z]+[a-zA-Z0-9]*( [a-zA-Z0-9]+)*$
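The three ASCII patterns above can be tried as simple validators. This is a sketch: the variable names and sample inputs are assumptions.

```javascript
const nicknamePattern = /^[a-zA-Z]+[a-zA-Z0-9]*$/;
const namePattern = /^[a-zA-Z]+( [a-zA-Z]+)*$/;
const labelPattern = /^[a-zA-Z]+[a-zA-Z0-9]*( [a-zA-Z0-9]+)*$/;

const nicknameOk = nicknamePattern.test("neo42");
const nicknameLeadingDigit = nicknamePattern.test("42neo");   // must start with a letter

const nameOk = namePattern.test("Ada Lovelace");
const nameDoubleSpace = namePattern.test("Ada  Lovelace");    // only single spaces allowed
const nameTrailingSpace = namePattern.test("Ada ");           // a space must be followed by a word

const labelOk = labelPattern.test("Route 66");
const labelLeadingDigit = labelPattern.test("66 Route");      // first word must start with a letter

console.log(nicknameOk, nicknameLeadingDigit, nameOk, nameDoubleSpace, nameTrailingSpace, labelOk, labelLeadingDigit);
```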
Alphanumeric unicode strings separated by spaces (e.g. 号公路 66).
^\p{L}+[\p{L}\p{N}]*( [\p{L}\p{N}]+)*$
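- `\p{L}` → Matches any unicode letter.
- `\p{N}` → Matches any unicode number.
- `( [\p{L}\p{N}]+)*` → Allows further words, each preceded by a single space and consisting of letters and numbers.
- The \p{...} classes only work in unicode mode, so the pattern needs the u flag.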
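In JavaScript the Unicode pattern has to be written with the u flag. The sketch below wraps it in a regex literal and tries a few assumed inputs.

```javascript
// Same pattern as above, as a literal with the u flag for \p{...} support.
const unicodeLabel = /^\p{L}+[\p{L}\p{N}]*( [\p{L}\p{N}]+)*$/u;

const chineseLabel = unicodeLabel.test("号公路 66");
const englishLabel = unicodeLabel.test("Route 66");
const leadingNumber = unicodeLabel.test("66 号公路");   // first word must start with a letter
const doubleSpace = unicodeLabel.test("号公路  66");    // only single spaces between words

console.log(chineseLabel, englishLabel, leadingNumber, doubleSpace);
```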
Email addresses.
^[^\s@]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
- `[^\s@]+` → Ensures it starts with the local part (any character except whitespace and @).
- `@` → Separates the local part from the domain. Neither side can contain @, so there is exactly one @ symbol.
- `[a-zA-Z0-9.-]+` → Matches the domain part (letters, numbers, dots and hyphens).
- `\.` → Ensures there is at least one dot before the TLD.
- `[a-zA-Z]{2,}` → Ensures a TLD of at least 2 letters (e.g., .com, .net, .org, .ai).
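A quick JavaScript sketch of the email pattern on a few made-up addresses. It also shows one limitation: the pattern checks the overall shape only, so a domain with consecutive dots still passes.

```javascript
const emailPattern = /^[^\s@]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const plainAddress = emailPattern.test("user@example.com");
const doubleAt = emailPattern.test("user@@example.com");        // second @ is not allowed in the domain
const missingTld = emailPattern.test("user@example");           // no dot followed by a TLD
const spaceInLocal = emailPattern.test("user name@example.com"); // whitespace is excluded

// Limitation: consecutive dots in the domain are accepted by this pattern.
const doubleDotDomain = emailPattern.test("user@example..com");

console.log(plainAddress, doubleAt, missingTld, spaceInLocal, doubleDotDomain);
```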
Telephone numbers.
^\+\d{1,4}\d{6,14}$
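- `\+` → Ensures it starts with a +.
- `\d{1,4}` → Matches the country code (1 to 4 digits, e.g., +1, +44, +123).
- `\d{6,14}` → Ensures the rest has only digits (between 6 and 14 digits, which covers most phone numbers).
- The two digit groups are adjacent, so together they accept 7 to 18 digits after the +. The pattern cannot tell where the country code ends.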
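The telephone pattern accepts digits only, so formatted input has to be cleaned first. The JavaScript sketch below uses fictional numbers, and the step that strips spaces and hyphens is an assumption rather than part of the original pattern.

```javascript
const phonePattern = /^\+\d{1,4}\d{6,14}$/;

const compactNumber = phonePattern.test("+15550100123");
const missingPlus = phonePattern.test("15550100123");
const tooShort = phonePattern.test("+123456");              // only 6 digits after the +
const formattedNumber = phonePattern.test("+44 20 7946 0958"); // spaces are not digits

// Assumed pre-processing: remove spaces and hyphens before validating.
const normalizedNumber = "+44 20 7946 0958".replace(/[\s-]/g, "");
const normalizedOk = phonePattern.test(normalizedNumber);

console.log(compactNumber, missingPlus, tooShort, formattedNumber, normalizedNumber, normalizedOk);
```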
File names.
/^[\p{L}\p{N}_\-\s]+\.(jpg|png)$/iu
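- `[\p{L}\p{N}_\-\s]+` → Matches names that contain unicode letters, numbers, underscores, hyphens, and whitespace.
- `\.` → Matches the dot before the extension.
- `(jpg|png)` → Ensures it ends with jpg or png.
- The i flag makes the match case insensitive (so .JPG is accepted) and the u flag enables the \p{...} classes.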
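The file name pattern can be used directly as a JavaScript literal. The file names below are invented for this sketch.

```javascript
const imageFileName = /^[\p{L}\p{N}_\-\s]+\.(jpg|png)$/iu;

const upperCaseExtension = imageFileName.test("holiday photo_01.JPG"); // i flag ignores case
const unicodeName = imageFileName.test("写真-2024.png");
const wrongExtension = imageFileName.test("report.pdf");
const emptyName = imageFileName.test(".png");                // the name needs at least one character
const dotInName = imageFileName.test("archive.tar.png");     // dots are not allowed in the name part

console.log(upperCaseExtension, unicodeName, wrongExtension, emptyName, dotInName);
```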