Regex Cheat Sheet: Complete Regular Expression Reference

Regular expressions (regex) are one of the most powerful tools in a developer's toolkit. Whether you're validating user input, parsing log files, or transforming text, regex patterns save you time and lines of code. This cheat sheet covers the most commonly used patterns, syntax elements, and practical examples you'll encounter in everyday development.

For interactive regex testing, try our free Online Regex Tester tool.

Common Regex Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This pattern matches most standard email formats. It allows letters, numbers, dots, underscores, percent signs, plus signs, and hyphens before the @ symbol, followed by a domain name and a top-level domain of at least 2 letters. No regex can fully validate email addresses per RFC 5322, and this one still accepts some invalid forms (such as consecutive dots), but it covers the large majority of addresses seen in practice.
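
As a minimal sketch, the pattern above could be applied in Python like this. The helper name is_plausible_email is an illustrative assumption, not part of any library.

```python
import re

# Pattern from the section above, compiled once for reuse.
EMAIL_RE = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def is_plausible_email(address: str) -> bool:
    """Return True if the address has the general shape of an email address."""
    return EMAIL_RE.fullmatch(address) is not None

print(is_plausible_email("jane.doe+news@example.co.uk"))  # True
print(is_plausible_email("no-at-sign.example.com"))       # False
print(is_plausible_email("user@localhost"))               # False: no dot-separated TLD
```

A shape check like this only tells you the address looks right; confirming that it exists still requires sending a message to it.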

URL Matching

https?:\/\/(www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?(\/\S*)?

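This pattern matches HTTP and HTTPS URLs with an optional www prefix. It handles hosts of up to three dot-separated labels (as in blog.example.com or example.co.uk) and an optional path, which also absorbs any query string that follows it. It does not match port numbers, deeper subdomain chains, or labels that contain digits after the first one, so extend it if you need stricter or broader URL matching.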
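
The following sketch shows how the URL pattern might be used to pull links out of free text in Python. The function name find_urls and the sample strings are assumptions made for illustration. In Python the forward slashes do not need escaping, so they are written plainly here.

```python
import re

URL_RE = re.compile(
    r'https?://(www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?(/\S*)?'
)

def find_urls(text: str) -> list:
    """Return every substring that the URL pattern matches."""
    # finditer is used because findall would return the groups, not the full match.
    return [m.group() for m in URL_RE.finditer(text)]

text = "Docs at https://www.example.com/guide?page=2 and http://example.org."
print(find_urls(text))
# ['https://www.example.com/guide?page=2', 'http://example.org']

# The port (and everything after it) is left out of the match.
print(find_urls("https://example.com:8080/x"))  # ['https://example.com']
```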

IPv4 Address

\b(?:\d{1,3}\.){3}\d{1,3}\b

This basic pattern matches anything shaped like an IPv4 address, from 0.0.0.0 up to out-of-range values such as 999.999.999.999. For strict validation ensuring each octet is 0-255, use:

\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
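
To make the difference between the two patterns concrete, here is a small Python comparison. The sample log line is invented for illustration.

```python
import re

BASIC_IPV4 = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
STRICT_IPV4 = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'
)

log_line = "client 192.168.1.10 forwarded for 999.1.1.1"

# The basic pattern accepts the out-of-range address; the strict one does not.
print(BASIC_IPV4.findall(log_line))   # ['192.168.1.10', '999.1.1.1']
print(STRICT_IPV4.findall(log_line))  # ['192.168.1.10']
```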

IPv6 Address

([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}

This matches only full, uncompressed IPv6 addresses with all eight groups written out. Real-world implementations also need to account for shorthand notation, where :: replaces a run of consecutive zero groups.

Date Formats

\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])

Matches dates in YYYY-MM-DD format (ISO 8601). This validates month ranges 01-12 and day ranges 01-31, but it does not check the day against the month, so 2026-02-31 still matches. For additional formats like MM/DD/YYYY or DD-MM-YYYY, reorder the groups and change the separators accordingly.
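
One way to close the gap between "looks like a date" and "is a real date" is to pair the regex with a calendar check. The sketch below assumes a hypothetical helper named parse_iso_date and adds a capturing group around the year so all three parts can be extracted.

```python
import re
from datetime import date

# Same pattern as above, with the year captured as well.
ISO_DATE = re.compile(r'(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])')

def parse_iso_date(text: str):
    """Return a date for a valid YYYY-MM-DD string, or None otherwise."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)  # rejects impossible dates such as Feb 31
    except ValueError:
        return None

print(parse_iso_date("2026-02-28"))  # 2026-02-28
print(parse_iso_date("2026-02-31"))  # None: passes the regex, fails the calendar check
print(parse_iso_date("2026-13-01"))  # None: rejected by the regex
```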

Phone Numbers (International)

\+?[\d\s\-()]{7,15}

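A flexible pattern that matches many international phone number formats, such as +44 20 7946 0958 and +1 555-123-4567, including variations with spaces, hyphens, and parentheses. Note that the 7-15 length limit counts separators as well as digits, so a heavily formatted number such as +1 (555) 123-4567 (16 characters after the +) is matched only in part; raise the upper bound if you need to accept such formats. For country-specific validation, use more targeted patterns.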

US Phone Number

\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

Matches US phone numbers in formats like (555) 123-4567, 555.123.4567, or 5551234567.

Chinese Mobile Number

1[3-9]\d{9}

Matches Chinese mobile numbers starting with 1 followed by a digit 3-9 and nine more digits. This covers all major Chinese carriers including China Mobile, China Unicom, and China Telecom.

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

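This pattern enforces at least 8 characters with at least one lowercase letter, one uppercase letter, one digit, and one special character from the set @$!%*?&. The lookaheads ensure each character class is present without consuming characters. Because the final character class lists the only characters allowed, a password containing anything else (a space or #, for example) is rejected.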
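
A short Python sketch of the password check follows. The function name is_strong and the sample passwords are illustrative assumptions.

```python
import re

STRONG_PASSWORD = re.compile(
    r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$'
)

def is_strong(password: str) -> bool:
    """Return True if the password satisfies every rule encoded in the pattern."""
    return STRONG_PASSWORD.fullmatch(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd#"))   # False: '#' is outside the allowed set
```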

Hex Color Code

#?([0-9a-fA-F]{3}){1,2}\b

Matches hex color codes in both 3-character shorthand (#fff) and 6-character full format (#ffffff), with or without the # prefix.

Social Security Number (US)

\d{3}-\d{2}-\d{4}

Matches the standard SSN format XXX-XX-XXXX. Note: for actual SSN validation, you'd also need to exclude specific invalid ranges like 000, 666, and 900-999 in the first group.

HTML Tag Matching

<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>

Matches opening and closing HTML tags with content. The backreference \1 ensures the closing tag matches the opening tag name. This is useful for simple HTML parsing, though a proper parser is recommended for complex documents.
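
The effect of the \1 backreference can be seen in a small Python example. The HTML snippet is made up for illustration, and the forward slash is left unescaped because Python does not require it.

```python
import re

TAG_PAIR = re.compile(r'<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>')

html = '<p class="intro">Hello</p><h2>Title</h2><b>unclosed</i>'

# Each result is (tag name, inner content). The mismatched <b>...</i> pair
# is skipped because \1 requires the closing tag to repeat the opening name.
print(TAG_PAIR.findall(html))  # [('p', 'Hello'), ('h2', 'Title')]
```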

Regex Syntax Reference

Symbol    Description                             Example
.         Any single character (except newline)   a.c matches "abc", "a1c"
\d        Any digit [0-9]                         \d{3} matches "123"
\D        Non-digit                               \D+ matches "abc"
\w        Word character [a-zA-Z0-9_]             \w+ matches "hello_1"
\W        Non-word character                      \W matches "@"
\s        Whitespace character                    a\sb matches "a b"
\S        Non-whitespace                          \S+ matches "hello"
^         Start of string/line                    ^Hello matches "Hello" at start
$         End of string/line                      world$ matches "world" at end
*         Zero or more                            ab*c matches "ac", "abc", "abbc"
+         One or more                             ab+c matches "abc", "abbc"
?         Zero or one (optional)                  colou?r matches "color", "colour"
{n}       Exactly n times                         \d{4} matches "2026"
{n,m}     Between n and m times                   \d{2,4} matches "12", "123", "1234"
[abc]     Character set                           [aeiou] matches any vowel
[^abc]    Negated character set                   [^0-9] matches any non-digit
(a|b)     Alternation (OR)                        cat|dog matches "cat" or "dog"
(...)     Capturing group                         (\d{3})- captures area code
(?:...)   Non-capturing group                     (?:ab)+ groups without capturing
\b        Word boundary                           \bcat\b matches "cat" not "catalog"
\1, \2    Backreferences                          (\w+)\s\1 matches repeated words
(?=...)   Positive lookahead                      \d(?=px) digit before "px"
(?!...)   Negative lookahead                      \d(?!px) digit not before "px"
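
A few of the less obvious entries in the table above can be tried out directly in Python. The sample strings are invented for illustration.

```python
import re

# \b word boundary: "cat" only as a whole word.
print(re.findall(r'\bcat\b', 'cat catalog concat'))  # ['cat']

# Backreference: \1 must repeat whatever group 1 captured.
print(re.search(r'(\w+)\s\1', 'it was the the best').group())  # the the

# Positive lookahead: digits followed by "px", without including "px" in the match.
print(re.findall(r'\d+(?=px)', 'width: 100px; height: 50em'))  # ['100']
```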

Flags and Modifiers

Flag  Description
g     Global: find all matches, not just the first
i     Case-insensitive matching
m     Multiline: ^ and $ match line boundaries
s     Dotall: . matches newline characters
u     Unicode: enables full Unicode matching
x     Extended: allows whitespace and comments in pattern
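
In Python these flags are spelled out as constants on the re module, and there is no g flag: re.findall and re.finditer return all matches, while re.search returns only the first. A brief sketch, using a made-up two-line string:

```python
import re

text = "first line\nSecond line"

# Without flags, ^ matches only at the very start of the string.
print(re.findall(r'^\w+', text))                    # ['first']

# re.MULTILINE (m): ^ also matches right after each newline.
print(re.findall(r'^\w+', text, re.MULTILINE))      # ['first', 'Second']

# re.IGNORECASE (i): letter case is ignored.
print(re.findall(r'second', text, re.IGNORECASE))   # ['Second']

# re.DOTALL (s): . also matches the newline between the two lines.
print(re.search(r'line.Second', text) is not None)             # False
print(re.search(r'line.Second', text, re.DOTALL) is not None)  # True
```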

Practical Tips

Use Raw Strings in Python

Always use raw strings (r-prefix) when writing regex in Python to avoid escape sequence issues:

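import re
# Correct: the raw string passes every backslash through to the regex engine
pattern = r'\d{4}-\d{2}-\d{2}'
# Fragile: '\d' is not a valid string escape, so Python keeps the backslash
# but warns about it; sequences like '\b' silently become other characters
pattern = '\d{4}-\d{2}-\d{2}'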

Avoid Catastrophic Backtracking

Nested quantifiers like (a+)+ can cause exponential backtracking on certain inputs. Use possessive quantifiers or atomic groups where available, or restructure your pattern to avoid ambiguous matching paths.
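
As a sketch of the "restructure your pattern" advice, the nested form (a+)+ can be flattened to a+ without changing which strings are accepted. The sample inputs below are deliberately short so that both patterns run instantly.

```python
import re

RISKY = re.compile(r'^(a+)+$')  # nested quantifier: many ways to split a run of a's
SAFER = re.compile(r'^a+$')     # accepts the same strings, with only one way to match

samples = ["a", "aaaa", "", "aaab", "baaa"]

# Both patterns accept and reject exactly the same inputs.
agree = all(bool(RISKY.match(s)) == bool(SAFER.match(s)) for s in samples)
print(agree)  # True

print([s for s in samples if SAFER.match(s)])  # ['a', 'aaaa']
```

In a backtracking engine, the work done by the nested form on a failing input such as a long run of a's followed by b roughly doubles with each extra a, while the flattened form stays proportional to the input length.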

Test Incrementally

Build complex regex patterns piece by piece, testing each component as you add it. Use our online regex tester to experiment with patterns in real time and see match highlighting instantly.

Comment Your Regex

For complex patterns, use the verbose/extended mode to add comments and whitespace:

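import re

# re.VERBOSE makes the engine ignore whitespace and # comments in the pattern
pattern = re.compile(r"""
    ^                      # start of string
    \d{4}                  # 4-digit year
    -                      # separator
    (0[1-9]|1[0-2])        # month 01-12
    -                      # separator
    (0[1-9]|[12]\d|3[01])  # day 01-31
    $                      # end of string
""", re.VERBOSE)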

Regex in Different Languages

While regex syntax is largely consistent across languages, there are subtle differences in supported features and group syntax. The FAQ below covers the most common ones.

FAQ

What's the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, {n,m}) match as much as possible, while lazy quantifiers (*?, +?, {n,m}?) match as little as possible. For example, <.*> matches everything between the first < and last >, while <.*?> matches the shortest possible tag. Use lazy quantifiers when you want minimal matching, which is often the case in HTML/XML parsing.
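
The difference is easy to see side by side in Python. The HTML fragment is a made-up sample.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last > in the string.
print(re.findall(r'<.*>', html))   # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first > it can.
print(re.findall(r'<.*?>', html))  # ['<b>', '</b>', '<i>', '</i>']
```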

Can regex parse HTML or JSON?

Technically yes for simple cases, but it's generally a bad idea. HTML and JSON have nested structures that regular expressions can't fully handle. Use proper parsers instead — like DOMParser for HTML or JSON.parse() for JSON. Regex is great for extracting patterns from structured text, but not for understanding hierarchical data.

How do I match a literal dot, asterisk, or other special character?

Escape special characters with a backslash: \. for a literal dot, \* for a literal asterisk, \? for a literal question mark, etc. The special characters that need escaping are: . * + ? ^ $ { } [ ] \ | ( ). Inside a character class [...], most of these lose their special meaning: only \, ], ^ (when it comes first), and - (when it sits between two characters) need escaping, and the exact rules differ slightly between engines.
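
When the text to match comes from a variable, Python's re.escape can add the backslashes for you. A minimal sketch, with invented sample strings:

```python
import re

# An unescaped dot matches any character; an escaped dot matches only a dot.
print(re.fullmatch(r'4.99', '4x99') is not None)   # True
print(re.fullmatch(r'4\.99', '4x99') is not None)  # False

# re.escape builds a pattern that matches the given text literally.
literal = "price: $4.99 (approx.)"
escaped = re.escape(literal)

print(re.search(escaped, "Total price: $4.99 (approx.) today") is not None)  # True
print(re.search(escaped, "price: $4x99 (approx.)") is not None)              # False
```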

What are named capture groups and should I use them?

Named capture groups use the syntax (?<name>...), (?'name'...), or (?P<name>...) (Python), depending on the language. Instead of referencing groups by number (\1, \2), you refer to them by name, for example with \k<name>, or (?P=name) in Python. Named groups make complex patterns much more readable and maintainable, especially when you have many groups or need to reorder them.
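
In Python's re module, named groups are written (?P<name>...) and named backreferences (?P=name). The sketch below uses group names and sample strings chosen purely for illustration.

```python
import re

DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

match = DATE.search("released on 2026-03-15")
print(match.group('year'))  # 2026
print(match.groupdict())    # {'year': '2026', 'month': '03', 'day': '15'}

# Named backreference: (?P=word) must repeat what the group "word" captured.
REPEATED = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')
print(REPEATED.search("it was the the best").group('word'))  # the
```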

Why does my regex work in one language but not another?

Regex engines differ in features. Some common gotchas: JavaScript engines before ES2018 support neither lookbehind nor Unicode property escapes. Python's re module only gained atomic groups and possessive quantifiers in Python 3.11. Go's regexp package, which follows RE2 syntax, intentionally omits backreferences and lookarounds to guarantee linear-time matching. Always check the specific engine's documentation for supported features.

How do I make regex case-insensitive?

Most engines support the i flag for case-insensitive matching. In JavaScript: /hello/i. In Python: re.compile(r'hello', re.IGNORECASE). You can also use character classes like [hH][eE][lL][lL][oO], but the flag approach is much cleaner.

What's catastrophic backtracking and how do I avoid it?

Catastrophic backtracking happens when a regex engine explores exponentially many paths for a failed match. It's typically caused by nested quantifiers like (a+)+ or overlapping alternatives. To avoid it, use atomic groups or possessive quantifiers, avoid nested quantifiers, and test with adversarial inputs. Go's RE2-based engine avoids this entirely by using finite automata instead of backtracking.

How do I match Unicode characters in regex?

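Use the u flag (JavaScript) or your engine's Unicode-aware mode. Unicode property escapes like \p{L} (any letter), \p{N} (any number), and \p{Script=Han} (Han characters, as used in Chinese) are extremely useful where the engine supports them. In Python 3, str patterns are Unicode-aware by default, so \w and \d already match Unicode characters and the re.UNICODE flag is redundant. The built-in re module does not support \p{...}; the third-party regex module does.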
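
A short Python 3 sketch of the default Unicode behavior, and of re.ASCII as the way to switch it off. The sample text is invented, and the non-ASCII characters are written as escapes so the example does not depend on the file's encoding.

```python
import re

# "naïve café 東京 42", written with escapes for the non-ASCII characters.
text = "na\u00efve caf\u00e9 \u6771\u4eac 42"

# By default, \w matches letters and digits from any script.
print(re.findall(r'\w+', text))            # ['naïve', 'café', '東京', '42']

# re.ASCII restricts \w to [a-zA-Z0-9_], splitting words at accented letters.
print(re.findall(r'\w+', text, re.ASCII))  # ['na', 've', 'caf', '42']
```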