Regular expressions (regex) are the Swiss Army knife of text processing. From validating email addresses to extracting data from logs, parsing configuration files to building search-and-replace workflows, regex is an indispensable skill for every developer. Yet regex has a reputation for being cryptic and hard to master.
This guide provides a complete, practical reference for 2026. You'll learn the core syntax, explore the most useful patterns for real-world tasks, understand advanced constructs like lookaround, and discover how to write regex that's both correct and fast.
A regular expression is a sequence of characters that defines a search pattern. Regex engines (built into JavaScript, Python, Java, Go, Rust, and virtually every language) use these patterns to match, search, replace, and split text.
The power of regex comes from its ability to describe not just literal characters but classes of characters, repetitions, alternatives, and positional constraints. A well-crafted regex can validate a 200-character input format in a single line of code.
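The four operations named above (match, search, replace, split) can be sketched with Python's built-in `re` module. The log line is an invented sample, not data from a real system.

```python
import re

# Invented sample input for illustration.
log_line = "2026-01-15 ERROR disk usage at 91% on /dev/sda1"

# match: does the text start with a date-shaped prefix?
starts_with_date = re.match(r"\d{4}-\d{2}-\d{2}", log_line) is not None

# search: find the first percentage anywhere in the text.
percent = re.search(r"\d+%", log_line).group(0)

# replace: mask every digit.
masked = re.sub(r"\d", "#", log_line)

# split: break the line on runs of whitespace.
fields = re.split(r"\s+", log_line)

print(starts_with_date, percent, fields[1])
print(masked)
```

Other languages expose the same four operations under slightly different names, so treat this as one illustration rather than a universal API.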
| Pattern | Matches | Example |
|---|---|---|
| . | Any character (except newline) | a.c → "abc", "a1c" |
| \d | Digit [0-9] | \d{3} → "123" |
| \D | Non-digit | \D+ → "abc" |
| \w | Word character [a-zA-Z0-9_] | \w+ → "hello_1" |
| \W | Non-word character | \W → "@", " " |
| \s | Whitespace | \s+ → " ", "\t" |
| \S | Non-whitespace | \S+ → "hello" |
| [abc] | Any of a, b, or c | [aeiou] → "a" |
| [^abc] | Not a, b, or c | [^0-9] → "a" |
| [a-z] | Range a through z | [A-Za-z]+ → "Hello" |
Characters like ., *, +, ?, ^, $, |, \, (, ), [, ], {, } have special meaning in regex. Escape them with a backslash to match literally: \. matches a literal dot.
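When the text to match comes from a variable, escaping by hand is error-prone. A minimal sketch in Python, assuming the standard `re.escape` helper and an invented sample string:

```python
import re

literal = "3.14 (approx)"

# Unescaped, "." matches any character and "(...)" forms a group.
unescaped = re.compile(literal)

# re.escape backslash-escapes the metacharacters so the text matches literally.
escaped = re.compile(re.escape(literal))

false_positive = unescaped.search("3x14 approx") is not None
rejects_lookalike = escaped.search("3x14 approx") is None
finds_literal = escaped.search("pi is 3.14 (approx)") is not None

print(false_positive, rejects_lookalike, finds_literal)
```

Most languages ship a comparable helper. Where none exists, escape each of the characters listed above with a backslash.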
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c → "ac", "abc", "abbc" |
| + | 1 or more | ab+c → "abc", "abbc" |
| ? | 0 or 1 (optional) | colou?r → "color", "colour" |
| {n} | Exactly n | \d{4} → "2026" |
| {n,} | n or more | \d{2,} → "42", "1000" |
| {n,m} | Between n and m (inclusive) | \d{1,3} → "42", "256" |
By default, quantifiers are greedy: they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible):
// Input: <em>hello</em> <em>world</em>
// Greedy: one match spanning both elements, "<em>hello</em> <em>world</em>"
<em>.*</em>
// Lazy: two separate matches, "<em>hello</em>" and "<em>world</em>"
<em>.*?</em>
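The greedy and lazy behaviours are easy to compare side by side. This sketch uses Python's `re.findall` on an assumed sample string containing two `<em>` elements:

```python
import re

html = "<em>hello</em> <em>world</em>"

# Greedy: .* runs to the last possible "</em>", so there is a single match.
greedy = re.findall(r"<em>.*</em>", html)

# Lazy: .*? stops at the first "</em>" it can, so each element matches separately.
lazy = re.findall(r"<em>.*?</em>", html)

# With a capturing group, the lazy form yields just the inner text.
inner = re.findall(r"<em>(.*?)</em>", html)

print(greedy)
print(lazy)
print(inner)
```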
| Anchor | Meaning | Example |
|---|---|---|
| ^ | Start of string (or of each line in multiline mode) | ^Hello → "Hello" at start |
| $ | End of string (or of each line in multiline mode) | world$ → "world" at end |
| \b | Word boundary | \bcat\b → "cat" not "catalog" |
| \B | Non-word boundary | \Bcat → "cat" inside "bobcat", not "cat" or "catalog" |
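Anchors match positions rather than characters, which is easiest to see by checking where matches land. A short sketch in Python with invented sample strings:

```python
import re

text = "cat catalog bobcat"

# \b on both sides: only the standalone word qualifies.
whole_words = re.findall(r"\bcat\b", text)

# \B before "cat": the match must begin inside a word, which only happens in "bobcat".
inside_starts = [m.start() for m in re.finditer(r"\Bcat", text)]

lines = "Hello world\nHello there"

# Without MULTILINE, ^ anchors only at the start of the whole string.
single_mode = re.findall(r"^Hello", lines)

# With MULTILINE, ^ and $ also anchor at each line break.
multi_mode = re.findall(r"^Hello", lines, flags=re.MULTILINE)
line_end = re.findall(r"world$", lines, flags=re.MULTILINE)

print(whole_words, inside_starts)
print(len(single_mode), len(multi_mode), line_end)
```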
| Syntax | Name | Description |
|---|---|---|
| (abc) | Capturing group | Matches "abc" and captures it for later reference |
| (?:abc) | Non-capturing group | Groups without capturing (slightly less overhead) |
| (?<name>abc) | Named group | Captures with a readable name (Python spells it (?P<name>abc)) |
| \1 | Backreference | Matches the same text the first group captured |
| (?=abc) | Positive lookahead | Matches if followed by "abc" |
| (?!abc) | Negative lookahead | Matches if NOT followed by "abc" |
| (?<=abc) | Positive lookbehind | Matches if preceded by "abc" |
| (?<!abc) | Negative lookbehind | Matches if NOT preceded by "abc" |
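Named groups and backreferences from the table can be sketched as follows. Note that Python's `re` module writes named groups as `(?P<name>...)` rather than `(?<name>...)`. The sample strings are invented.

```python
import re

# Named groups: pull the parts of a date out by name.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = date_re.search("released on 2026-03-09")
parts = match.groupdict()

# Backreference: \1 must repeat whatever group 1 captured,
# which makes it a simple doubled-word detector.
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a typo").group(1)

print(parts)
print(doubled)
```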
Lookaround assertions let you match a position based on what's ahead or behind — without including that text in the match result.
Password validation (at least 8 characters, letters and digits only, with at least one letter and one digit):
^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$
Match "price" only before a dollar amount:
price(?=\s*\$)
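Match numbers not preceded by a minus sign (the \d in the lookbehind stops the engine from matching the tail of a negative number, such as the "2" in "-42"):
(?<![-\d])\d+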
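The three lookaround uses can be checked against small inputs. This is a sketch with invented sample strings. The unsigned-number pattern includes `\d` in the lookbehind so that digits inside a negative number are also excluded. The simpler `(?<!-)\d+` form is run next to it for comparison.

```python
import re

# Lookaheads assert "a letter somewhere" and "a digit somewhere" without consuming text.
password_re = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$")
accepted = password_re.match("abc12345") is not None
missing_digit = password_re.match("abcdefgh") is not None
too_short = password_re.match("abc123") is not None

# "price" counts only when optional whitespace and a dollar sign follow it.
price_hits = re.findall(r"price(?=\s*\$)", "price $10, price list, price$5")

# Unsigned numbers: not preceded by "-" and not in the middle of a longer number.
unsigned = re.findall(r"(?<![-\d])\d+", "-42 7 x-13 100")

# For comparison: excluding only "-" still matches the tail of "-42".
naive = re.findall(r"(?<!-)\d+", "-42 7")

print(accepted, missing_digit, too_short)
print(len(price_hits), unsigned, naive)
```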
| Pattern | Regex |
|---|---|
| Email (practical) | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ |
| URL | https?://[^\s/$.?#].[^\s]* |
| IPv4 | ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$ |
| Phone (US) | ^\+?1?\s*\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$ |
| Date (YYYY-MM-DD) | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Hex color | ^#?([0-9a-fA-F]{3}){1,2}$ |
| Strong password | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$ |
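A few of the validation patterns above can be wired into a small helper. This is a sketch, not a complete validator: the `PATTERNS` dictionary and `is_valid` function are hypothetical names. It uses `fullmatch` because in Python `$` on its own also matches just before a trailing newline.

```python
import re

# Patterns copied from the table above; the dictionary keys are illustrative.
PATTERNS = {
    "email": re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"),
    "ipv4": re.compile(
        r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
    ),
    "date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "hex_color": re.compile(r"^#?([0-9a-fA-F]{3}){1,2}$"),
}


def is_valid(kind, value):
    """Return True when the whole value matches the named pattern."""
    return PATTERNS[kind].fullmatch(value) is not None


print(is_valid("email", "user@example.com"))
print(is_valid("ipv4", "256.1.1.1"))
print(is_valid("date", "2026-02-30"))
```

The date pattern checks shape only, so an impossible date such as 2026-02-30 still passes. Calendar validity needs a date library on top of the regex.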
| Pattern | Regex |
|---|---|
| HTML tags (simple, non-nested) | <(\w+)[^>]*>[^<]*</\1> |
| Numbers in text | -?\d+(?:\.\d+)? |
| CSV fields | "[^"]*"|[^,]+ |
| JSON keys | "([^"]+)"\s*: |
| Social Security | \d{3}-\d{2}-\d{4} |
| Credit card | \d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4} |
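Extraction patterns are usually applied with a find-all operation rather than a single match. A sketch in Python with invented sample inputs follows. The number pattern is a slightly stricter variant that requires digits after the decimal point, so a trailing sentence period is not swallowed.

```python
import re

# Numbers, including negatives and decimals.
text = "Temperatures: -3.5, 12 and 7.25 degrees."
numbers = re.findall(r"-?\d+(?:\.\d+)?", text)

# JSON keys: quoted strings that are followed by a colon.
snippet = '{"name": "Ada", "age": 36, "tags": ["x"]}'
keys = re.findall(r'"([^"]+)"\s*:', snippet)

# CSV fields: a quoted field may contain commas; empty fields are skipped.
fields = re.findall(r'"[^"]*"|[^,]+', 'a,"b,c",d')

print(numbers)
print(keys)
print(fields)
```

For anything beyond quick extraction, a real JSON or CSV parser handles escapes and empty fields that these patterns do not.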
Regex performance matters when processing large inputs or running in hot code paths. A poorly written regex can cause catastrophic backtracking — exponential time complexity that freezes your application.
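// ❌ Dangerous: nested quantifiers can cause catastrophic backtracking
^(a+)+$
// ✅ Safer: remove the nested quantifier when the pattern allows it
^a+$
// ✅ Or use possessive quantifiers or atomic groups where the engine supports them (for example Java, PCRE, and Python 3.11+; JavaScript has neither)
^(?:a++)+$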
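The growth is easy to observe on a backtracking engine such as Python's `re`. This sketch times a failed match for the nested pattern and for a flat equivalent, `^a+$`, which accepts the same strings. The helper name is hypothetical, the input sizes are kept small on purpose, and the timings will vary by machine.

```python
import re
import time


def time_failed_match(pattern, n):
    """Return seconds spent rejecting 'a' * n + 'b', which can never match."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = re.match(pattern, subject)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


for n in (10, 14, 18):
    nested = time_failed_match(r"^(a+)+$", n)
    flat = time_failed_match(r"^a+$", n)
    print(f"n={n:2d}  nested={nested:.5f}s  flat={flat:.5f}s")
```

On a backtracking engine the nested timing should climb steeply with each step in n while the flat timing stays roughly constant. Avoid large values of n, which can hang the process.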
// ❌ Slower: alternation tries each branch in turn
[0-9]|[a-z]|[A-Z]
// ✅ Faster: single character class
[0-9a-zA-Z]
// ❌ Slower: every branch is tried from scratch
cat|dog|car
// ✅ Faster: shared prefix factored out
c(?:at|ar)|dog
Capturing groups store match data in memory. If you don't need backreferences, use (?:...) instead of (...). This is especially important in patterns with many groups.
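Beyond memory, the choice of group type changes what the match API returns. A short Python sketch on an invented sample string:

```python
import re

text = "2026-01-15, 2025-12-31"

capturing = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
non_capturing = re.compile(r"(?:\d{4})-(?:\d{2})-(?:\d{2})")

# With capturing groups, findall returns one tuple of captures per match.
as_tuples = capturing.findall(text)

# With non-capturing groups, findall returns the whole matched text.
as_strings = non_capturing.findall(text)

# .groups reports how many captures the engine has to track.
print(capturing.groups, non_capturing.groups)
print(as_tuples)
print(as_strings)
```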
Adding ^ and $ anchors tells the engine it can skip positions where the pattern can't possibly match, enabling faster failure.
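Some regex engines support timeouts (for example, .NET's Regex class and the third-party Python regex module), but JavaScript's RegExp and Python's built-in re module do not. For user-supplied patterns, always enforce a time limit, either through the engine or by running the match somewhere you can terminate it:
// JavaScript: RegExp has no timeout option. Run untrusted patterns in a Worker
// and call worker.terminate() if no result arrives before the deadline.
// Python: the built-in re module has no timeout; the third-party regex module does
import regex
match = regex.search(pattern, text, timeout=1.0)  # raises TimeoutError after 1 second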
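Where the engine itself offers no timeout (Python's built-in `re` module does not accept one), one option is to run the match in a child process that can be killed at a deadline. The sketch below uses only the standard library. `search_with_timeout` and the exit-code convention are hypothetical choices for illustration.

```python
import subprocess
import sys

# Child script: print the match and exit 0, or exit 3 when nothing matched.
_CHILD_SCRIPT = """
import re, sys
m = re.search(sys.argv[1], sys.argv[2])
if m is None:
    sys.exit(3)
sys.stdout.write(m.group(0))
"""


def search_with_timeout(pattern, text, seconds=2.0):
    """Run re.search in a child process with a hard deadline.

    Returns the matched text, or None when there is no match.
    Raises TimeoutError if the child is still running after `seconds`.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, pattern, text],
            capture_output=True,
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        raise TimeoutError(f"regex exceeded {seconds}s") from None
    if proc.returncode == 0:
        return proc.stdout
    if proc.returncode == 3:
        return None
    # Any other exit code means the child crashed, e.g. on an invalid pattern.
    raise ValueError(f"pattern failed in child process: {proc.stderr.strip()}")
```

Starting a process per match is expensive, so this suits occasional user-supplied patterns rather than hot paths. For high volumes, a linear-time engine such as RE2 avoids the problem altogether.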
Writing regex by trial and error in production code is painful. An online regex tester gives you instant feedback, highlights matches in your test string, and explains what each part of your pattern does.
Key features to look for in a regex tester: real-time match highlighting, group extraction, a replace mode, pattern explanation, and toggles for the common flags g (global), i (case-insensitive), m (multiline), and s (dotall).
Regular expressions are worth the investment to learn. Start with character classes, quantifiers, and anchors. Then add groups, backreferences, and lookaround as needed. Always test your patterns with edge cases, prefer non-capturing groups for performance, and set timeouts for user-supplied input. With a good regex tester and the patterns in this guide, you can handle virtually any text processing challenge that comes your way.