Regular expressions look like line noise. But behind the cryptic syntax lies one of the most powerful text-processing tools available to developers. Nearly every major programming language supports regex, most text editors use it for search-and-replace, and many data pipelines rely on it for validation and extraction.
This guide doesn't teach regex from first principles. Instead, it walks through 10 real-world case studies, each one a pattern you'll actually use in production, complete with the regex, an explanation, test cases, and common pitfalls.
Email validation is the most common regex use case. The challenge: email addresses have a complex spec (RFC 5322), but you need a practical pattern that catches typos without rejecting the addresses people actually use.
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
How it works: The local part allows letters, digits, dots, underscores, percent signs, pluses, and hyphens. The domain part allows letters, digits, dots, and hyphens. The TLD must be at least 2 characters.
Matches: user@example.com, john.doe+tag@company.co.uk, admin@sub.domain.org
Rejects: user@ (no domain), @example.com (no local part), user@.com (no domain name)
Gotcha: This pattern rejects technically valid but unusual addresses like " "@example.org, and it accepts some invalid ones, such as user@example..com (consecutive dots in the domain). For most applications, this trade-off is acceptable. Use an HTML5 <input type="email"> for client-side validation as a complement.
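As a minimal sketch of how the pattern above might be used in Python: the helper name is_plausible_email is hypothetical, and fullmatch is used instead of match because in Python $ also matches just before a trailing newline.

```python
import re

# Pattern copied from the case study above, compiled once for reuse.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_plausible_email(address: str) -> bool:
    """Return True if the address passes the practical (not RFC-complete) check.

    fullmatch requires the entire string to match, so "user@example.com\n"
    is rejected even though $ alone would tolerate the trailing newline.
    """
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", "john.doe+tag@company.co.uk",
                      "user@", "@example.com", "user@.com"]:
        print(f"{candidate!r}: {is_plausible_email(candidate)}")
```

A check like this only says the address is well-formed; confirming that it exists still requires sending a message to it.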
Extracting phone numbers from free text is harder than it looks. People write them in dozens of formats: with parentheses, dashes, dots, spaces, country codes, and extensions.
(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?
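How it works: An optional country code (with or without +) is followed by a 3-digit area code, a 3-digit prefix, and a 4-digit line number, separated by any mix of dashes, dots, spaces, or parentheses. The extension (after x) is captured separately. The 3-3-4 grouping targets North American-style numbers.
Matches: +1 (555) 123-4567, 555.123.4567, 555-123-4567 x89
Does not match: +44 20 7946 0958 (the digits are grouped 2-4-4, not 3-3-4)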
Tip: For production use, consider Google's libphonenumber library, which parses and validates numbers against each country's numbering plan.
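The capture groups can be pulled out of free text roughly like this. It is a sketch only: the function name extract_phones and the dictionary keys are illustrative choices, not part of the pattern.

```python
import re

# Pattern copied from the case study above.
PHONE_RE = re.compile(
    r"(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?"
)


def extract_phones(text: str) -> list:
    """Return one dict per phone number found, built from the capture groups."""
    results = []
    for match in PHONE_RE.finditer(text):
        country, area, prefix, line, extension = match.groups()
        results.append({
            "country": country,      # None when no country code was written
            "area": area,
            "number": f"{prefix}-{line}",
            "extension": extension,  # None when no "x123" suffix was written
        })
    return results


if __name__ == "__main__":
    sample = "Call +1 (555) 123-4567 or 555-123-4567 x89 after 5pm."
    for phone in extract_phones(sample):
        print(phone)
```

Optional groups that did not participate in a match come back as None, so callers should check for that before formatting.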
Given a block of text, extract all URLs that start with http:// or https://. Bare domains with no protocol are not matched by this pattern.
https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)
How it works: Matches http or https, optional www., then the domain and path. The character class for the path includes query parameters and fragments.
Matches: https://example.com, http://www.example.com/path?q=1, https://api.example.co.uk/v2/users?page=1&limit=20
Gotcha: This pattern may match trailing punctuation as part of the URL (e.g., the period at the end of a sentence, or a closing parenthesis around the link). Post-process matches to strip trailing dots and parentheses.
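Extraction plus the post-processing step could look like the following sketch. The function name extract_urls is hypothetical, and stripping every trailing closing parenthesis is a simplifying assumption: it would also damage a URL that legitimately ends in one.

```python
import re

# Pattern copied from the case study above, split over two lines for readability.
URL_RE = re.compile(
    r"https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
    r"(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)"
)


def extract_urls(text: str) -> list:
    """Find http(s) URLs, then strip trailing dots, commas, and ')' characters."""
    return [match.group(0).rstrip(".,)") for match in URL_RE.finditer(text)]


if __name__ == "__main__":
    text = ("Docs live at https://example.com/guide. "
            "See also (http://www.example.com/path?q=1).")
    print(extract_urls(text))
```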
Validate ISO 8601 dates. The regex catches format errors; application logic should handle semantic errors (like February 30).
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
How it works: Year is exactly 4 digits. Month is 01-12. Day is 01-31. This accepts technically invalid dates like 2026-02-30, which your application should reject separately.
Matches: 2026-04-13, 1999-12-31, 2000-01-01
Rejects: 2026-13-01 (invalid month), 2026-4-1 (not zero-padded), 26-04-13 (2-digit year)
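One way to combine the format check with the semantic check is sketched below. The helper name is_valid_iso_date is hypothetical; the re.ASCII flag is an added assumption so that \d accepts only the digits 0-9, and the calendar check relies on the standard library's date.fromisoformat.

```python
import re
from datetime import date

# Pattern copied from the case study above; re.ASCII limits \d to 0-9.
DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$", re.ASCII)


def is_valid_iso_date(value: str) -> bool:
    """Check the YYYY-MM-DD shape with the regex, then the calendar with datetime."""
    if DATE_RE.fullmatch(value) is None:
        return False  # wrong format: bad month, missing zero padding, short year
    try:
        date.fromisoformat(value)  # raises ValueError for dates like 2026-02-30
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["2026-04-13", "2026-02-30", "2026-13-01", "2026-4-1"]:
        print(candidate, is_valid_iso_date(candidate))
```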
Validate that a password meets complexity requirements: at least 8 characters, with uppercase, lowercase, digit, and special character.
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
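How it works: Each (?=.*[X]) is a lookahead that ensures at least one character of the specified type exists somewhere in the string. The final part matches the actual characters, so only letters, digits, and the special characters @$!%*?& are allowed; any other character (such as # or a space) makes the whole match fail.
Matches: Passw0rd!, My$ecure1, Complexity&9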
Rejects: password (no upper, digit, special), Passw0rd (no special), Pw1! (too short)
Tip: For better UX, check each requirement separately and show users which ones they've met, rather than one monolithic pass/fail.
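The tip about checking each requirement separately might be sketched as follows. The function name unmet_requirements and the wording of the labels are illustrative; the character classes mirror the ones in the pattern above.

```python
import re

# One small pattern per requirement, mirroring the lookaheads in the full regex.
CHECKS = [
    ("a lowercase letter", re.compile(r"[a-z]")),
    ("an uppercase letter", re.compile(r"[A-Z]")),
    ("a digit", re.compile(r"\d")),
    ("a special character (@$!%*?&)", re.compile(r"[@$!%*?&]")),
]
# The full regex also restricts which characters may appear at all.
ALLOWED_RE = re.compile(r"[A-Za-z\d@$!%*?&]*")


def unmet_requirements(password: str) -> list:
    """Return the labels of the requirements the password does not yet meet."""
    problems = []
    if len(password) < 8:
        problems.append("at least 8 characters")
    for label, pattern in CHECKS:
        if pattern.search(password) is None:
            problems.append(label)
    if ALLOWED_RE.fullmatch(password) is None:
        problems.append("only letters, digits, and @$!%*?&")
    return problems


if __name__ == "__main__":
    for candidate in ["Passw0rd!", "password", "Pw1!"]:
        print(candidate, unmet_requirements(candidate))
```

An empty list means the password would also pass the single combined pattern.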
Extract IPv4 addresses from logs, configuration files, or network data.
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
How it works: Each octet is 0-255. The pattern uses alternation to handle three ranges separately: 250-255 (25[0-5]), 200-249 (2[0-4]\d), and 0-199 ([01]?\d\d?). The last branch also accepts leading zeros, so 010.001.1.1 matches.
Matches: 192.168.1.1, 10.0.0.255, 255.255.255.0
Rejects: 256.1.1.1 (octet > 255), 1.2.3 (only 3 octets)
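Applied to a log line, the pattern can be used with findall, as in this sketch. The log line and the function name extract_ipv4 are made up for illustration.

```python
import re

# Pattern copied from the case study above. All groups are non-capturing,
# so findall returns the full matched addresses.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b"
)


def extract_ipv4(text: str) -> list:
    """Return every IPv4-looking address found in the text, in order."""
    return IPV4_RE.findall(text)


if __name__ == "__main__":
    log_line = "Denied 192.168.1.1 -> 10.0.0.255 (bogus source 256.1.1.1, partial 1.2.3)"
    print(extract_ipv4(log_line))
```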
Strip all HTML tags from a string, leaving only the text content. Useful for creating plain-text versions of HTML content. On its own it is not a safe way to sanitize untrusted user input.
<[^>]*>
How it works: Matches anything between < and >. Replace with empty string.
Input: <p>Hello <strong>world</strong>!</p>
Output: Hello world!
Warning: This doesn't handle <script> content correctly (the JS code between tags won't be removed) and fails on malformed HTML. For robust HTML stripping, use a DOM parser.
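The difference the warning describes can be seen by comparing the regex with a real HTML parser. This sketch uses Python's standard html.parser module, which is event-based rather than a full DOM, as a stand-in for the parser-based approach. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

TAG_RE = re.compile(r"<[^>]*>")  # pattern copied from the case study above


def strip_tags_regex(html: str) -> str:
    """Remove anything that looks like a tag; script bodies are left behind."""
    return TAG_RE.sub("", html)


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags_parser(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    page = "<p>Hi</p><script>alert(1)</script>"
    print(strip_tags_regex(page))   # the script body survives
    print(strip_tags_parser(page))  # the script body is dropped
```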
Extract ISO 8601 timestamps from server logs for analysis.
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})
How it works: Matches the date, T separator, time, optional fractional seconds, and timezone (Z or a numeric offset, with or without a colon).
Matches: 2026-04-13T06:30:00Z, 2026-04-13T14:30:00.123+08:00, 2026-04-13T06:30:00-05:00
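For the analysis step, extracted timestamps usually need to be parsed and brought to a common timezone. The sketch below assumes log lines like the made-up ones shown, and the helper names are hypothetical. Replacing a trailing Z with +00:00 is a workaround so that datetime.fromisoformat also accepts it on Python versions before 3.11; offsets written without a colon may still need extra handling there.

```python
import re
from datetime import datetime, timezone

# Pattern copied from the case study above.
TS_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})"
)


def extract_timestamps(text: str) -> list:
    """Return the timestamp strings found in the text, in order."""
    return TS_RE.findall(text)


def to_utc(stamp: str) -> datetime:
    """Parse one extracted timestamp and convert it to UTC."""
    normalized = stamp[:-1] + "+00:00" if stamp.endswith("Z") else stamp
    return datetime.fromisoformat(normalized).astimezone(timezone.utc)


if __name__ == "__main__":
    log = (
        "2026-04-13T06:30:00Z INFO service started\n"
        "2026-04-13T14:30:00.123+08:00 WARN slow query\n"
        "no timestamp on this line\n"
        "2026-04-13T06:30:00-05:00 ERROR timeout\n"
    )
    for stamp in extract_timestamps(log):
        print(stamp, "->", to_utc(stamp).isoformat())
```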
Validate CSS hex color codes in 3-digit or 6-digit format, with or without the # prefix.
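^#?([0-9a-fA-F]{3}){1,2}$
How it works: Optional #, then either 3 hex digits (shorthand) or 6 hex digits (full). The ^ and $ anchors force the whole string to match, so a 4-digit input such as #1234 is rejected outright; an unanchored search would still find the partial match 234 inside it.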
Matches: #fff, #336699, ABCDEF
Rejects: #ggg (invalid hex), #1234 (neither 3 nor 6 digits)
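A validation helper, plus a normalizer that expands the 3-digit shorthand, might look like this sketch. Both function names are hypothetical. The pattern is anchored with ^ and $ and applied with fullmatch so that the entire string must be a color code; a plain search could still match a substring of a longer value.

```python
import re

# Anchored form of the hex color pattern: the whole string must match.
HEX_COLOR_RE = re.compile(r"^#?([0-9a-fA-F]{3}){1,2}$")


def is_hex_color(value: str) -> bool:
    """True for 3- or 6-digit hex colors, with or without a leading #."""
    return HEX_COLOR_RE.fullmatch(value) is not None


def normalize_hex_color(value: str) -> str:
    """Return the color as lowercase #rrggbb, expanding shorthand like #fff."""
    if not is_hex_color(value):
        raise ValueError(f"not a hex color: {value!r}")
    digits = value.lstrip("#").lower()
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # f -> ff, 3 -> 33, ...
    return "#" + digits


if __name__ == "__main__":
    for candidate in ["#fff", "#336699", "ABCDEF", "#ggg", "#1234"]:
        print(candidate, is_hex_color(candidate))
    # With an unanchored re.search, "#1234" would yield the partial match "234".
```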
Replace all but the last 4 digits of a credit card number with asterisks, for example when masking card numbers in logs or on screen as part of meeting PCI DSS requirements.
\b(\d{4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4})\b
Replacement: ****-****-****-$4
Input: Card: 4111-1111-1111-1234
Output: Card: ****-****-****-1234
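How it works: Each group captures 4 digits with optional separators. The replacement references only the last group ($4) and hardcodes asterisks for the rest. Only 16-digit numbers written in 4-4-4-4 groups are covered; 15-digit American Express numbers, for instance, are not.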
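In Python the same substitution can be sketched as follows. The function name mask_cards is hypothetical. Note that Python's re module writes the backreference as \g<4> (or \4) rather than the $4 form used by JavaScript and several other engines.

```python
import re

# Pattern copied from the case study above.
CARD_RE = re.compile(r"\b(\d{4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4})\b")


def mask_cards(text: str) -> str:
    """Replace every 16-digit card number with asterisks plus its last 4 digits."""
    return CARD_RE.sub(r"****-****-****-\g<4>", text)


if __name__ == "__main__":
    print(mask_cards("Card: 4111-1111-1111-1234"))
```

Whatever separators the input used, the output always uses dashes, because the replacement string hardcodes them.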
| Pattern | Regex |
|---|---|
| Username (3-16 chars) | ^[a-zA-Z0-9_]{3,16}$ |
| Slug / URL-safe string | ^[a-z0-9]+(?:-[a-z0-9]+)*$ |
| Hexadecimal number | ^0x[0-9a-fA-F]+$ |
| Semantic version | ^\d+\.\d+\.\d+(?:-[\w.]+)?$ |
| UUID v4 | ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ |
| Strong password | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$ |
| MAC address | ^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$ |
If \d+ works but \d{3}-\d{4} doesn't, you know the issue is with the separator.
.* matches as much as possible. Use .*? (lazy) to match as little as possible. Greedy matching is one of the most common sources of unexpected regex behavior.
The characters . * + ? ^ $ { } [ ] \ | ( ) have special meaning in regex. To match them literally, escape them with \.
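The greedy-versus-lazy difference and the need for escaping are easy to see in a few lines of Python. The sample strings are made up for illustration; re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the LAST </b>, so both elements come back as one match.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the FIRST </b>, so each element is matched separately.
lazy = re.findall(r"<b>.*?</b>", html)

price_text = "Total: $9.99 (approx.)"

# Unescaped, $ means "end of string", so this pattern cannot match anything here.
unescaped = re.search(r"$9.99", price_text)

# re.escape backslash-escapes the special characters so they match literally.
escaped = re.search(re.escape("$9.99"), price_text)

if __name__ == "__main__":
    print(greedy)
    print(lazy)
    print(unescaped)
    print(escaped.group(0))
```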