Mastering Regular Expressions - Complete Guide to Regex Patterns
Learn everything about regular expressions, how they work, common patterns, and best practices for pattern matching, validation, and text processing.
What are Regular Expressions?
Regular expressions (regex) are powerful pattern-matching tools that allow you to search, match, and manipulate text based on complex rules. They provide a concise way to describe patterns in strings and are supported by most programming languages.
Why Use Regular Expressions?
Regular expressions are useful for:
- Text Validation: Email addresses, phone numbers, URLs
- Search and Replace: Finding and replacing patterns in text
- Data Extraction: Parsing and extracting information from strings
- Text Processing: Formatting and cleaning data
- Pattern Matching: Complex string matching requirements
Basic Regex Syntax
Literal Characters
Most characters match themselves:
hello
Matches: "hello"
Special Characters (Metacharacters)
These characters have special meaning. To match one of them literally, escape it with a backslash (\):
. ^ $ * + ? { } [ ] \ | ( )
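When a pattern is built from arbitrary text, every metacharacter in that text needs a backslash. The following is a minimal JavaScript sketch of that idea; `escapeRegExp` is a hypothetical helper name used for illustration, not a built-in function.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$5.00 (approx.)";
const literal = new RegExp(escapeRegExp(price));

console.log(literal.test("Total: $5.00 (approx.)")); // true
console.log(/5.00/.test("5x00"));  // true: the unescaped dot matches any character
console.log(/5\.00/.test("5x00")); // false: the escaped dot matches only "."
```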
Character Classes
Character Set [...] - Matches any single character inside brackets:
[aeiou] # Matches any vowel
[0-9] # Matches any digit
[a-z] # Matches any lowercase letter
[A-Za-z0-9] # Matches alphanumeric characters
[^aeiou] # Matches any character except a lowercase vowel
Predefined Classes:
\d # Digit [0-9]
\w # Word character [A-Za-z0-9_]
\s # Whitespace [ \t\n\r\f\v]
\D # Non-digit [^0-9]
\W # Non-word character
\S # Non-whitespace
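A short JavaScript sketch of the classes above in use. The sample string and variable names are made up for illustration.

```javascript
const sample = "Order 42 shipped to Bay_7 on 2024-01-15";

const digitRuns = sample.match(/\d+/g);           // runs of digits
const wordRuns = sample.match(/\w+/g);            // runs of letters, digits, and _
const parts = sample.split(/\s+/);                // split on runs of whitespace
const digitsOnly = sample.replace(/[^0-9]/g, ""); // negated set: drop non-digits

console.log(digitRuns);       // ["42", "7", "2024", "01", "15"]
console.log(wordRuns.length); // 9 ("Bay_7" counts as one word because _ is a word character)
console.log(parts.length);    // 7
console.log(digitsOnly);      // "42720240115"
```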
Quantifiers
Control how many times a pattern should match:
* # Zero or more (greedy)
+ # One or more (greedy)
? # Zero or one (optional)
{n} # Exactly n times
{n,} # n or more times
{n,m} # Between n and m times
*? # Zero or more (lazy/non-greedy)
+? # One or more (lazy/non-greedy)
Examples
a* # Matches "", "a", "aa", "aaa", ...
a+ # Matches "a", "aa", "aaa", ... (not "")
a? # Matches "" or "a"
a{3} # Matches exactly "aaa"
a{3,5} # Matches "aaa", "aaaa", or "aaaaa"
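One way to check these quantifiers is to run each against a list of candidate strings. This is a small JavaScript sketch; the anchors are added here so that the quantifier has to account for the whole string, and the candidate list is an assumption chosen for the demo.

```javascript
const candidates = ["", "a", "aa", "aaa", "aaaa", "aaaaaa"];

const patterns = {
  "a*": /^a*$/,
  "a+": /^a+$/,
  "a?": /^a?$/,
  "a{3}": /^a{3}$/,
  "a{3,5}": /^a{3,5}$/,
};

// Collect, per quantifier, the candidates it accepts in full.
const accepted = {};
for (const [label, regex] of Object.entries(patterns)) {
  accepted[label] = candidates.filter((s) => regex.test(s));
  console.log(label, JSON.stringify(accepted[label]));
}
// a*     accepts all six candidates, including ""
// a+     accepts everything except ""
// a?     accepts "" and "a"
// a{3}   accepts "aaa"
// a{3,5} accepts "aaa" and "aaaa" ("aaaaaa" is too long)
```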
Anchors
Specify where a pattern should match:
^ # Start of string
$ # End of string
\b # Word boundary
\B # Non-word boundary
Examples
^hello # Matches "hello" only at start
world$ # Matches "world" only at end
^hello world$ # Matches entire string exactly
\bword\b # Matches "word" as whole word
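The anchor examples can be tried directly in JavaScript. The test strings below are illustrative.

```javascript
const checks = [
  /^hello/.test("hello world"),        // true: "hello" is at the start
  /^hello/.test("say hello"),          // false: "hello" is not at the start
  /world$/.test("hello world"),        // true: "world" is at the end
  /^hello world$/.test("hello world!"),// false: the trailing "!" breaks the exact match
  /\bword\b/.test("a word here"),      // true: "word" stands alone
  /\bword\b/.test("password"),         // false: "word" is part of a longer word
  /\Bword/.test("password"),           // true: no boundary between "s" and "w"
];

console.log(checks);
```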
Groups and Capturing
Capturing Groups (...)
Capture and extract matched content:
(\d{3})-(\d{3})-(\d{4})
Matches phone number and captures area code, exchange, and number separately
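In JavaScript, the captured parts are available by index on the match result. A minimal sketch, with an example sentence and variable names chosen for illustration:

```javascript
const phoneMatch = "Call 555-123-4567 today".match(/(\d{3})-(\d{3})-(\d{4})/);

// match() returns null when nothing matches, so fall back to an empty array.
const [whole, areaCode, exchange, lineNumber] = phoneMatch ?? [];

console.log(whole);      // "555-123-4567"
console.log(areaCode);   // "555"
console.log(exchange);   // "123"
console.log(lineNumber); // "4567"
```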
Non-Capturing Groups (?:...)
Group without capturing:
(?:Mr|Mrs|Ms)\.\s(\w+)
Groups title options but only captures the name
Named Groups (?<name>...)
Capture with a name:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
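Named groups can be read from the `groups` property of a match and referenced as `$<name>` in a replacement string. A brief JavaScript sketch with made-up input:

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const dateMatch = "Released on 2024-03-09".match(isoDate);
const { year, month, day } = dateMatch?.groups ?? {};
console.log(year, month, day); // 2024 03 09

// Reorder the parts by name instead of by position.
const usStyle = "2024-03-09".replace(isoDate, "$<month>/$<day>/$<year>");
console.log(usStyle); // "03/09/2024"
```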
Alternation
Match one of several patterns:
cat|dog # Matches "cat" or "dog"
(Mon|Tue|Wed) # Matches "Mon", "Tue", or "Wed" and captures it
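Alternation has the lowest precedence of any regex operator, so anchors bind to a single branch unless the alternatives are grouped. A short JavaScript sketch of the difference:

```javascript
const ungrouped = /^cat|dog$/;     // read as: (^cat) OR (dog$)
const grouped = /^(?:cat|dog)$/;   // read as: the whole string is "cat" or "dog"

console.log(/cat|dog/.test("hotdog")); // true: "dog" appears in the string
console.log(ungrouped.test("catalog")); // true: the string starts with "cat"
console.log(grouped.test("catalog"));   // false
console.log(grouped.test("dog"));       // true
```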
Flags
Modify regex behavior:
g # Global (find all matches)
i # Case-insensitive
m # Multiline (^ and $ match line boundaries)
s # Dotall (. matches newline)
u # Unicode
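In JavaScript, flags follow the closing slash of a regex literal. The sketch below uses a three-line sample string (an assumption for the demo) to show how `g`, `i`, `m`, and `s` change the result.

```javascript
const lines = "Cat\ncat\nCAT";

const caseSensitive = lines.match(/cat/g);    // ["cat"]
const anyCase = lines.match(/cat/gi);         // ["Cat", "cat", "CAT"]
const wholeString = /^cat$/i.test(lines);     // false: ^ and $ refer to the whole string
const perLine = lines.match(/^cat$/gim);      // ["Cat", "cat", "CAT"]
const dotNoNewline = /Cat.cat/.test(lines);   // false: . does not match \n
const dotAll = /Cat.cat/s.test(lines);        // true: with s, . matches \n too

console.log(caseSensitive, anyCase, wholeString, perLine, dotNoNewline, dotAll);
```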
Common Regex Patterns
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
- ^[a-zA-Z0-9._%+-]+ : Username part
- @ : Literal @
- [a-zA-Z0-9.-]+ : Domain name
- \. : Literal dot (escaped)
- [a-zA-Z]{2,}$ : Top-level domain (2+ letters)
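A quick JavaScript check of this pattern against a few sample addresses. The addresses are invented for illustration, and the pattern is a pragmatic approximation rather than a full implementation of the email address standards.

```javascript
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const emailSamples = {
  "user@example.com": emailPattern.test("user@example.com"),                             // true
  "first.last+tag@sub.example.org": emailPattern.test("first.last+tag@sub.example.org"), // true
  "no-at-sign.example.com": emailPattern.test("no-at-sign.example.com"),                 // false: no @
  "user@localhost": emailPattern.test("user@localhost"),                                 // false: no dot in the domain
  "user@example.c": emailPattern.test("user@example.c"),                                 // false: one-letter TLD
};

console.log(emailSamples);
```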
Phone Number (US Format)
^(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Matches formats like:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- +1 555 123 4567
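After validating, it is often convenient to store the number in one canonical form. The following JavaScript sketch assumes a hypothetical helper named `normalizePhone` that returns the ten national digits, or null when the input does not match.

```javascript
const usPhone = /^(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: validate, then strip everything except digits.
function normalizePhone(input) {
  if (!usPhone.test(input)) return null;
  const digits = input.replace(/\D/g, "");
  // Drop the leading country code "1" when it is present.
  return digits.length === 11 ? digits.slice(1) : digits;
}

console.log(normalizePhone("(555) 123-4567"));  // "5551234567"
console.log(normalizePhone("555.123.4567"));    // "5551234567"
console.log(normalizePhone("+1 555 123 4567")); // "5551234567"
console.log(normalizePhone("555-1234"));        // null
```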
URL Pattern
https?://(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
Date Format (YYYY-MM-DD)
^\d{4}-\d{2}-\d{2}$
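This pattern checks only the shape of the string, so a value such as 2024-13-45 passes. One possible approach, sketched in JavaScript, is to capture the parts and let `Date` confirm that the calendar date exists. `isRealDate` is a hypothetical helper name.

```javascript
const datePattern = /^(\d{4})-(\d{2})-(\d{2})$/;

// Hypothetical helper: shape check with the regex, calendar check with Date.
// Note: years 0000-0099 are rejected, because Date.UTC maps them to 1900-1999.
function isRealDate(value) {
  const m = datePattern.exec(value);
  if (!m) return false;
  const [year, month, day] = m.slice(1).map(Number);
  const d = new Date(Date.UTC(year, month - 1, day));
  // Date rolls invalid values over (Feb 30 becomes Mar 1), so compare the parts back.
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(datePattern.test("2024-13-45")); // true: the shape is fine
console.log(isRealDate("2024-13-45"));       // false: there is no month 13
console.log(isRealDate("2024-02-29"));       // true: 2024 is a leap year
console.log(isRealDate("2023-02-29"));       // false
console.log(isRealDate("2024-1-5"));         // false: wrong shape
```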
Strong Password
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requirements:
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
- At least one special character
- Minimum 8 characters
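A brief JavaScript sketch that runs the pattern against sample passwords (all invented). Note that the final character class also acts as an allow-list: a password containing a character outside `[A-Za-z\d@$!%*?&]`, such as `#`, is rejected even if it meets every other requirement.

```javascript
const strongPassword =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

console.log(strongPassword.test("Passw0rd!"));  // true
console.log(strongPassword.test("password"));   // false: no uppercase, digit, or special character
console.log(strongPassword.test("PASSWORD1!")); // false: no lowercase letter
console.log(strongPassword.test("Sh0rt!a"));    // false: only 7 characters
console.log(strongPassword.test("Passw0rd!#")); // false: "#" is not in the allowed set
```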
Hex Color Code
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Matches: #FF0000, #f00
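The captured group holds the hex digits without the `#`, which makes conversion straightforward. A JavaScript sketch, assuming a hypothetical helper named `hexToRgb`:

```javascript
const hexColor = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;

// Hypothetical helper: returns [r, g, b], or null for an invalid code.
function hexToRgb(value) {
  const m = hexColor.exec(value);
  if (!m) return null;
  let hex = m[1];
  // Expand shorthand: "f00" becomes "ff0000".
  if (hex.length === 3) hex = [...hex].map((c) => c + c).join("");
  return [0, 2, 4].map((i) => parseInt(hex.slice(i, i + 2), 16));
}

console.log(hexToRgb("#FF0000")); // [255, 0, 0]
console.log(hexToRgb("#f00"));    // [255, 0, 0]
console.log(hexToRgb("#ff00"));   // null: 4 digits is neither form
```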
Credit Card Number
^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$
Matches formats like:
- 1234 5678 9012 3456
- 1234-5678-9012-3456
- 1234567890123456
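This pattern checks only the layout of a 16-digit number and allows each separator to differ. As a sketch of one possible tightening (an assumption, not part of the pattern above), a backreference `\1` can require the same separator throughout:

```javascript
const cardPattern = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;

// Variant: capture the first separator (possibly empty) and reuse it with \1.
const consistentCard = /^\d{4}([\s-]?)\d{4}\1\d{4}\1\d{4}$/;

console.log(cardPattern.test("1234 5678 9012 3456"));    // true
console.log(cardPattern.test("1234 5678-9012 3456"));    // true: mixed separators pass
console.log(consistentCard.test("1234 5678-9012 3456")); // false
console.log(consistentCard.test("1234-5678-9012-3456")); // true
console.log(consistentCard.test("1234567890123456"));    // true
```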
Lookahead and Lookbehind
Positive Lookahead (?=...)
Match pattern followed by another pattern:
\d+(?= dollars)
Matches digits only if followed by " dollars"
Negative Lookahead (?!...)
Match pattern NOT followed by another pattern:
\d+(?! dollars)
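Matches a run of digits NOT followed by " dollars". Because the engine can backtrack, it still matches "10" inside "100 dollars"; use \b\d+\b(?! dollars) to consider only whole numbers.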
Lookbehind (?<=...) / (?<!...)
Similar to lookahead but checks what comes before:
(?<=\$)\d+ # Digits preceded by $
(?<!\$)\d+ # Digits NOT preceded by $
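The following JavaScript sketch runs the lookaround patterns on one sample sentence (made up for the demo). It also shows a common surprise: negative lookarounds can match part of a number unless the digits are pinned down, for example with `\b` or by excluding digits in the lookbehind.

```javascript
const sentence = "Pay 100 dollars or 250 euros, was $75";

const followedByDollars = sentence.match(/\d+(?= dollars)/g);
console.log(followedByDollars); // ["100"]

const afterDollarSign = sentence.match(/(?<=\$)\d+/g);
console.log(afterDollarSign); // ["75"]

// Naive negative lookahead: backtracks from "100" to "10".
const naiveNotDollars = sentence.match(/\d+(?! dollars)/g);
console.log(naiveNotDollars); // ["10", "250", "75"]

// Word boundaries force whole numbers only.
const wholeNotDollars = sentence.match(/\b\d+\b(?! dollars)/g);
console.log(wholeNotDollars); // ["250", "75"]

// Naive negative lookbehind: matches the "5" in "$75".
const naiveNoSign = sentence.match(/(?<!\$)\d+/g);
console.log(naiveNoSign); // ["100", "250", "5"]

// Also forbid a preceding digit so a match cannot start mid-number.
const wholeNoSign = sentence.match(/(?<![$\d])\d+/g);
console.log(wholeNoSign); // ["100", "250"]
```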
Unicode and Flags
- Use the u flag for proper Unicode handling
- Be careful when matching emojis or surrogate pairs; prefer \p{Emoji} classes where supported
- Normalize text (NFC) when comparing composed/accented characters
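A small JavaScript sketch of these three points. One caveat worth knowing: the `Emoji` property also covers the ASCII digits, so `Extended_Pictographic` is shown here as a stricter alternative for pictographic symbols.

```javascript
const smile = "😀";

console.log(smile.length);         // 2: one emoji, two UTF-16 code units
console.log(/^.$/.test(smile));    // false: without u, . sees one code unit at a time
console.log(/^.$/u.test(smile));   // true: with u, . matches the whole code point

console.log(/\p{Emoji}/u.test(smile));                 // true
console.log(/\p{Emoji}/u.test("7"));                   // true: digits carry the Emoji property
console.log(/\p{Extended_Pictographic}/u.test("7"));   // false
console.log(/\p{Extended_Pictographic}/u.test(smile)); // true

// "é" as one code point versus "e" plus a combining accent.
const composed = "\u00e9";
const decomposed = "e\u0301";
console.log(composed === decomposed); // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
```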
Best Practices
Performance Tips
- Be Specific: More specific patterns are faster
# Faster
^\d{3}-\d{3}-\d{4}$
# Slower
.*.*.*.*.*.*.*.*.*.*.*
- Avoid Catastrophic Backtracking:
# Dangerous (can cause performance issues)
(a+)+b
# Better
a+b
- Use Anchors: Start/end anchors improve performance
^pattern$ # Faster
pattern # Slower
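To see why nested quantifiers are risky, compare the two backtracking patterns on an input that almost matches. This JavaScript sketch uses a hypothetical `timeTest` helper and keeps the input short (20 characters) so it finishes quickly; the measured times depend on the engine and machine, so none are shown.

```javascript
const nested = /^(a+)+b$/; // nested quantifier
const flat = /^a+b$/;      // accepts the same strings without nesting

// Both patterns agree on every input; only the amount of work differs.
const agree = ["ab", "aaab", "b", "aaa", ""].every(
  (s) => nested.test(s) === flat.test(s)
);
console.log(agree); // true

// Hypothetical helper: elapsed milliseconds for a single test() call.
function timeTest(regex, input) {
  const start = Date.now();
  regex.test(input);
  return Date.now() - start;
}

// A run of "a" with no "b": the nested pattern tries every way of splitting
// the run into groups before giving up, which roughly doubles per added "a".
const almost = "a".repeat(20) + "!";
console.log("nested ms:", timeTest(nested, almost));
console.log("flat ms:", timeTest(flat, almost));
```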
Readability Tips
- Add Comments: Use (?#...) for inline comments where the engine supports them (PCRE and Python do; JavaScript does not)
^\d{4}(?#year)-\d{2}(?#month)-\d{2}(?#day)$
- Use Non-Capturing Groups: When you don't need to capture
(?:cat|dog) # Better than (cat|dog) if not capturing
- Break Complex Patterns: Split into multiple regex patterns when possible
Validation Tips
- Anchors for Exact Matching: Always use ^ and $ for validation
^email@domain\.com$ # Exact match
email@domain\.com # Partial match (can match "my email@domain.com is...")
- Test Edge Cases: Test with empty strings, special characters, Unicode
- Don't Over-Validate: Sometimes simpler patterns are better than complex ones
Common Mistakes
1. Forgetting Anchors
# Wrong - matches a phone number anywhere in the string, e.g. inside "id 555-123-45678"
\d{3}-\d{3}-\d{4}
# Correct - matches entire string
^\d{3}-\d{3}-\d{4}$
2. Not Escaping Special Characters
# Wrong - . matches any character
\d+.\d+
# Correct - \. matches literal dot
^\d+\.\d+$
3. Greedy vs Lazy Quantifiers
# Greedy - matches as much as possible
<.*> # Matches entire "<tag>content</tag>"
# Lazy - matches as little as possible
<.*?> # Matches "<tag>" separately
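The difference is easy to see with a global match in JavaScript. The third pattern is a common alternative (an addition here, not from the list above) that avoids `.` entirely by matching "anything except >".

```javascript
const html = "<tag>content</tag>";

const greedy = html.match(/<.*>/g);
console.log(greedy); // ["<tag>content</tag>"]

const lazy = html.match(/<.*?>/g);
console.log(lazy); // ["<tag>", "</tag>"]

const negated = html.match(/<[^>]*>/g);
console.log(negated); // ["<tag>", "</tag>"]
```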
Real-World Examples
Extracting Data from Text
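const text = "Contact: john@example.com or call 555-123-4567";
// match() returns null when nothing matches, so guard the [0] access with ?.
const email = text.match(/[\w.]+@[\w.]+\.\w+/)?.[0]; // "john@example.com"
const phone = text.match(/\d{3}-\d{3}-\d{4}/)?.[0]; // "555-123-4567"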
Replacing Patterns
const text = "Hello, my email is user@example.com";
const masked = text.replace(/[\w.]+@[\w.]+\.\w+/, '***@***.***');
// Result: "Hello, my email is ***@***.***"
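Without the `g` flag, `replace` changes only the first match. The sketch below (sample text and masking rule are assumptions for illustration) adds `g` and uses a replacer function so each address keeps its first letter and domain.

```javascript
const note = "Send to user@example.com and admin@example.org";

// The replacer receives the full match followed by each captured group.
const maskedAll = note.replace(
  /([\w.]+)@([\w.]+\.\w+)/g,
  (match, user, domain) => `${user[0]}***@${domain}`
);

console.log(maskedAll);
// "Send to u***@example.com and a***@example.org"
```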
Splitting Text
const csv = "name,email,phone";
const fields = csv.split(/,/);
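A regex separator becomes useful when the delimiters vary. This JavaScript sketch, with a made-up input line, splits on commas or semicolons and absorbs any surrounding whitespace. It is not a full CSV parser; quoted fields containing commas would need more than a split.

```javascript
const messy = "name ,  email;phone";

// \s* on both sides swallows whitespace around each delimiter.
const cleanFields = messy.split(/\s*[,;]\s*/);

console.log(cleanFields); // ["name", "email", "phone"]
```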
Conclusion
Regular expressions are a powerful tool for text processing, validation, and pattern matching. While they can seem complex at first, understanding the fundamental concepts—character classes, quantifiers, anchors, and groups—will enable you to create effective patterns for a wide variety of use cases.
Practice with real-world scenarios, test your patterns thoroughly, and remember that sometimes multiple simple patterns are better than one complex pattern. With regular expressions, you'll be able to handle text processing tasks efficiently and elegantly.
Frequently Asked Questions
Q: Why does my regex seem slow?
Patterns with excessive backtracking (like nested quantifiers) can cause performance issues; simplify and anchor your regex.