Mastering Regular Expressions - Complete Guide to Regex Patterns

What are Regular Expressions?

Regular expressions (regex) are powerful pattern-matching tools that allow you to search, match, and manipulate text based on complex rules. They provide a concise way to describe patterns in strings and are supported by most programming languages.

Why Use Regular Expressions?

Regular expressions are useful for:

Text Validation: Email addresses, phone numbers, URLs
Search and Replace: Finding and replacing patterns in text
Data Extraction: Parsing and extracting information from strings
Text Processing: Formatting and cleaning data
Pattern Matching: Complex string matching requirements

Basic Regex Syntax

Literal Characters

Most characters match themselves:

hello

Matches the character sequence "hello" wherever it appears (for example, inside "say hello")

Special Characters (Metacharacters)

These characters have special meaning and must be escaped with \ to match them literally:

. ^ $ * + ? { } [ ] \ | ( )

Character Classes

Character Set [...] - Matches any single character inside brackets:

[aeiou]        # Matches any vowel
[0-9]          # Matches any digit
[a-z]          # Matches any lowercase letter
[A-Za-z0-9]    # Matches alphanumeric characters
[^aeiou]       # Matches anything except vowels

Predefined Classes:

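\d             # Digit [0-9]
\w             # Word character [A-Za-z0-9_]
\s             # Whitespace [ \t\n\r\f\v]
\D             # Non-digit [^0-9]
\W             # Non-word character [^A-Za-z0-9_]
\S             # Non-whitespace [^ \t\n\r\f\v]

These are the ASCII definitions. Some engines are Unicode-aware by default: Python 3 (for str patterns) and .NET also match non-ASCII digits and letters, and JavaScript's \s includes Unicode spaces such as the no-break space.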
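A short JavaScript sketch showing sets and predefined classes side by side. The sample string is invented for illustration; the g flag makes match() return every match instead of just the first.

```javascript
// Sample text invented for this example.
const sample = "Order #42: 3 apples_and 7 pears";

// [aeiou] matches one lowercase vowel; the g flag returns every match.
const vowels = sample.match(/[aeiou]/g);

// \d+ matches one or more consecutive digits.
const numbers = sample.match(/\d+/g);

// \w+ matches runs of word characters: letters, digits, and underscore.
const words = sample.match(/\w+/g);

// \s+ splits on any run of whitespace.
const tokens = sample.split(/\s+/);

console.log(vowels);  // → ["e", "a", "e", "a", "e", "a"]
console.log(numbers); // → ["42", "3", "7"]
console.log(words);   // → ["Order", "42", "3", "apples_and", "7", "pears"]
console.log(tokens);  // → ["Order", "#42:", "3", "apples_and", "7", "pears"]
```

The uppercase "O" in "Order" is not matched by [aeiou]; use [aeiouAEIOU] or the i flag to include it.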

Quantifiers

Control how many times a pattern should match:

*              # Zero or more (greedy)
+              # One or more (greedy)
?              # Zero or one (optional)
{n}            # Exactly n times
{n,}           # n or more times
{n,m}          # Between n and m times
*?             # Zero or more (lazy/non-greedy)
+?             # One or more (lazy/non-greedy)

Examples

a*             # Matches "", "a", "aa", "aaa", ...
a+             # Matches "a", "aa", "aaa", ... (not "")
a?             # Matches "" or "a"
a{3}           # Matches exactly "aaa"
a{3,5}         # Matches "aaa", "aaaa", or "aaaaa"
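To check the examples above, this JavaScript sketch runs each quantifier against the same inputs. The patterns are wrapped in ^...$ so the whole string must match; without anchors, a pattern like a{3} would also be found inside a longer run of a's.

```javascript
// Anchor each pattern with ^...$ so the whole input must match, not just part of it.
const patterns = {
  "a*": /^a*$/,
  "a+": /^a+$/,
  "a?": /^a?$/,
  "a{3}": /^a{3}$/,
  "a{3,5}": /^a{3,5}$/,
};

const inputs = ["", "a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"];

// For each quantifier, collect the inputs it accepts.
const accepted = {};
for (const [name, re] of Object.entries(patterns)) {
  accepted[name] = inputs.filter((s) => re.test(s));
}

console.log(accepted["a{3,5}"]);   // → ["aaa", "aaaa", "aaaaa"]
console.log(/a{3}/.test("aaaaaa")); // true: without anchors, "aaa" is found inside
```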

Anchors

Specify where a pattern should match:

^              # Start of string
$              # End of string
\b             # Word boundary
\B             # Non-word boundary

Examples

^hello         # Matches "hello" only at start
world$         # Matches "world" only at end
^hello world$  # Matches entire string exactly
\bword\b       # Matches "word" as whole word
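A quick JavaScript check of the anchor examples. \b is zero-width: it matches the position between a word character and a non-word character (or the edge of the string), so "word" inside "swordfish" is rejected.

```javascript
const wholeWord = /\bword\b/;

const results = {
  plain: wholeWord.test("a word here"),       // true
  inside: wholeWord.test("swordfish"),        // false: "word" is part of a longer word
  punctuation: wholeWord.test("word, again"), // true: "," is a non-word character
  start: /^hello/.test("say hello"),          // false: "hello" is not at the start
  end: /world$/.test("hello world"),          // true
};

console.log(results);
```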

Groups and Capturing

Capturing Groups `(...)`

Capture and extract matched content:

(\d{3})-(\d{3})-(\d{4})

Matches a phone number such as 555-123-4567 and captures the area code, exchange, and line number separately
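In JavaScript, match() returns an array whose first entry is the full match and whose following entries are the capture groups in order. The helper name `parsePhone` is hypothetical; the sketch also shows reusing groups as $1, $2, $3 in a replacement string.

```javascript
const phonePattern = /(\d{3})-(\d{3})-(\d{4})/;

// Hypothetical helper: returns the three captured parts, or null if no number is found.
function parsePhone(text) {
  const match = text.match(phonePattern);
  if (match === null) return null;
  // match[0] is the full match; match[1..3] hold the capture groups in order.
  const [, areaCode, exchange, lineNumber] = match;
  return { areaCode, exchange, lineNumber };
}

console.log(parsePhone("Call 555-123-4567 today"));
// → { areaCode: "555", exchange: "123", lineNumber: "4567" }

// Captured groups can be reused in a replacement string as $1, $2, $3.
const formatted = "555-123-4567".replace(phonePattern, "($1) $2-$3");
console.log(formatted); // (555) 123-4567
```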

Non-Capturing Groups `(?:...)`

Group without capturing:

(?:Mr|Mrs|Ms)\.\s(\w+)

Groups the title alternatives without capturing them; only the name is captured
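A small JavaScript check: because the title group is non-capturing, the match array has only two entries, the full match and the captured name.

```javascript
const titled = /(?:Mr|Mrs|Ms)\.\s(\w+)/;
const match = "Dear Mrs. Smith,".match(titled);

// Only one capture group exists, so match has two entries.
console.log(match.length); // 2
console.log(match[0]);     // Mrs. Smith
console.log(match[1]);     // Smith
```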

Named Groups `(?<name>...)`

Capture with a name (JavaScript, .NET, and PCRE syntax; Python's re module writes it as (?P<name>...)):

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
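In JavaScript, named captures appear on the match's groups object, and a replacement string can refer to them as $<name>. A minimal sketch with an invented sample sentence:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Named captures are exposed on match.groups.
const { year, month, day } = "Released on 2024-03-15".match(datePattern).groups;
console.log(year, month, day); // 2024 03 15

// In replacement strings, $<name> refers to a named group.
const usFormat = "2024-03-15".replace(datePattern, "$<month>/$<day>/$<year>");
console.log(usFormat); // 03/15/2024
```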

Alternation

Match one of several patterns:

cat|dog        # Matches "cat" or "dog"
(Mon|Tue|Wed)  # Matches "Mon", "Tue", or "Wed"
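Alternation has the lowest precedence of any operator, which often surprises people when anchors are involved. This JavaScript sketch contrasts an ungrouped alternation with one wrapped in a non-capturing group:

```javascript
// Alternation binds loosest: ^cat|dog$ means "^cat" OR "dog$".
const loose = /^cat|dog$/;
// A non-capturing group applies both anchors to every alternative.
const strict = /^(?:cat|dog)$/;

console.log(loose.test("catalog"));  // true: "^cat" alone is enough
console.log(strict.test("catalog")); // false
console.log(strict.test("dog"));     // true
```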

Flags

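Modify regex behavior (these are JavaScript flag letters; other languages expose similar options under other names, such as re.IGNORECASE in Python):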

g              # Global (find all matches)
i              # Case-insensitive
m              # Multiline (^ and $ match line boundaries)
s              # Dotall (. matches newline)
u              # Unicode
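A JavaScript sketch of how g, i, m, and s change results on the same input; the three-line sample string is invented. The u flag is demonstrated in the Unicode discussion rather than here.

```javascript
const text = "Cat\ncat\nCAT";

const caseSensitive = text.match(/cat/g);    // → ["cat"]
const caseInsensitive = text.match(/cat/gi); // → ["Cat", "cat", "CAT"]
const wholeString = text.match(/^cat$/gi);   // → null: ^ and $ refer to the whole string
const perLine = text.match(/^cat$/gim);      // → ["Cat", "cat", "CAT"]: m makes them line-based

// Without s, . does not match a newline.
const dotDefault = /a.b/.test("a\nb"); // → false
const dotAll = /a.b/s.test("a\nb");    // → true
```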

Common Regex Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

^[a-zA-Z0-9._%+-]+ - Username part
@ - Literal @
[a-zA-Z0-9.-]+ - Domain name
\. - Literal dot (escaped)
[a-zA-Z]{2,}$ - Top-level domain (2+ letters)
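A minimal JavaScript sketch wrapping the email pattern in a helper; `looksLikeEmail` is a hypothetical name. The pattern is a practical approximation rather than a full RFC 5322 implementation, as the last case shows.

```javascript
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: true when the whole string has the expected shape.
function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input);
}

console.log(looksLikeEmail("user.name+tag@example.co.uk")); // true
console.log(looksLikeEmail("user@localhost"));              // false: no dot in the domain
console.log(looksLikeEmail("user@@example.com"));           // false
console.log(looksLikeEmail("john..doe@example.com"));       // true, although RFC 5322 disallows ".." here
```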

Phone Number (US Format)

^(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Matches formats like:

(555) 123-4567
555-123-4567
555.123.4567
+1 555 123 4567
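A JavaScript sketch that validates with the pattern and then reduces every accepted format to ten digits; `normalizeUsPhone` is a hypothetical helper. Because the opening and closing parentheses are each optional on their own, the pattern also accepts unbalanced input.

```javascript
const US_PHONE = /^(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: returns the 10-digit number, or null if the input does not match.
function normalizeUsPhone(input) {
  if (!US_PHONE.test(input)) return null;
  const digits = input.replace(/\D/g, ""); // keep digits only
  // 11 digits means the optional leading country code 1 was present.
  return digits.length === 11 ? digits.slice(1) : digits;
}

const samples = ["(555) 123-4567", "555-123-4567", "555.123.4567", "+1 555 123 4567"];
console.log(samples.map(normalizeUsPhone)); // every entry → "5551234567"

console.log(normalizeUsPhone("(555 123-4567")); // → "5551234567": unbalanced, still accepted
console.log(normalizeUsPhone("555-123-456"));   // → null
```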

URL Pattern

https?://(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&/=]*)
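A JavaScript sketch using this pattern with the g flag to pull URLs out of free text. In a regex literal the slashes in "://" must be escaped; the sample sentence is invented. Since "." is allowed in the path part, sentence-ending punctuation can end up inside the match.

```javascript
// URL pattern as a JS literal; the forward slashes in "://" are escaped.
const URL_PATTERN =
  /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&/=]*)/g;

const text = "Visit https://www.example.com/docs?page=1 or http://test.org.";
const urls = text.match(URL_PATTERN);

console.log(urls);
// → ["https://www.example.com/docs?page=1", "http://test.org."]
// The second result keeps the sentence's final period.
```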

Date Format (YYYY-MM-DD)

^\d{4}-\d{2}-\d{2}$
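This pattern checks only the digit layout, so it also accepts impossible dates such as 2024-13-45. A minimal JavaScript sketch, assuming a hypothetical `isValidIsoDate` helper, pairs the regex with a calendar check: it builds a UTC date from the parts and confirms they round-trip, since out-of-range months and days roll over.

```javascript
const ISO_DATE = /^(\d{4})-(\d{2})-(\d{2})$/;

// Hypothetical helper: shape check with the regex, then a calendar check with Date.
function isValidIsoDate(input) {
  const match = ISO_DATE.exec(input);
  if (match === null) return false;
  const [, year, month, day] = match.map(Number);
  // setUTCFullYear avoids Date.UTC's remapping of years 0-99 to 1900-1999.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isValidIsoDate("2024-02-29")); // true (leap year)
console.log(isValidIsoDate("2023-02-29")); // false: rolls over to March 1
console.log(isValidIsoDate("2024-13-45")); // false, although the regex alone accepts it
console.log(isValidIsoDate("2024-3-15"));  // false: month must be two digits
```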

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Requirements:

At least one lowercase letter
At least one uppercase letter
At least one digit
At least one special character
Minimum 8 characters
Only letters, digits, and @$!%*?& are allowed (any other character causes a mismatch)
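Each (?=...) lookahead scans ahead from the start without consuming input, so all four conditions must hold before the final class checks length and allowed characters. A JavaScript sketch with invented sample passwords; the last one fails only because "#" is outside the allowed set.

```javascript
const STRONG_PASSWORD =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

const cases = {
  "Passw0rd!": STRONG_PASSWORD.test("Passw0rd!"),   // true
  "passw0rd!": STRONG_PASSWORD.test("passw0rd!"),   // false: no uppercase letter
  "Pa0!": STRONG_PASSWORD.test("Pa0!"),             // false: shorter than 8
  "Passw0rd!#": STRONG_PASSWORD.test("Passw0rd!#"), // false: "#" is not an allowed character
};

console.log(cases);
```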

Hex Color Code

^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

Matches: #FF0000, #f00
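In CSS, #f00 is shorthand for #ff0000. This JavaScript sketch validates with the pattern and expands the 3-digit form; `toSixDigitHex` is a hypothetical helper name.

```javascript
const HEX_COLOR = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;

// Hypothetical helper: validate, then expand 3-digit shorthand by doubling each digit.
function toSixDigitHex(input) {
  const match = HEX_COLOR.exec(input);
  if (match === null) return null;
  const hex = match[1];
  const full = hex.length === 3 ? [...hex].map((c) => c + c).join("") : hex;
  return "#" + full.toLowerCase();
}

console.log(toSixDigitHex("#FF0000")); // #ff0000
console.log(toSixDigitHex("#f00"));    // #ff0000
console.log(toSixDigitHex("#ff00"));   // null: 4 digits is neither form
```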

Credit Card Number

^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$

Matches formats like:

1234 5678 9012 3456
1234-5678-9012-3456
1234567890123456
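This pattern only checks the layout (16 digits in groups of four), not whether a number is a real card. A JavaScript sketch with a hypothetical `cardDigits` helper that strips separators after validation; note that separators may be mixed, and 15-digit layouts such as American Express numbers are rejected.

```javascript
const CARD_SHAPE = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;

// Hypothetical helper: returns the 16 digits without separators, or null.
function cardDigits(input) {
  return CARD_SHAPE.test(input) ? input.replace(/[\s-]/g, "") : null;
}

console.log(cardDigits("1234 5678 9012 3456")); // 1234567890123456
console.log(cardDigits("1234-5678-9012-3456")); // 1234567890123456
console.log(cardDigits("1234 5678-9012 3456")); // accepted: separators may be mixed
console.log(cardDigits("3782 822463 10005"));   // null: 15-digit layout
```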

Lookahead and Lookbehind

Positive Lookahead `(?=...)`

Match pattern followed by another pattern:

\d+(?= dollars)

Matches digits only if followed by " dollars"
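A JavaScript check with an invented sentence: the lookahead requires " dollars" to follow, but because it is zero-width, " dollars" is not part of the returned match.

```javascript
const text = "Pay 100 dollars, not 200 euros or 300 dollars.";

// Only amounts followed by " dollars" are returned; the lookahead text is not included.
const dollarAmounts = text.match(/\d+(?= dollars)/g);
console.log(dollarAmounts); // → ["100", "300"]
```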

Negative Lookahead `(?!...)`

Match pattern NOT followed by another pattern:

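\d+(?!\d| dollars)

Matches digits NOT followed by " dollars". The \d alternative keeps the engine from backtracking to a shorter run of digits: the simpler \d+(?! dollars) still matches "10" in "100 dollars", because "10" is followed by "0", not by " dollars".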
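A quick JavaScript comparison of the plain \d+(?! dollars) pattern with a version that also forbids a following digit. Backtracking lets the plain version match part of "100"; adding \d to the lookahead prevents that.

```javascript
const text = "Pay 100 dollars, not 200 euros.";

// Plain version: \d+ backtracks to "10", which is followed by "0", not " dollars".
const naive = text.match(/\d+(?! dollars)/g);
console.log(naive); // → ["10", "200"]

// Adding \d to the lookahead forbids stopping in the middle of a number.
const fixed = text.match(/\d+(?!\d| dollars)/g);
console.log(fixed); // → ["200"]
```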

Lookbehind `(?<=...)` / `(?<!...)`

Similar to lookahead but checks what comes before:

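(?<=\$)\d+        # Digits preceded by $
(?<![$\d])\d+     # Digits NOT preceded by $ (the \d keeps it from matching "00" inside "$100")

Lookbehind is supported in JavaScript since ES2018; Python's re module requires lookbehind patterns of fixed width.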
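The same backtracking trap applies to negative lookbehind. This JavaScript sketch with an invented sentence compares the plain (?<!\$)\d+ with a version that also excludes a preceding digit:

```javascript
const text = "Total $100 for 3 items";

const afterDollar = text.match(/(?<=\$)\d+/g);       // → ["100"]
const naive = text.match(/(?<!\$)\d+/g);             // → ["00", "3"]: starts inside "100"
const notAfterDollar = text.match(/(?<![$\d])\d+/g); // → ["3"]
```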

Unicode and Flags

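Use the u flag (JavaScript) so that . and character classes work on whole code points instead of UTF-16 code units
Be careful when matching emojis or surrogate pairs; prefer Unicode property escapes such as \p{Emoji} or \p{Extended_Pictographic} where supported
Normalize text (for example to NFC) before comparing composed and decomposed accented characters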
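A JavaScript sketch of these three points. It uses the standard String.prototype.normalize method; one detail worth knowing is that the Emoji property also includes the ASCII digits, # and *, so \p{Extended_Pictographic} is often the closer fit for "pictures".

```javascript
const emoji = "\u{1F600}"; // grinning face, stored as two UTF-16 code units
console.log(emoji.length);       // 2
console.log(/^.$/.test(emoji));  // false: . matches a single code unit
console.log(/^.$/u.test(emoji)); // true: with u, . matches a whole code point

// Property escapes require the u flag. \p{Emoji} also covers ASCII digits.
console.log(/^\p{Emoji}$/u.test("7"));                   // true
console.log(/^\p{Extended_Pictographic}$/u.test("7"));   // false
console.log(/^\p{Extended_Pictographic}$/u.test(emoji)); // true

// "é" can be one code point (U+00E9) or "e" plus a combining acute accent (U+0301).
const composed = "\u00e9";
const decomposed = "e\u0301";
console.log(composed === decomposed);                  // false
console.log(composed === decomposed.normalize("NFC")); // true
```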

Best Practices

Performance Tips

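Be Specific: Specific patterns let the engine rule out non-matching text quickly, while stacked wildcards force it to try many ways of splitting the input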

# Faster
^\d{3}-\d{3}-\d{4}$

# Slower
.*.*.*.*.*.*.*.*.*.*.*

Avoid Catastrophic Backtracking:

# Dangerous (can cause performance issues)
(a+)+b

# Better
a+b

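Use Anchors: Start/end anchors can improve performance, because a ^-anchored pattern only needs to be attempted at the start of the string instead of at every position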
```
^pattern$   # Faster
pattern     # Slower
```

Readability Tips

Add Comments: Use (?#...) for inline comments in engines that support it (PCRE, Perl, Python, .NET; JavaScript does not)

^\d{4}(?#year)-\d{2}(?#month)-\d{2}(?#day)$

Use Non-Capturing Groups: When you don't need to capture

(?:cat|dog)  # Better than (cat|dog) if not capturing

Break Complex Patterns: Split into multiple regex patterns when possible

Validation Tips

Anchors for Exact Matching: Always use ^ and $ for validation

^email@domain\.com$  # Exact match
email@domain\.com   # Partial match (can match "my email@domain.com is...")

Test Edge Cases: Test with empty strings, special characters, unicode
Don't Over-Validate: Sometimes simpler patterns are better than complex ones

Common Mistakes

1. Forgetting Anchors

# Wrong - matches a phone number anywhere, e.g. inside "call 555-123-4567 now"
\d{3}-\d{3}-\d{4}

# Correct - matches entire string
^\d{3}-\d{3}-\d{4}$

2. Not Escaping Special Characters

# Wrong - . matches any character
\d+.\d+

# Correct - \. matches literal dot
^\d+\.\d+$

3. Greedy vs Lazy Quantifiers

# Greedy - matches as much as possible
<.*>           # Matches entire "<tag>content</tag>"

# Lazy - matches as little as possible
<.*?>          # Matches "<tag>" and "</tag>" separately
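A JavaScript comparison on an invented HTML snippet. A negated character class such as [^>]* is a common alternative to the lazy quantifier: it cannot run past the first ">", so it gives the same tags here.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

const greedy = html.match(/<.*>/)[0];   // → "<b>bold</b> and <i>italic</i>"
const lazy = html.match(/<.*?>/g);      // → ["<b>", "</b>", "<i>", "</i>"]
const negated = html.match(/<[^>]*>/g); // → ["<b>", "</b>", "<i>", "</i>"]
```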

Real-World Examples

Extracting Data from Text

const text = "Contact: john@example.com or call 555-123-4567";
// match() returns null when nothing is found, so guard before indexing.
const email = text.match(/[\w.]+@[\w.]+\.\w+/)?.[0] ?? null;
const phone = text.match(/\d{3}-\d{3}-\d{4}/)?.[0] ?? null;

Replacing Patterns

const text = "Hello, my email is user@example.com";
const masked = text.replace(/[\w.]+@[\w.]+\.\w+/, '***@***.***');
// Result: "Hello, my email is ***@***.***"

Splitting Text

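const csv = "name, email ,phone";
// Split on commas and trim surrounding spaces; quoted fields containing commas need a real CSV parser.
const fields = csv.split(/\s*,\s*/); // ["name", "email", "phone"]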

Conclusion

Regular expressions are a powerful tool for text processing, validation, and pattern matching. While they can seem complex at first, understanding the fundamental concepts—character classes, quantifiers, anchors, and groups—will enable you to create effective patterns for a wide variety of use cases.

Practice with real-world scenarios, test your patterns thoroughly, and remember that sometimes multiple simple patterns are better than one complex pattern. With regular expressions, you'll be able to handle text processing tasks efficiently and elegantly.
