Mastering Regular Expressions - Complete Guide to Regex Patterns

Learn everything about regular expressions, how they work, common patterns, and best practices for pattern matching, validation, and text processing.

DevToolsCenter Team
10 min read

What are Regular Expressions?

Regular expressions (regex) are powerful pattern-matching tools that allow you to search, match, and manipulate text based on complex rules. They provide a concise way to describe patterns in strings and are supported by most programming languages.

Why Use Regular Expressions?

Regular expressions are useful for:

  • Text Validation: Email addresses, phone numbers, URLs
  • Search and Replace: Finding and replacing patterns in text
  • Data Extraction: Parsing and extracting information from strings
  • Text Processing: Formatting and cleaning data
  • Pattern Matching: Complex string matching requirements

Basic Regex Syntax

Literal Characters

Most characters match themselves:

hello

Matches: "hello"

Special Characters (Metacharacters)

These characters have special meaning; to match one of them literally, escape it with a backslash (\):

. ^ $ * + ? { } [ ] \ | ( )
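
When the text to match comes from a variable or user input, escaping each metacharacter by hand is error-prone. A minimal sketch of a helper, assuming JavaScript; the name `escapeRegExp` and the sample strings are illustrative, not part of any library:

```javascript
// Hypothetical helper: put a backslash in front of every metacharacter
// so the string can be embedded in a pattern and matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const priceLabel = "$5.00 (approx.)";
const literalPattern = new RegExp(escapeRegExp(priceLabel));

console.log(literalPattern.test("Total: $5.00 (approx.)")); // true
console.log(literalPattern.test("Total: $5x00 (approx.)")); // false
```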

Character Classes

Character Set [...] - Matches any single character inside brackets:

[aeiou]        # Matches any vowel
[0-9]          # Matches any digit
[a-z]          # Matches any lowercase letter
[A-Za-z0-9]    # Matches alphanumeric characters
[^aeiou]       # Matches any single character except a lowercase vowel

Predefined Classes:

\d             # Digit [0-9]
\w             # Word character [A-Za-z0-9_]
\s             # Whitespace [ \t\n\r\f\v]
\D             # Non-digit [^0-9]
\W             # Non-word character
\S             # Non-whitespace
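
A short sketch of the predefined classes in use, assuming JavaScript; the sample string is made up for illustration:

```javascript
const sample = "Order 66, shipped\tto Bay_7";

// \d+ finds runs of digits, \w+ finds runs of word characters.
const digitRuns = sample.match(/\d+/g);
const wordRuns = sample.match(/\w+/g);

// \s+ matches any run of whitespace, including the tab.
const collapsed = sample.replace(/\s+/g, " ");

console.log(digitRuns); // ["66", "7"]
console.log(wordRuns);  // ["Order", "66", "shipped", "to", "Bay_7"]
console.log(collapsed); // "Order 66, shipped to Bay_7"
```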

Quantifiers

Control how many times a pattern should match:

*              # Zero or more (greedy)
+              # One or more (greedy)
?              # Zero or one (optional)
{n}            # Exactly n times
{n,}           # n or more times
{n,m}          # Between n and m times
*?             # Zero or more (lazy/non-greedy)
+?             # One or more (lazy/non-greedy)

Examples

a*             # Matches "", "a", "aa", "aaa", ...
a+             # Matches "a", "aa", "aaa", ... (not "")
a?             # Matches "" or "a"
a{3}           # Matches exactly "aaa"
a{3,5}         # Matches "aaa", "aaaa", or "aaaaa"
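
The quantifier examples can be checked directly. In this JavaScript sketch each pattern is wrapped in ^ and $ so that test() reports whether the whole string matches, not just part of it:

```javascript
const zeroOrMore = /^a*$/;
const oneOrMore = /^a+$/;
const exactlyThree = /^a{3}$/;
const threeToFive = /^a{3,5}$/;

console.log(zeroOrMore.test(""));        // true
console.log(oneOrMore.test(""));         // false
console.log(exactlyThree.test("aaaa"));  // false
console.log(threeToFive.test("aaaa"));   // true
console.log(threeToFive.test("aaaaaa")); // false
```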

Anchors

Specify where a pattern should match:

^              # Start of string
$              # End of string
\b             # Word boundary
\B             # Non-word boundary

Examples

^hello         # Matches "hello" only at start
world$         # Matches "world" only at end
^hello world$  # Matches entire string exactly
\bword\b       # Matches "word" as whole word
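
A few quick checks of the anchors in JavaScript; the input strings are made up:

```javascript
console.log(/^hello/.test("hello world"));   // true
console.log(/^hello/.test("say hello"));     // false
console.log(/world$/.test("hello world"));   // true
console.log(/\bword\b/.test("a word here")); // true
console.log(/\bword\b/.test("swordfish"));   // false
console.log(/\Bword/.test("swordfish"));     // true: "word" starts mid-word
```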

Groups and Capturing

Capturing Groups (...)

Capture and extract matched content:

(\d{3})-(\d{3})-(\d{4})

Matches a phone number such as 555-123-4567 and captures the area code, exchange, and line number separately.
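
In JavaScript, the captured parts are available as array entries on the match result. A minimal sketch with a made-up input string:

```javascript
const phonePattern = /(\d{3})-(\d{3})-(\d{4})/;
const phoneMatch = "Call 555-123-4567 today".match(phonePattern);

if (phoneMatch) {
  // Index 0 is the full match; indexes 1 to 3 are the capturing groups.
  const [full, areaCode, exchange, lineNumber] = phoneMatch;
  console.log(full);       // "555-123-4567"
  console.log(areaCode);   // "555"
  console.log(exchange);   // "123"
  console.log(lineNumber); // "4567"
}
```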

Non-Capturing Groups (?:...)

Group without capturing:

(?:Mr|Mrs|Ms)\.\s(\w+)

Groups title options but only captures the name
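
A small JavaScript sketch showing that the non-capturing group does not take up a slot in the result; the input string is invented:

```javascript
const titledName = /(?:Mr|Mrs|Ms)\.\s(\w+)/;
const titleMatch = "Dear Mrs. Smith,".match(titledName);

console.log(titleMatch[0]);     // "Mrs. Smith" (the full match)
console.log(titleMatch[1]);     // "Smith" (the only captured group)
console.log(titleMatch.length); // 2
```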

Named Groups (?<name>...)

Capture with a name:

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
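
In JavaScript, named groups appear on the `groups` property of the match and can be referenced as `$<name>` in a replacement string. A short sketch with made-up dates:

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = "Released on 2024-03-15".match(isoDate);

if (dateMatch) {
  const { year, month, day } = dateMatch.groups;
  console.log(year, month, day); // 2024 03 15
}

// Named groups can also be reused when rewriting text.
const reordered = "2024-03-15".replace(isoDate, "$<day>/$<month>/$<year>");
console.log(reordered); // "15/03/2024"
```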

Alternation

Match one of several patterns:

cat|dog        # Matches "cat" or "dog"
(Mon|Tue|Wed)  # Matches "Mon", "Tue", or "Wed" and captures it

Flags

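Modify regex behavior (the letters below are JavaScript's flags; other engines offer similar options under different names):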

g              # Global (find all matches)
i              # Case-insensitive
m              # Multiline (^ and $ match line boundaries)
s              # Dotall (. matches newline)
u              # Unicode
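
A short JavaScript sketch of how the g, i, m, and s flags change results; the sample text is made up:

```javascript
const lines = "Cat\ncat\nCAT";

console.log(lines.match(/cat/g));     // ["cat"]
console.log(lines.match(/cat/gi));    // ["Cat", "cat", "CAT"]
console.log(lines.match(/^cat$/gim)); // ["Cat", "cat", "CAT"]
console.log(/^cat$/i.test(lines));    // false: without m, ^ and $ are string boundaries

console.log(/a.b/.test("a\nb"));      // false: . skips newlines by default
console.log(/a.b/s.test("a\nb"));     // true
```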

Common Regex Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

  • ^[a-zA-Z0-9._%+-]+ - Username part
  • @ - Literal @
  • [a-zA-Z0-9.-]+ - Domain name
  • \. - Literal dot (escaped)
  • [a-zA-Z]{2,}$ - Top-level domain (2+ letters)
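
A minimal JavaScript sketch that runs this pattern against a few made-up addresses. The helper name `looksLikeEmail` is hypothetical, and the last case shows that the pattern is a plausibility check rather than a full validator:

```javascript
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: true when the value has the general shape of an address.
function looksLikeEmail(value) {
  return emailPattern.test(value);
}

console.log(looksLikeEmail("user.name+tag@example.co.uk")); // true
console.log(looksLikeEmail("user@localhost"));              // false: no dot and TLD
console.log(looksLikeEmail("user@@example.com"));           // false
console.log(looksLikeEmail("a@b..com"));                    // true: consecutive dots slip through
```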

Phone Number (US Format)

^(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Matches formats like:

  • (555) 123-4567
  • 555-123-4567
  • 555.123.4567
  • +1 555 123 4567
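
A JavaScript sketch that checks the listed formats and then reduces an accepted number to its digits. The helper name `toDigits` is hypothetical:

```javascript
const usPhonePattern = /^(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: digits only for an accepted number, otherwise null.
function toDigits(phone) {
  return usPhonePattern.test(phone) ? phone.replace(/\D/g, "") : null;
}

console.log(toDigits("(555) 123-4567"));  // "5551234567"
console.log(toDigits("555.123.4567"));    // "5551234567"
console.log(toDigits("+1 555 123 4567")); // "15551234567"
console.log(toDigits("123-45-6789"));     // null
// The parentheses are independently optional, so this also passes:
console.log(toDigits("(555 123-4567"));   // "5551234567"
```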

URL Pattern

https?://(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

Date Format (YYYY-MM-DD)

^\d{4}-\d{2}-\d{2}$
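
This pattern checks the shape of the string only, so a value like 2024-99-99 still passes. A minimal JavaScript sketch that adds a calendar check on top of the pattern; the helper name `isRealDate` is hypothetical:

```javascript
const isoDateShape = /^(\d{4})-(\d{2})-(\d{2})$/;

// Hypothetical helper: true only when the string has the YYYY-MM-DD shape
// and names a day that exists on the calendar.
function isRealDate(value) {
  const match = isoDateShape.exec(value);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day); // out-of-range values roll over
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isoDateShape.test("2024-99-99")); // true: the shape is fine
console.log(isRealDate("2024-99-99"));        // false
console.log(isRealDate("2024-02-29"));        // true: 2024 is a leap year
console.log(isRealDate("2023-02-29"));        // false
```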

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Requirements:

  • At least one lowercase letter
  • At least one uppercase letter
  • At least one digit
  • At least one special character from the set @$!%*?&
  • Minimum 8 characters, all of them letters, digits, or characters from that set
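
A JavaScript sketch of the combined pattern, followed by the same requirements split into one small pattern per rule so that a failure can be reported by name. The names `passwordRules` and `passwordProblems` and the messages are hypothetical:

```javascript
const strongPassword =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

console.log(strongPassword.test("Passw0rd!"));  // true
console.log(strongPassword.test("password1!")); // false: no uppercase letter
console.log(strongPassword.test("Passw0rd#"));  // false: # is not in the allowed set
console.log(strongPassword.test("Pw0rd!"));     // false: too short

// Hypothetical rule table: one simple pattern per requirement.
const passwordRules = [
  { message: "needs a lowercase letter", pattern: /[a-z]/ },
  { message: "needs an uppercase letter", pattern: /[A-Z]/ },
  { message: "needs a digit", pattern: /\d/ },
  { message: "needs one of @$!%*?&", pattern: /[@$!%*?&]/ },
  { message: "needs 8+ allowed characters", pattern: /^[A-Za-z\d@$!%*?&]{8,}$/ },
];

function passwordProblems(value) {
  return passwordRules
    .filter((rule) => !rule.pattern.test(value))
    .map((rule) => rule.message);
}

console.log(passwordProblems("password1!")); // ["needs an uppercase letter"]
```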

Hex Color Code

^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

Matches: #FF0000, #f00
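
Because the pattern captures the hex digits, it can also be used to normalize colors. A minimal JavaScript sketch; the helper name `normalizeHex` is hypothetical:

```javascript
const hexColorPattern = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;

// Hypothetical helper: expand #rgb to #rrggbb and lowercase the result.
function normalizeHex(color) {
  const match = hexColorPattern.exec(color);
  if (!match) return null;
  const digits =
    match[1].length === 3 ? match[1].replace(/./g, "$&$&") : match[1];
  return "#" + digits.toLowerCase();
}

console.log(normalizeHex("#f00"));    // "#ff0000"
console.log(normalizeHex("#FF0000")); // "#ff0000"
console.log(normalizeHex("#ff00"));   // null: four digits is neither form
```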

Credit Card Number

^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$

Matches formats like:

  • 1234 5678 9012 3456
  • 1234-5678-9012-3456
  • 1234567890123456
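
This pattern checks only the layout of a 16-digit number; it does not tell you whether the number is a real card. A JavaScript sketch that accepts the listed formats and strips the separators; the helper name `normalizeCard` is hypothetical:

```javascript
const cardPattern = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;

// Hypothetical helper: 16 bare digits for an accepted input, otherwise null.
function normalizeCard(input) {
  return cardPattern.test(input) ? input.replace(/[\s-]/g, "") : null;
}

console.log(normalizeCard("1234 5678 9012 3456")); // "1234567890123456"
console.log(normalizeCard("1234-5678-9012-3456")); // "1234567890123456"
console.log(normalizeCard("1234 5678 9012"));      // null: only 12 digits
```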

Lookahead and Lookbehind

Positive Lookahead (?=...)

Match pattern followed by another pattern:

\d+(?= dollars)

Matches digits only if followed by " dollars"
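
A short JavaScript sketch showing that the lookahead text is checked but not included in the match; the sample sentence is made up:

```javascript
const receipt = "I paid 100 dollars and 30 euros";

console.log(receipt.match(/\d+(?= dollars)/g)); // ["100"]
console.log(receipt.match(/\d+ dollars/g));     // ["100 dollars"]
```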

Negative Lookahead (?!...)

Match pattern NOT followed by another pattern:

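\d+\b(?! dollars)

Matches whole numbers NOT followed by " dollars". The \b matters: without it, \d+(?! dollars) backtracks and matches "10" in "100 dollars".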
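
Negative lookahead interacts with backtracking: a greedy \d+ can give back a digit so that the lookahead succeeds on a shorter match. A JavaScript sketch with a made-up sample string, comparing the bare pattern to one with a word boundary:

```javascript
const amounts = "100 dollars, 30 euros";

// Bare pattern: "100" is rejected, but the engine backs off to "10".
console.log(amounts.match(/\d+(?! dollars)/g));   // ["10", "30"]

// With \b the number must end at a word boundary, so no partial match.
console.log(amounts.match(/\d+\b(?! dollars)/g)); // ["30"]
```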

Lookbehind (?<=...) / (?<!...)

Similar to lookahead but checks what comes before:

(?<=\$)\d+     # Digits preceded by $
(?<![$\d])\d+  # Digits NOT preceded by $ (the \d stops partial matches such as "00" in "$100")
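
A JavaScript sketch of both lookbehind forms on a made-up string. It assumes an engine with lookbehind support (current Node.js and browsers). The second call shows how a bare negative lookbehind can match the tail of a number it was meant to skip:

```javascript
const prices = "$100 and 200";

console.log(prices.match(/(?<=\$)\d+/g));    // ["100"]
console.log(prices.match(/(?<!\$)\d+/g));    // ["00", "200"]: starts after the "1"
console.log(prices.match(/(?<![$\d])\d+/g)); // ["200"]
```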

Unicode and Flags

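  • Use the u flag for proper Unicode handling; it makes the engine treat each code point, not each UTF-16 code unit, as one character
  • Be careful when matching emojis or surrogate pairs; prefer Unicode property escapes such as \p{Extended_Pictographic} where supported, since \p{Emoji} also matches digits, # and *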
  • Normalize text (NFC) when comparing composed/accented characters
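
A JavaScript sketch of these three points. The emoji is written as an escape so the snippet does not depend on file encoding:

```javascript
const smiley = "\u{1F600}"; // one emoji, stored as a surrogate pair

console.log(smiley.length);        // 2
console.log(/^.$/.test(smiley));   // false: . sees two code units
console.log(/^.$/u.test(smiley));  // true: . sees one code point

console.log(/\p{Extended_Pictographic}/u.test(smiley)); // true
console.log(/\p{Emoji}/u.test("7"));                    // true: digits carry the Emoji property

const composed = "\u00e9";    // é as a single code point
const decomposed = "e\u0301"; // e followed by a combining accent
console.log(composed === decomposed);                                   // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
```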

Best Practices

Performance Tips

  1. Be Specific: A specific pattern gives the engine fewer alternatives to try, so it usually runs faster

    # Faster
    ^\d{3}-\d{3}-\d{4}$
    
    # Slower
    .*.*.*.*.*.*.*.*.*.*.*
  2. Avoid Catastrophic Backtracking:

    # Dangerous (can cause performance issues)
    (a+)+b
    
    # Better
    a+b
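  3. Use Anchors: When the whole input must match, anchors let the engine reject a bad input without retrying the pattern at every position

    ^pattern$   # Must span the whole string, so a mismatch fails quickly
    pattern     # May match anywhere, so the engine tries each start position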
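
A rough JavaScript sketch of the backtracking tip. It shows that the nested and flat patterns accept the same inputs, then times both on an input that almost matches. The helper `timeMs` is hypothetical, and the measured times depend on the engine and machine, so none are shown:

```javascript
const nested = /(a+)+b/;
const flat = /a+b/;

// Both patterns agree on what matches.
for (const input of ["ab", "aaab", "b", "aac"]) {
  console.log(input, nested.test(input), flat.test(input));
}

// Hypothetical helper: milliseconds spent on a single test() call.
function timeMs(pattern, input) {
  const start = Date.now();
  pattern.test(input);
  return Date.now() - start;
}

// A run of "a" with no "b" forces a backtracking engine to try many ways
// of splitting the run between the inner and outer +. The input is kept
// short here; each extra "a" roughly doubles the work for the nested form.
const almostMatch = "a".repeat(20) + "c";
console.log("nested ms:", timeMs(nested, almostMatch));
console.log("flat ms:", timeMs(flat, almostMatch));
```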

Readability Tips

  1. Add Comments: Use (?#...) for inline comments where the engine supports them (PCRE, Perl, and Python do; JavaScript does not)

    ^\d{4}(?#year)-\d{2}(?#month)-\d{2}(?#day)$
  2. Use Non-Capturing Groups: When you don't need to capture

    (?:cat|dog)  # Better than (cat|dog) if not capturing
  3. Break Complex Patterns: Split into multiple regex patterns when possible
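
JavaScript patterns cannot carry inline comments, so one way to keep a long pattern readable there is to build it from named, commented pieces. A minimal sketch; the variable names are illustrative:

```javascript
// Each piece is a plain string; String.raw keeps the backslashes intact.
const yearPart = String.raw`(?<year>\d{4})`;   // four-digit year
const monthPart = String.raw`(?<month>\d{2})`; // two-digit month
const dayPart = String.raw`(?<day>\d{2})`;     // two-digit day

const composedDate = new RegExp(`^${yearPart}-${monthPart}-${dayPart}$`);

console.log(composedDate.source);
// ^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$
console.log(composedDate.test("2024-03-15")); // true
```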

Validation Tips

  1. Anchors for Exact Matching: Always use ^ and $ for validation

    ^email@domain\.com$  # Exact match
    email@domain\.com   # Partial match (can match "my email@domain.com is...")
  2. Test Edge Cases: Test with empty strings, special characters, unicode

  3. Don't Over-Validate: Sometimes simpler patterns are better than complex ones
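
A small JavaScript sketch that runs the anchored and unanchored patterns over a table of edge cases. The inputs are made up, and the results are for JavaScript, where $ without the m flag matches only at the very end of the input:

```javascript
const exact = /^email@domain\.com$/;
const partial = /email@domain\.com/;

const edgeCases = [
  "email@domain.com",          // exact: true,  partial: true
  "my email@domain.com is...", // exact: false, partial: true
  "",                          // exact: false, partial: false
  "email@domainXcom",          // exact: false, partial: false
  "email@domain.com\n",        // exact: false, partial: true
];

for (const input of edgeCases) {
  console.log(JSON.stringify(input), exact.test(input), partial.test(input));
}
```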

Common Mistakes

1. Forgetting Anchors

# Wrong - matches a phone number anywhere in the string
\d{3}-\d{3}-\d{4}

# Correct - matches entire string
^\d{3}-\d{3}-\d{4}$

2. Not Escaping Special Characters

# Wrong - . matches any character
\d+.\d+

# Correct - \. matches literal dot
\d+\.\d+
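
A quick JavaScript check of the difference, with both patterns anchored so that only the dot varies:

```javascript
const unescapedDot = /^\d+.\d+$/;
const escapedDot = /^\d+\.\d+$/;

console.log(unescapedDot.test("3.14")); // true
console.log(unescapedDot.test("3x14")); // true: . accepts the "x"
console.log(unescapedDot.test("314"));  // true: . accepts the "1"
console.log(escapedDot.test("3.14"));   // true
console.log(escapedDot.test("3x14"));   // false
console.log(escapedDot.test("314"));    // false
```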

3. Greedy vs Lazy Quantifiers

# Greedy - matches as much as possible
<.*>           # Matches entire "<tag>content</tag>"

# Lazy - matches as little as possible
<.*?>          # Matches "<tag>" and "</tag>" separately
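
The same comparison as a JavaScript sketch. The third pattern is a common alternative that uses a negated character class instead of a lazy quantifier:

```javascript
const markup = "<tag>content</tag>";

console.log(markup.match(/<.*>/g));    // ["<tag>content</tag>"]
console.log(markup.match(/<.*?>/g));   // ["<tag>", "</tag>"]
console.log(markup.match(/<[^>]*>/g)); // ["<tag>", "</tag>"]
```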

Real-World Examples

Extracting Data from Text

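const text = "Contact: john@example.com or call 555-123-4567";
// match() returns null when nothing matches, so guard before reading [0]
const email = text.match(/[\w.]+@[\w.]+\.\w+/)?.[0] ?? null; // "john@example.com"
const phone = text.match(/\d{3}-\d{3}-\d{4}/)?.[0] ?? null;  // "555-123-4567"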

Replacing Patterns

const text = "Hello, my email is user@example.com";
// Without the g flag, replace() changes only the first match
const masked = text.replace(/[\w.]+@[\w.]+\.\w+/, '***@***.***');
// Result: "Hello, my email is ***@***.***"

Splitting Text

const csv = "name,email,phone";
// \s* on each side also tolerates spaces around the commas
const fields = csv.split(/\s*,\s*/); // ["name", "email", "phone"]

Conclusion

Regular expressions are a powerful tool for text processing, validation, and pattern matching. While they can seem complex at first, understanding the fundamental concepts (character classes, quantifiers, anchors, and groups) will enable you to create effective patterns for a wide variety of use cases.

Practice with real-world scenarios, test your patterns thoroughly, and remember that sometimes multiple simple patterns are better than one complex pattern. With regular expressions, you'll be able to handle text processing tasks efficiently and elegantly.


Frequently Asked Questions

Q: Why does my regex seem slow?

A: Patterns with excessive backtracking (like nested quantifiers) can cause performance issues; simplify and anchor your regex.