Greedy, Lazy, and Catastrophic Backtracking
Quantifiers like * and + are greedy: they grab as much text as they possibly can, then hand pieces back only when the rest of the pattern forces them to. Usually that is fine. Occasionally it matches far more than you meant, and in a few nasty cases it makes a regex so slow it can freeze a server. This guide explains greedy versus lazy matching, the backtracking underneath, and the one pattern shape to never write. Watch it happen in the Regex Tester.
Try the Regex Tester toolTest a regular expression against your text in real time. See every match highlighted, with capture groups and flags. Runs entirely in your browser.Greedy by default
Every standard quantifier is greedy. Given a choice, it consumes as many characters as it can and only backtracks, giving characters back one at a time, when a later part of the pattern fails to match. The classic illustration is matching an HTML-like tag with a dot, which matches almost anything.
text: <a><b>
pattern: <.+>
The greedy .+ swallows a><b and the match is the whole
"<a><b>", not just "<a>", because .+ took everything it could.Lazy: take as little as possible
Adding a question mark after a quantifier makes it lazy (also called non-greedy): it matches as few characters as it can and only expands when forced. The forms are *?, +?, ??, and {n,m}?. Make the tag pattern lazy and it stops at the first closing bracket.
pattern: <.+?>
text: <a><b>
Now the match is "<a>". Run it globally and you get
two matches, "<a>" and "<b>", which is usually what you wanted.Backtracking, briefly
Backtracking is the engine's trial and error. A greedy quantifier grabs everything, the next token fails, so the engine releases one character and tries again, repeating until either the pattern matches or every option is exhausted. For ordinary patterns this is cheap. The danger appears when the number of ways to split the text up grows explosively.
Catastrophic backtracking and ReDoS
When a repeated group itself contains a repetition, the engine can have an astronomical number of ways to divide the same input between the two quantifiers. If the overall match then fails, it tries them all before giving up. This is catastrophic backtracking, and because a short malicious input can cost exponential time, it is the basis of a denial-of-service attack known as ReDoS.
pattern: ^(a+)+$
input: aaaaaaaaaaaaaaaaaaaaaaaa! (note the trailing !)
The inner a+ and the outer + can split those a's in 2^n ways.
The final $ never matches because of the !, so the engine
tries every split. Around 30 a's, this can hang for seconds.How to avoid it
- Remove the nesting. (a+)+ is just a+; (a+)*b is a*b. If a single quantifier expresses the same thing, the explosion disappears.
- Be specific instead of using dot-star. [^>]+ or [^"]* cannot overlap the delimiter that ends them, which removes most ambiguity.
- Bound the repetition. {1,64} instead of + caps how far the engine can run on hostile input.
- In engines that support them (PCRE, Java, .NET), possessive quantifiers like a++ and atomic groups (?>...) forbid backtracking outright. JavaScript does not have these, so in JS you restructure the pattern instead.
Pressure-test your pattern
The cheapest insurance is to try your regex against a long string that almost matches but fails at the end. If it stays instant, you are fine; if it stalls, you have a nested-quantifier problem to rewrite. The Regex Tester runs locally in your browser, so you can throw pathological input at a pattern safely. The phenomenon is described in depth at regular-expressions.info on catastrophic backtracking.
Stress-test a patternTest a regular expression against your text in real time. See every match highlighted, with capture groups and flags. Runs entirely in your browser.Related articles
Regex Basics: Characters, Classes, Anchors, and Quantifiers
The core building blocks of regular expressions: literal characters, character classes, anchors, quantifiers, and the flags that change everything.
Capturing Groups, Named Groups, and Backreferences
How parentheses capture parts of a match, how to name them, how backreferences match repeated text, and how to use all three when replacing.
Why You Should Not Validate Email With a Giant Regex
The full RFC email regex is thousands of characters, rejects valid addresses, accepts dead ones, and risks ReDoS. Here is what to do instead.