Python regex: Difference between (.+) and (.+?)


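.+ is greedy: it matches as many characters as it can, then gives back only as many as needed for the rest of the pattern to match.

.+? is lazy (non-greedy): it matches as few characters as possible and stops at the first point where the rest of the pattern can match.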

Examples:

Assume you have this HTML:

<span id=yfs_l84_sbux>foo bar</span><span id=yfs_l84_sbux2>foo bar</span>

This regex matches the whole thing:

<span id=yfs_l84_sbux>(.+)</span>

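It goes all the way to the end of the string, then backtracks just far enough (giving back the final </span>) for the literal </span> in the pattern to match. So the rest of the regex matches that last </span>, the complete regex matches the entire HTML chunk, and the group captures foo bar</span><span id=yfs_l84_sbux2>foo bar.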
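A quick check of the greedy case with Python's re module, assuming the HTML above is held in a string (the variable name html is just for illustration):

```python
import re

html = '<span id=yfs_l84_sbux>foo bar</span><span id=yfs_l84_sbux2>foo bar</span>'

# Greedy: (.+) runs to the end of the string, then backtracks
# until the literal </span> can match the LAST </span>.
m = re.search(r'<span id=yfs_l84_sbux>(.+)</span>', html)
print(m.group(0) == html)  # True: the whole chunk is matched
print(m.group(1))          # foo bar</span><span id=yfs_l84_sbux2>foo bar
```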

But this regex stops at the first </span>:

<span id=yfs_l84_sbux>(.+?)</span>
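The same check with the lazy group, again using a hypothetical html variable holding the snippet above, shows the match ending at the first </span>:

```python
import re

html = '<span id=yfs_l84_sbux>foo bar</span><span id=yfs_l84_sbux2>foo bar</span>'

# Lazy: (.+?) grows one character at a time and stops as soon as
# the literal </span> can match, i.e. at the FIRST </span>.
m = re.search(r'<span id=yfs_l84_sbux>(.+?)</span>', html)
print(m.group(0))  # <span id=yfs_l84_sbux>foo bar</span>
print(m.group(1))  # foo bar
```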

The ? here is the non-greedy modifier. By default, * is a greedy repetition operator (and + behaves the same way): it consumes everything it can. When followed by ?, it becomes non-greedy and consumes only as much as is needed for the overall match to succeed.
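A small sketch of the rule for * versus *? using re.findall on a made-up string, plus the same modifier applied to a counted repetition {m,n}, which Python's re also supports:

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy * runs from the first '<' to the LAST '>': one big match.
greedy = re.findall(r'<.*>', text)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy *? stops at the first '>' that completes a match, so each tag is separate.
lazy = re.findall(r'<.*?>', text)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# The ? modifier works on counted repetition too: {2,4} takes 4, {2,4}? takes 2.
greedy_count = re.match(r'a{2,4}', 'aaaa').group()
lazy_count = re.match(r'a{2,4}?', 'aaaa').group()
print(greedy_count, lazy_count)  # aaaa aa
```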

Thus for

<span id=yfs_l84_sbux>want</span>text<span id=somethingelse>dontwant</span>

.*?</span> eats up want, then hits </span>; this satisfies the regex with the minimal number of repetitions of ., so the match is <span id=yfs_l84_sbux>want</span>. By contrast, .*</span> tries to see if it can eat more: it goes on to find the other </span>, with .* matching want</span>text<span id=somethingelse>dontwant, so the match is much more than you wanted.
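Running both patterns on that string in Python confirms the difference (the variable names are illustrative):

```python
import re

text = '<span id=yfs_l84_sbux>want</span>text<span id=somethingelse>dontwant</span>'

# Lazy .*? stops at the first </span>.
lazy = re.search(r'<span id=yfs_l84_sbux>.*?</span>', text).group()
# Greedy .* keeps going and backtracks only to the last </span>.
greedy = re.search(r'<span id=yfs_l84_sbux>.*</span>', text).group()

print(lazy)            # <span id=yfs_l84_sbux>want</span>
print(greedy == text)  # True: the whole string, including dontwant
```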


(.+) is greedy. It takes what it can and gives back when needed.

(.+?) is ungreedy (lazy). It takes as few characters as possible.

See, for the string delegate (the bracketed part is what each pattern matches):

[delegate] /^(.+)e/
[de]legate /^(.+?)e/
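The same two patterns in Python, printing both the overall match (the bracketed part above) and the captured group:

```python
import re

word = 'delegate'

greedy = re.match(r'^(.+)e', word)   # group grabs as much as possible
lazy = re.match(r'^(.+?)e', word)    # group grabs as little as possible

print(greedy.group(0), greedy.group(1))  # delegate delegat
print(lazy.group(0), lazy.group(1))      # de d
```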

Also, comparing a regex debugger's step-by-step log for each of the two patterns shows what the ungreedy modifier does even more clearly.
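Without a debugger at hand, a toy model can mimic the order in which a backtracking engine tries lengths for the group in ^(.+)e and ^(.+?)e. The function try_lengths below is a hypothetical illustration written for this one pattern shape only; it is not how Python's re is implemented internally, but it is checked against re for the captured group:

```python
import re

def try_lengths(text, greedy):
    """Hypothetical toy model of matching ^(.+)e or ^(.+?)e.

    Returns the candidate group values tried, in order, and the
    group value that finally lets the literal 'e' match (or None).
    """
    # Greedy tries the longest group first; lazy tries the shortest first.
    if greedy:
        lengths = range(len(text), 0, -1)
    else:
        lengths = range(1, len(text) + 1)
    tried = []
    for n in lengths:
        tried.append(text[:n])
        if text[n:n + 1] == 'e':  # the literal 'e' must follow the group
            return tried, text[:n]
    return tried, None

for greedy, pattern in ((True, r'^(.+)e'), (False, r'^(.+?)e')):
    tried, group = try_lengths('delegate', greedy)
    print(pattern, 'tried', tried, '->', group)
    # The toy model agrees with the real engine on the captured group.
    assert group == re.match(pattern, 'delegate').group(1)
```

The greedy version has to give back one character (trying delegate, then delegat) before the e can match, while the lazy version succeeds on its very first try with d.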
