13.5.3. Discussion
By default, quantifiers in PHP regular expressions are
what's known as greedy. This
means a quantifier tries to match as many characters as
possible while still allowing the overall pattern to match.
For example, take the pattern p.*, which matches a
p and then 0 or more characters, and match it
against the string php. A greedy regular
expression finds one match, because after it grabs the opening
p, it continues on and also matches the
hp. A nongreedy regular expression, on the other
hand, finds a pair of matches. It matches the opening
p, but because a nongreedy quantifier takes as few characters as
possible (here, none), the match stops there. The h cannot begin a
match, so it is skipped, and a second match then
takes the closing letter.
The following code shows that the greedy match finds only one hit;
the nongreedy ones find two:
print preg_match_all('/p.*/', "php") . "\n"; // greedy
print preg_match_all('/p.*?/', "php") . "\n"; // nongreedy: ? after the quantifier
print preg_match_all('/p.*/U', "php") . "\n"; // nongreedy: U pattern modifier
1
2
2
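To see which text each match captures, rather than only how many matches there are, the same three patterns can be run with a matches array. This is a minimal sketch; the $patterns and $results names are illustrative and not part of the original recipe.

```php
<?php
// Run each pattern against "php" and record the text of every match.
$patterns = [
    'greedy'        => '/p.*/',
    'nongreedy (?)' => '/p.*?/',
    'nongreedy (U)' => '/p.*/U',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // $matches[0] holds the full text of each match.
    preg_match_all($pattern, 'php', $matches);
    $results[$label] = $matches[0];
    print $label . ': ' . implode(', ', $matches[0]) . "\n";
}
```

The greedy pattern captures the whole string in one match, while each nongreedy pattern captures the two p characters separately and never consumes the h.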
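Nongreedy matching, also called minimal matching, is useful for
simple HTML parsing. Suppose you want to find all the text between
bold tags. With greedy matching:
$html = '<b>I am bold.</b> <i>I am italic.</i> <b>I am also bold.</b>';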
preg_match_all('#<b>(.+)</b>#', $html, $bolds);
print_r($bolds[1]);
Array
(
[0] => I am bold.</b> <i>I am italic.</i> <b>I am also bold.
)
Because there's a second set of bold tags, the
pattern extends past the first </b>, which
makes it impossible to correctly break up the HTML. If you use
minimal matching, each set of tags is self-contained:
$html = '<b>I am bold.</b> <i>I am italic.</i> <b>I am also bold.</b>';
preg_match_all('#<b>(.+?)</b>#', $html, $bolds);
print_r($bolds[1]);
Array
(
[0] => I am bold.
[1] => I am also bold.
)
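The U pattern modifier can stand in for the ? here as well. The sketch below assumes the same $html string; the $lazy and $greedy variable names are illustrative. Note that U inverts greediness for the whole pattern, so under U a quantifier followed by ? becomes greedy again.

```php
<?php
$html = '<b>I am bold.</b> <i>I am italic.</i> <b>I am also bold.</b>';

// With the U modifier, a plain quantifier is nongreedy.
preg_match_all('#<b>(.+)</b>#U', $html, $lazy);

// Under U, appending ? switches the quantifier back to greedy.
preg_match_all('#<b>(.+?)</b>#U', $html, $greedy);

print_r($lazy[1]);
print_r($greedy[1]);
```

Because U changes the meaning of every quantifier in the pattern, the explicit ? form is often easier to read when only one quantifier needs to be nongreedy.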