Question
My regexp:
<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>
My string:
<div>
<f>
</f>
</div>
The result is:
array(2
0 => array(1
0 => <f>
</f>
)
1 => array(1
0 => f
)
)
Why is it capturing <f></f> and ignoring the first <div>?
Answer 1:
The answer is USE A PARSER INSTEAD (sorry for shouting). A regular expression is sometimes the quicker way to pull out an ID or a URL string, but matching HTML tags with a regex is error-prone. Consider the following code: isn't it much more beautiful than druidic characters with special meanings?
<?php
$str = "
<container>
<div class='someclass' data='somedata'>
<f>some content here</f>
</div>
</container>";
$xml = simplexml_load_string($str);
echo $xml->div->f; // some content here
$attributes = $xml->div->attributes();
print_r($attributes); // class and data as keys
?>
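The same idea applied to the string from the question might look like the sketch below. It assumes the input is well-formed XML, which is why it uses DOMDocument::loadXML; real-world HTML would need a more forgiving loader. The variable names are illustrative only.

```php
<?php
// The asker's input: a <div> with a nested <f>.
$str = "<div>\n<f>\n</f>\n</div>";

$doc = new DOMDocument();
$doc->loadXML($str); // assumption: the input is well-formed XML

// Collect every element name in document order, outer tags first.
$names = [];
foreach ($doc->getElementsByTagName('*') as $node) {
    $names[] = $node->nodeName;
}

print_r($names); // div, f
```

Both the outer and the nested tag are reported, with no need to reason about what a character class may or may not consume.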
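Answer 2:
I'd say it's because your second character class, [\na-zA-Z0-9]*, matches zero or more newlines, letters, and digits before the closing tag. It cannot match <, > or /, so it cannot span the nested <f>...</f> that sits between <div> and </div>, and the <div>...</div> block as a whole never matches. The inner <f>...</f> contains only a newline, so it does match.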
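A small sketch that reproduces this behaviour with the pattern and input from the question. The second pattern is not from the thread; it is a hypothetical variation, shown only to make the point about the character class concrete.

```php
<?php
// The asker's input and pattern, unchanged.
$str = "<div>\n<f>\n</f>\n</div>";
$pattern = '/<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>/';

// $matches[0] holds the full matches, $matches[1] the captured tag names.
preg_match_all($pattern, $str, $matches);
print_r($matches); // only "<f>\n</f>" with tag name "f"

// Hypothetical variation: let the body be any characters (lazy, with the
// "s" flag so that "." also matches newlines). Now the outer <div> matches.
$lazy = '/<([a-zA-Z0-9]+)>.*?<\/\1>/s';
preg_match_all($lazy, $str, $lazyMatches);
print_r($lazyMatches); // the whole <div>...</div> block, tag name "div"
```

Note that the variation still reports only one tag: once the outer <div> has been consumed, the nested <f> is not matched separately, because matches do not overlap. That limitation is one more reason to prefer a parser for nested markup.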
Source: https://stackoverflow.com/questions/34252497/php-regexp-parse-html