PHP regexp parse HTML

Posted by 半腔热情 on 2019-12-12 05:17:22

Question


My regexp:

<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>

My string:

<div>
<f>
</f>
</div>

The result is:

array(2
  0 => array(1
    0 => <f>
</f>
  )
  1 => array(1
    0 => f
  )
)

Why is it capturing <f></f> and ignoring the outer <div>?


Answer 1:


The answer is: USE A PARSER INSTEAD (sorry for shouting). A regular expression is sometimes the quicker way to pull out an ID or a URL string, but matching HTML tags with a regex is error-prone, because tags can nest and a pattern like yours has no notion of nesting depth. Consider the following code; isn't it much more beautiful than druidic characters with special meanings?

<?php
$str = "
<container>
    <div class='someclass' data='somedata'>
        <f>some content here</f>
    </div>
</container>";
$xml = simplexml_load_string($str);

echo $xml->div->f; // some content here
$attributes = $xml->div->attributes();
print_r($attributes); // class and data as keys
?>
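
For comparison, here is a minimal sketch of the same approach applied to the string from the question. It assumes the input is well-formed XML, which SimpleXML requires; the variable names $rootName and $childNames are only illustrative.

```php
<?php
// The markup from the question. It happens to be well-formed XML,
// so simplexml_load_string() can parse it.
$str = "<div>\n<f>\n</f>\n</div>";
$xml = simplexml_load_string($str);

// The root element is the outer <div> that the regex skipped.
$rootName = $xml->getName();

// Collect the names of the root's direct child elements.
$childNames = [];
foreach ($xml->children() as $child) {
    $childNames[] = $child->getName();
}

echo $rootName, "\n";                  // div
echo implode(',', $childNames), "\n";  // f
```

The parser sees both the outer <div> and the nested <f>, with no pattern to maintain. Real-world HTML is often not well-formed XML; in that case an HTML-aware parser such as DOMDocument::loadHTML is likely the better fit.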



Answer 2:


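I'd say it's because your character class [\na-zA-Z0-9]* only matches newlines, letters, and digits (zero or more of them) before the closing tag. The body of the <div>...</div> block contains <, > and /, none of which are in that class, so the attempt that starts at <div> fails and only the innermost <f>...</f> pair matches. Note also that \1+ lets the captured tag name repeat, so the pattern would accept </ff> as the closing tag for <f>.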
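
A small sketch to check this explanation: it runs the pattern from the question against the original string, and then against a hypothetical second input (not from the question) whose <div> body contains only letters and newlines.

```php
<?php
// Pattern and input copied from the question.
$pattern = '/<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>/';
$nested  = "<div>\n<f>\n</f>\n</div>";

// Only the innermost pair matches: the body of <div> contains '<',
// which the character class does not allow.
$nestedCount = preg_match_all($pattern, $nested, $nestedMatches);
echo $nestedCount, ' ', $nestedMatches[1][0], "\n"; // 1 f

// Hypothetical input: a <div> whose body is only letters and newlines.
$flat = "<div>\nabc\n</div>";
$flatCount = preg_match_all($pattern, $flat, $flatMatches);
echo $flatCount, ' ', $flatMatches[1][0], "\n"; // 1 div
```

Once the body of <div> contains nothing outside the character class, the same pattern matches the <div> pair, which suggests the class, not the backreference, is what excludes the outer tag in the original input.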



Source: https://stackoverflow.com/questions/34252497/php-regexp-parse-html
