Regex for HTML tags

Submitted by ╄→гoц情女王★ on 2019-12-11 05:36:37

Question


I'm doing the following:

<?php
$text = preg_replace("/<p>(.*?)<\/p>/", "$1<br>", $text);
?>

This lets me get rid of the <p> tags and put a line break (<br>) at the end of each paragraph's text instead (this is for the page's styling).

This works perfectly for "<p>Something</p>".

However, with text like:

<h3>Section 1.10.32 of "de Finibus Bonorum et Malorum", written by Cicero in 45 BC</h3>
<p>"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?"</p>

(taken from lipsum.com, the Lorem Ipsum page), it doesn't work, and I have no idea why.
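The pattern above has two likely blind spots. Which one applies to this input is an assumption, since the exact bytes of the pasted text aren't shown: without the s modifier, . does not match a newline, so a paragraph that wraps across lines is skipped; and the literal <p> does not match an opening tag that carries attributes. A minimal check of both cases, using made-up sample strings:

```php
<?php
// The question's pattern: a literal <p>, no modifiers.
$pattern = "/<p>(.*?)<\/p>/";

// Hypothetical inputs, one per failure mode.
$samples = [
    'plain'      => "<p>Something</p>",
    'multi-line' => "<p>Sed ut perspiciatis\nunde omnis</p>",  // newline inside the paragraph
    'attribute'  => '<p class="x">Sed ut perspiciatis</p>',    // attribute on the opening tag
];

foreach ($samples as $label => $html) {
    // preg_match returns 1 on a match and 0 when there is none.
    $matched = preg_match($pattern, $html);
    echo str_pad($label, 12), $matched ? "matched" : "NOT matched", PHP_EOL;
}
```

Only the plain sample matches; the other two are left unchanged by preg_replace.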

On a somewhat related note (I'm not sure it's related enough to belong in the same question, but it might help with this problem): is there a function or some other way to automatically remove any JavaScript these tags might contain? For example:

<p onmouseover="alert('hello');">
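One possible approach, sketched under the assumption that PHP's DOM extension is available: parse the fragment with DOMDocument, delete <script> elements and every attribute whose name starts with "on", then serialize the fragment back. A parser is used rather than a regex because attribute values can contain > and quotes that trip up patterns like [^>]*. The helper name remove_inline_js is hypothetical, and this is not a complete sanitizer (it ignores javascript: URLs in href or src, for instance); for untrusted input, a maintained library such as HTML Purifier is the safer choice.

```php
<?php
// Hypothetical helper: removes <script> elements and inline on* event
// attributes (onclick, onmouseover, ...). Not a complete sanitizer.
function remove_inline_js(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells libxml the input is UTF-8; the wrapper <div>
    // lets us serialize just the fragment afterwards.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath results are a snapshot, so removing nodes while looping is safe.
    foreach ($xpath->query('//script') as $script) {
        $script->parentNode->removeChild($script);
    }
    // translate() folds "ON" to "on" so ONCLICK-style names are caught too.
    foreach ($xpath->query('//@*[starts-with(translate(name(), "ON", "on"), "on")]') as $attr) {
        $attr->ownerElement->removeAttribute($attr->name);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($xpath->query('/html/body/div')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo remove_inline_js('<p onmouseover="alert(\'hello\');">Hello</p>'), PHP_EOL;
```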

Thanks for any help.


Answer 1:


Try this PHP call:

$text = preg_replace('~<p\b[^>]*>(.*?)</p>~smi', "$1<br>", $text);

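The i modifier makes the match case-insensitive (p and P), and the s modifier lets . match newlines, so paragraphs that span several lines are matched too; the m modifier only affects ^ and $, so it changes nothing here. [^>]* also allows attributes on the opening tag.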
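A quick check of that pattern on an uppercase, multi-line paragraph with an attribute. The input string is made up for illustration, and the m modifier is dropped since it only affects ^ and $:

```php
<?php
// \b stops the pattern from matching <pre>, [^>]* skips attributes,
// i = case-insensitive, s = "." also matches newlines.
$html = "<P class=\"intro\">First line\nsecond line</P>";

$result = preg_replace('~<p\b[^>]*>(.*?)</p>~si', '$1<br>', $html);
echo $result, PHP_EOL;
// First line
// second line<br>
```

Without the s modifier, the same call returns the input unchanged, because .*? cannot cross the newline.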




Answer 2:


There you go:

$text = '<h3>Section 1.10.32 of "de Finibus Bonorum et Malorum", written by Cicero in 45 BC</h3><p class="toto">"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?"</p>';


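// $1 keeps the paragraph text (an empty replacement would delete it); the s modifier lets . match newlines.
$text = preg_replace('/<p\b[^>]*>(.*?)<\/p>/s', '$1', $text) . "<br>";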

It also correctly handles any attributes your <p> tag might have (like the class in my example).
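A small check of the attribute handling, assuming the goal is to keep each paragraph's text (so the replacement is '$1'). The \b after p also matters: it keeps <pre> from being mistaken for a paragraph. The sample string is made up:

```php
<?php
// [^>]* skips any attributes on the opening tag; \b after "p" requires a
// word boundary, so <pre> and <param> are not treated as paragraphs.
$pattern = '/<p\b[^>]*>(.*?)<\/p>/s';

$html = '<p class="toto">Kept text</p><pre>code</pre>';
$result = preg_replace($pattern, '$1', $html) . '<br>';

echo $result, PHP_EOL;
// Kept text<pre>code</pre><br>
```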




Answer 3:


The user-contributed notes in the PHP documentation already include some functions for this, especially this one: http://php.net/manual/en/function.strip-tags.php#93567

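<?php
// Remove only the listed tags (opening and closing), keeping their content.
// $tags is either an array like array('p', 'h1') or a string like '<p><h1>'.
function strip_only($str, $tags) {
    if (!is_array($tags)) {
        // Turn '<p><h1>' into array('p', 'h1'); a bare 'p' becomes array('p').
        $tags = (strpos($tags, '>') !== false ? explode('>', str_replace('<', '', $tags)) : array($tags));
        if (end($tags) == '') array_pop($tags);
    }
    foreach ($tags as $tag) {
        // \b stops 'p' from also matching <pre>, <param>, etc.
        $str = preg_replace('#</?' . preg_quote($tag, '#') . '\b[^>]*>#is', '', $str);
    }
    return $str;
}

$str = '<p style="text-align:center">Paragraph</p><strong>Bold</strong><br/><span style="color:red">Red</span><h1>Header</h1>';

echo strip_only($str, array('p', 'h1')), PHP_EOL;
echo strip_only($str, '<p><h1>'), PHP_EOL;
?>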
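The page that note lives on documents PHP's built-in strip_tags(), which works the other way round: it removes every tag except the ones you whitelist. For the sample string above, whitelisting the tags to keep removes <p> and <h1> just as strip_only does. Note that strip_tags() leaves attributes on allowed tags untouched, so on its own it does not remove handlers such as onmouseover.

```php
<?php
$str = '<p style="text-align:center">Paragraph</p><strong>Bold</strong><br/><span style="color:red">Red</span><h1>Header</h1>';

// Keep <strong>, <br> (which also covers <br/>) and <span>; drop every other tag.
// Attributes on the kept tags, such as style="color:red", survive untouched.
$result = strip_tags($str, '<strong><br><span>');

echo $result, PHP_EOL;
// Paragraph<strong>Bold</strong><br/><span style="color:red">Red</span>Header
```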


Source: https://stackoverflow.com/questions/5923415/regex-for-html-tags
