Question
I'm thinking of adding a rich text editor so that a non-programmer can change the appearance of text. However, one issue is that incorrect markup can distort the layout of the rendered page. What's a good lightweight way to sanitize HTML?
Answer 1:
You will have to decide between good and lightweight. The recommended choice is 'HTMLPurifier', because it provides no-fuss secure defaults. As a faster alternative, 'htmLawed' is often advised.
See also this quite objective overview from the HTMLPurifier author: http://htmlpurifier.org/comparison
Answer 2:
I really like HTML Purifier, which allows you to specify which tags and attributes are allowed in your HTML code -- and generates valid HTML.
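As a rough illustration of the whitelist idea, here is a minimal sketch using HTML Purifier's `HTML.Allowed` directive. It assumes the library was installed through Composer (hence the `vendor/autoload.php` path); the sample input and the particular list of allowed tags are made up for the example.

```php
<?php
// Assumption: HTML Purifier is installed via Composer (package ezyang/htmlpurifier).
require_once 'vendor/autoload.php';

// Made-up editor output: an event handler, an unclosed <b> and a script element.
$dirty = '<p onclick="steal()">Hello <b>world</p><script>alert(1)</script>';

$config = HTMLPurifier_Config::createDefault();
// Whitelist of tags, with permitted attributes in brackets (illustrative choice).
$config->set('HTML.Allowed', 'p,b,i,a[href]');

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($dirty);

echo $clean;
```

Anything not on the whitelist, such as the `onclick` attribute and the `script` element here, should be dropped, and unclosed tags are balanced in the output.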
Answer 3:
Use BBCode (or a lightweight markup like the one used here on SO); otherwise the chances of ending up with safe, well-formed HTML are very slim. Example function...
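function parse($string){
    // Escape raw HTML first so that only the tags generated below reach the page.
    $string = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
    // Each pattern is replaced by the entry at the same index in $replacement.
    $pattern = array(
        // Links only accept http(s) URLs, which blocks javascript: and similar schemes.
        "/\[url\](https?:\/\/.*?)\[\/url\]/",
        "/\[img\](.*?)\[\/img\]/",
        "/\[img\=(.*?)\](.*?)\[\/img\]/",
        "/\[url\=(https?:\/\/.*?)\](.*?)\[\/url\]/",
        "/\[red\](.*?)\[\/red\]/",
        "/\[b\](.*?)\[\/b\]/",
        // The heading level must be 1-6 and identical in the opening and closing tag.
        "/\[h([1-6])\](.*?)\[\/h\\1\]/",
        "/\[p\](.*?)\[\/p\]/",
        "/\[php\](.*?)\[\/php\]/is"
    );
    $replacement = array(
        '<a href="\\1">\\1</a>',
        '<img alt="" src="\\1"/>',
        '<img alt="" class="\\1" src="\\2"/>',
        '<a rel="nofollow" target="_blank" href="\\1">\\2</a>',
        '<span style="color:#ff0000;">\\1</span>',
        '<span style="font-weight:bold;">\\1</span>',
        '<h\\1>\\2</h\\1>',
        '<p>\\1</p>',
        '<pre><code class="php">\\1</code></pre>'
    );
    $string = preg_replace($pattern, $replacement, $string);
    // Turn the remaining newlines into <br /> tags.
    $string = nl2br($string);
    return $string;
}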
...
echo parse("[h2]Lorem Ipsum[/h2][p]Dolor sit amet[/p]");
Result...
<h2>Lorem Ipsum</h2><p>Dolor sit amet</p>

Or just use HTML Purifier :)
Answer 4:
Both HTML Purifier and htmLawed are good. htmLawed has the advantage of a much smaller footprint and high configurability. Besides doing the standard work of balancing tags, filtering specific HTML tags or their attributes or attribute content (through white or black lists), etc., it also allows the use of custom functions.
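For comparison, a minimal htmLawed sketch might look like the following. It assumes `htmLawed.php` sits next to the calling script (the path is a guess, adjust it to your setup); the sample input and the element list are made up for the example.

```php
<?php
// Assumption: htmLawed.php has been downloaded next to this script.
require_once 'htmLawed.php';

// Made-up editor output: an event handler, an unclosed <b> and a script element.
$dirty = '<p onclick="steal()">Hello <b>world</p><script>alert(1)</script>';

$config = array(
    'safe'     => 1,            // turn on htmLawed's stricter "safe" defaults
    'elements' => 'a, b, i, p'  // whitelist of allowed elements (illustrative choice)
);

$clean = htmLawed($dirty, $config);

echo $clean;
```

The custom functions mentioned above are attached through further entries in the same config array; check the htmLawed documentation for the exact callback signature of the version you use.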
Source: https://stackoverflow.com/questions/5512712/sanitizing-html-input