How can I allow HTML in a whitelist with PHP?

Submitted by 白昼怎懂夜的黑 on 2019-11-27 07:19:49

Question


I know there has been a lot of discussion over the years about the best methods of filtering data with PHP, but I would like to take the whitelist approach in my current project.

I only want a user to be able to use the following HTML:

<b>bold</b>
<i>italics</i>
<u>underline</u>
<s>strikethrough</s>
<big>Big size</big>
<small>Small size</small>

Hyperlink <a href="http://www.site.com">website</a>

A Bulleted List:
<ul>
<li>One Item</li>
<li>Another Item</li>
</ul>

An Ordered List:
<ol>
<li> First Item</li>
<li> Second Item</li>
</ol>

<blockquote>Because it is indented</blockquote>

<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>

Can anyone show me the best-performing way to do this in PHP? In the past I have only allowed all HTML minus certain tags.


Answer 1:


The simplest solution would be strip_tags(), which accepts a second argument containing allowable tags:

$clean = strip_tags($string, "<b><i><u><a><s><big><small><ul><li><ol><blockquote><h1><h2><h3>");
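
One thing worth checking before relying on this: strip_tags() filters by tag name only. The short sketch below, using a made-up input string, suggests that attributes on allowed tags pass through untouched and that the text inside removed tags is kept. The variable names and the sample input are illustrative assumptions, not part of the original answer.

```php
<?php
// Allowed tags, taken from the list in the question.
$allowed = '<b><i><u><s><big><small><a><ul><ol><li><blockquote><h1><h2><h3>';

// Hypothetical user input, for illustration only.
$input = '<b onclick="steal()">bold</b> <script>alert(1)</script> <div>plain</div>';

$output = strip_tags($input, $allowed);

// The <script> and <div> tags are removed, but their inner text stays,
// and the onclick attribute on the allowed <b> tag is left in place.
echo $output, PHP_EOL;
```

If event-handler attributes or javascript: links are a concern, strip_tags() alone is probably not enough, and one of the parser-based approaches in the other answers may be a better fit.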



Answer 2:


I believe the HTML Purifier Library will work nicely:

http://htmlpurifier.org/

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!
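
As a rough sketch of how the whitelist from the question might be expressed with HTML Purifier: the code below assumes the library is installed (for example through the Composer package ezyang/htmlpurifier) and that its autoloader has already been loaded. The function name purify_whitelisted is hypothetical, and the scheme restriction is an added assumption rather than something the question asked for.

```php
<?php
// Assumes HTML Purifier's classes are autoloadable (e.g. via Composer).
// purify_whitelisted() is a hypothetical helper name used for illustration.
function purify_whitelisted(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Elements from the question; a[href] keeps only the href attribute on links.
    $config->set(
        'HTML.Allowed',
        'b,i,u,s,big,small,a[href],ul,ol,li,blockquote,h1,h2,h3'
    );

    // Assumption: only plain web links should survive.
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true]);

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($dirty);
}
```

Building the purifier on every call is the simplest form but not the cheapest; if many strings are filtered in one request, creating the HTMLPurifier object once and reusing it is likely to perform better.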




Answer 3:


Another route is using strip_tags with the second argument.

http://php.net/manual/en/function.strip-tags.php




Answer 4:


I would run the submitted code through Tidy to normalize it first, and then use XPath or apply XSLT to select only the allowed elements. This way, nothing can leak. Bear in mind, too, that a typical website serves thousands, if not hundreds of thousands, of read requests for every write request (the only kind that runs Tidy and XPath/XSLT), so on average the performance impact is negligible. If you are doing batch processing, on the other hand...

Edit: Oh, and DON'T do this with regular expressions. HTML is not a regular language, so a regular expression cannot parse arbitrarily nested markup correctly.
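
A minimal sketch of the parse-then-select idea, under some stated assumptions: it uses PHP's built-in DOMDocument (whose libxml parser also normalizes tag soup) in place of Tidy, and walks the tree with DOMXPath instead of XSLT. The constant names, the filter_html function, the choice to unwrap disallowed elements rather than delete them, and the rule that links must start with http:// or https:// are all illustrative choices, not part of the original answer.

```php
<?php
// Whitelist from the question. The attribute map and URL rule are assumptions.
const ALLOWED_TAGS = ['b', 'i', 'u', 's', 'big', 'small', 'a', 'ul', 'ol', 'li',
                      'blockquote', 'h1', 'h2', 'h3'];
const ALLOWED_ATTRS = ['a' => ['href']];
// Elements whose text content is discarded along with the tag.
const DROP_WITH_CONTENT = ['script', 'style'];

function filter_html(string $dirty): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on tag soup
    // The XML declaration prefix is a common trick to make libxml read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $dirty, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $xpath = new DOMXPath($doc);

    // Remove comments outright.
    foreach (iterator_to_array($xpath->query('.//comment()', $body), false) as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // Snapshot the elements first, because the loop mutates the tree.
    $elements = iterator_to_array($xpath->query('.//*', $body), false);

    foreach ($elements as $el) {
        if ($el->parentNode === null) {
            continue; // already detached
        }
        $tag = strtolower($el->nodeName);

        if (in_array($tag, DROP_WITH_CONTENT, true)) {
            $el->parentNode->removeChild($el);
        } elseif (!in_array($tag, ALLOWED_TAGS, true)) {
            // Unwrap: keep the children, discard the element itself.
            while ($el->firstChild !== null) {
                $el->parentNode->insertBefore($el->firstChild, $el);
            }
            $el->parentNode->removeChild($el);
        } else {
            // Allowed element: strip every attribute not on its list.
            foreach (iterator_to_array($el->attributes, false) as $attr) {
                $keep = in_array($attr->nodeName, ALLOWED_ATTRS[$tag] ?? [], true);
                if ($keep && $attr->nodeName === 'href'
                    && !preg_match('#^https?://#i', $attr->value)) {
                    $keep = false; // rejects javascript: and other schemes
                }
                if (!$keep) {
                    $el->removeAttributeNode($attr);
                }
            }
        }
    }

    // Serialize only the children of <body>, not the wrapper document.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}
```

The regular expression here only checks the start of an attribute value that the parser has already extracted; it is not used to parse the HTML itself. Since filtering happens at write time, the result can be stored and served as-is on reads.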



Source: https://stackoverflow.com/questions/1975613/how-can-i-allow-html-in-a-whitelist-with-php
