I want to allow a lot of user-submitted HTML for user profiles. I currently try to filter out what I don't want, but I now want to switch to a whitelist approach.
You can just use the strip_tags() function.
Since the function is defined as
string strip_tags ( string $str [, string $allowable_tags ] )
You can do this:
$html = $_POST['content'];
$html = strip_tags($html, '<b><a><i><u><span>');
But take note that with strip_tags() you won't be able to filter out the attributes on the tags you allow, e.g.
<a href="javascript:alert('haha caught cha!');">link</a>
HTML Purifier is the best HTML parser/cleaner out there.
It's a pretty simple aim to achieve, actually: you just need to match anything that is NOT a tag from your whitelist and remove it from the source. It can be done quite easily with one regex.
function sanitize($html) {
    $whitelist = array(
        'b', 'i', 'u', 'strong', 'em', 'a'
    );
    // Strip any opening or closing tag whose name is not in the whitelist;
    // the negative lookahead leaves whitelisted tags intact
    $pattern = "/<\/?(?!(?:" . implode("|", $whitelist) . ")\b)[^>]*>/i";
    return preg_replace($pattern, "", $html);
}
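For example, with a made-up input (note that only the tags themselves are removed; the text inside them survives):

echo sanitize('<b>bold</b> <script>alert("xss")</script> <div>hi</div>');
// Prints: <b>bold</b> alert("xss") hi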
I haven't tested this exhaustively, but you get the gist of how it works. You might also want to look at using a formatting language such as Textile or Markdown instead of raw HTML.
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
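For reference, basic usage looks roughly like this (HTML.Allowed is the library's whitelist directive; the allowed-tag list here is just an example):

require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Whitelist elements and attributes, e.g. links may only carry an href
$config->set('HTML.Allowed', 'b,i,u,strong,em,a[href]');

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($_POST['content']);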
Try the function getCleanHTML below: it reduces elements to their text content, except for elements whose tag name is in the whitelist, which are kept as-is. The code is clean and easy to understand and debug.
<?php
$TagWhiteList = array(
    'b', 'i', 'u', 'strong', 'em', 'a', 'img'
);

// Serialize a single node and its subtree back to an HTML string
function getHTMLCode($Node) {
    $Document = new DOMDocument();
    $Document->appendChild($Document->importNode($Node, true));
    return $Document->saveHTML();
}

// Walk the tree: whitelisted elements are kept verbatim, everything
// else is reduced to the text content of its descendants
function getCleanHTML($Node, $Text = "") {
    global $TagWhiteList;

    // Text nodes, comments etc. have no tag name; keep their text
    if (!($Node instanceof DOMElement))
        return $Text . $Node->textContent;

    if (in_array($Node->tagName, $TagWhiteList))
        return $Text . getHTMLCode($Node);

    // Not whitelisted: drop the tag itself and recurse into the children
    for ($Child = $Node->firstChild; $Child !== null; $Child = $Child->nextSibling)
        $Text = getCleanHTML($Child, $Text);

    return $Text;
}

$Doc = new DOMDocument();
$Doc->loadHTMLFile("Test.html");
echo getCleanHTML($Doc->documentElement) . "\n";
?>
Hope this helps.
Maybe it is safer to use DOMDocument to parse the markup properly, remove disallowed tags with removeChild(), and then serialize the result. It is not always safe to filter with regular expressions, especially once things reach this level of complexity. Attackers can find ways around such filters; forums and social networks know that very well.
For instance, some browsers tolerate whitespace after the <. If your regex filters <script but I submit < script instead... big FAIL!
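A minimal sketch of that DOM-based approach (the function name and whitelist here are illustrative, not a vetted implementation, and attributes on the kept tags would still need separate vetting):

<?php
// 'p' is whitelisted here because DOMDocument wraps loose body text
// in <p> elements while parsing
function removeDisallowedTags($html, $whitelist = array('p', 'b', 'i', 'u', 'strong', 'em', 'a')) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ silences warnings on sloppy user markup

    // Collect offenders first, then remove; deleting while iterating
    // the node list can skip nodes
    $xpath = new DOMXPath($doc);
    $remove = array();
    foreach ($xpath->query('//body//*') as $node) {
        if (!in_array(strtolower($node->nodeName), $whitelist))
            $remove[] = $node;
    }
    foreach ($remove as $node)
        $node->parentNode->removeChild($node);

    // Serialize only the body contents back to a string
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child)
        $out .= $doc->saveHTML($child);
    return $out;
}

echo removeDisallowedTags('<p>hi <script>alert(1)</script> <b>bold</b></p>');
// Prints: <p>hi  <b>bold</b></p>
?>

Unlike strip_tags(), removeChild() drops the element together with everything inside it, so the script body disappears as well.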