I want to allow a lot of user-submitted HTML for user profiles. I currently try to filter out what I don't want, but I now want to switch to a whitelist approach.
It's actually a pretty simple aim to achieve: you just need to find any tag that's NOT in a list of whitelisted tags and remove it from the source. It can be done with a single regex.
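function sanitize($html) {
    // Tags that are allowed to stay in the output.
    $whitelist = array(
        'b', 'i', 'u', 'strong', 'em', 'a'
    );
    $allowed = implode('|', $whitelist);
    // Remove every opening or closing tag whose name is not whitelisted.
    // The negative lookahead (?!...) skips allowed tags; text between tags is kept.
    // Attributes on allowed tags (e.g. onclick) are NOT filtered here.
    return preg_replace('/<(?!\/?(?:' . $allowed . ')\b)[^>]*>/i', '', $html);
}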
I haven't tested this, and there's probably an error in there somewhere, but you get the gist of how it works. You might also want to look at using a formatting language such as Textile or Markdown.
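If you'd rather not maintain the regex by hand, the same whitelist idea can probably be sketched with PHP's built-in strip_tags(), which accepts a string of allowed tags. This is only a rough sketch: the function name sanitize_with_strip_tags is made up for illustration, and the whitelist is simply the one from the function above.

```php
<?php
// Hypothetical helper name; the whitelist mirrors the one used above.
function sanitize_with_strip_tags($html) {
    $whitelist = array('b', 'i', 'u', 'strong', 'em', 'a');
    // strip_tags() expects the allowed tags as one string, e.g. "<b><i><a>".
    $allowed = '<' . implode('><', $whitelist) . '>';
    // Tags outside the whitelist are removed; the text inside them is kept.
    return strip_tags($html, $allowed);
}

// Example: the <p> wrapper is dropped, the <b> tag stays.
echo sanitize_with_strip_tags('<b>Hi</b> <p>there</p>');
```

Keep in mind that strip_tags() leaves the attributes of allowed tags untouched, so something like an onclick attribute or a javascript: link on an allowed a tag would still get through and needs separate handling.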
Jamie