What's an example of something dangerous that would not be caught by the code below?
EDIT: After some of the comments I added another line, commented below. See V
Another vote for whitelisting. But it looks like you're going about this the wrong way. The way I do it is to parse the HTML into a tag tree. If the tag you're parsing is in the whitelist, give it a tree node and parse on. The same goes for its attributes.
Attributes that aren't in the whitelist are simply dropped. Everything else is HTML-escaped literal content.
And the bonus of this route is that, because you're effectively regenerating all the markup, the output is always valid markup! (I hate it when people leave comments and they screw up the validation/design.)
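To make that concrete, here is a minimal sketch of the idea in Python, using only the standard library's `html.parser`. It is not the code I actually run. The whitelist contents, the `WhitelistSanitizer` and `sanitize` names, and the extra scheme check on `href` are all assumptions made for illustration. It also keeps a stack of open tags rather than building a full tree, which is enough to regenerate balanced markup.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names allowed on that tag.
ALLOWED = {"a": {"href", "title"}, "b": set(), "i": set(), "em": set(),
           "strong": set(), "p": set(), "br": set()}
VOID = {"br"}  # whitelisted tags that never take a closing tag
# Assumption: only these URL schemes are acceptable in href values.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []   # regenerated output fragments
        self.open = []  # stack of whitelisted tags currently open

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            # Not whitelisted: keep it, but only as escaped literal text.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED[tag]:
                continue  # attribute not in the whitelist: drop it
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID:
            self.open.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)
        if tag in ALLOWED and tag not in VOID:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag in self.open:
            # Close anything still open inside this tag, then the tag itself.
            while True:
                top = self.open.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break
        else:
            # Unknown or stray closing tag: escaped literal text.
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data))

    def handle_comment(self, data):
        self.out.append(escape(f"<!--{data}-->"))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    # Close whatever the author left open so the output stays balanced.
    while parser.open:
        parser.out.append(f"</{parser.open.pop()}>")
    return "".join(parser.out)
```

With this sketch, `sanitize('<b onclick="x()">hi</b>')` returns `<b>hi</b>`: the tag survives, the attribute does not. Note that nothing from the input is ever copied to the output unescaped; every tag that appears in the result was written by the sanitizer itself.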
Re "I can't whitelist" (para): Blacklisting is a maintenance-heavy approach. You'll have to keep an eye on new exploits and make sure you're covered. It's a miserable existence. Just do it right once and you'll never need to touch it again.