I have a rich text editor that passes HTML to the server. That HTML is then displayed to other users. I want to make sure there is no JavaScript in that HTML. Is there any w
Here is how I do it, using a whitelisting approach (JavaScript and Python code):
https://github.com/dcollien/FilterHTML
I define a specification for the subset of HTML that is allowed, and only that subset gets through the filter. There are also options to sanitize URL attributes by allowing only certain schemes (such as http: or ftp:) and rejecting those that can cause XSS/JavaScript problems (such as javascript:, or even data:).
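To illustrate the idea, here is a minimal sketch built on Python's standard html.parser module. It is not FilterHTML's actual API; the spec (ALLOWED_TAGS, ALLOWED_SCHEMES) and the names WhitelistFilter, is_safe_url and filter_html are made up for this example, and it assumes only absolute URLs should be kept.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical spec: allowed tag -> set of allowed attribute names.
ALLOWED_TAGS = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
URL_ATTRIBUTES = {"href"}
ALLOWED_SCHEMES = {"http", "https", "ftp"}
# Tags whose text content is dropped along with the tag itself.
DROP_CONTENT = {"script", "style"}


def is_safe_url(url):
    """Accept only absolute URLs whose scheme is explicitly allowed."""
    # Remove whitespace and control characters, which browsers may
    # ignore inside a scheme (e.g. "java\tscript:").
    cleaned = "".join(ch for ch in url if ch > " ")
    scheme, colon, _rest = cleaned.partition(":")
    return bool(colon) and scheme.lower() in ALLOWED_SCHEMES


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_TAGS[tag]:
                continue
            if name in URL_ATTRIBUTES and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so it can never be read back as markup.
            self.parts.append(escape(data, quote=False))


def filter_html(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(filter_html('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>'))
# prints: <p>Hi <b>there</b></p>
```

The point of the design is that everything is rebuilt from the parsed tokens rather than by deleting "bad" substrings from the input, so anything the spec does not name simply never reaches the output.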
Edit: This won't give you 100% safety out of the box in every situation, but used carefully and combined with a few other checks (such as verifying that URLs are on the same domain and that the linked resources have the expected content type), it could be what you need.
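As a rough sketch of the same-domain check mentioned above (again my own illustration, not part of FilterHTML; TRUSTED_HOST and is_same_domain are hypothetical names, and I'm assuming an exact host match is what you want):

```python
from urllib.parse import urlsplit

TRUSTED_HOST = "example.com"  # hypothetical: your own site's host name


def is_same_domain(url, trusted_host=TRUSTED_HOST):
    """Return True only for http(s) URLs whose host is exactly trusted_host."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme.lower() not in {"http", "https"}:
        return False
    # Exact match only: subdomains and look-alike hosts are rejected.
    return host == trusted_host.lower()
```

Comparing the parsed host name, rather than checking whether the URL string starts with or contains the domain, avoids being fooled by URLs like https://example.com.evil.net/ or https://example.com@evil.net/.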