I have a form where people can add their stuff. However, if they enter JavaScript instead of plain text, they can easily inject whatever they want. In orde
This is exactly the intent of the OWASP AntiSamy project.
The OWASP AntiSamy project is a few things. Technically, it is an API for ensuring that user-supplied HTML/CSS complies with an application's rules. Another way of saying that: it's an API that helps you make sure that clients don't supply malicious cargo code in the HTML they submit for their profile, comments, etc., which gets persisted on the server. The term "malicious code" with regard to web applications usually means "JavaScript." Cascading Style Sheets are only considered malicious when they invoke the JavaScript engine. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner, so AntiSamy takes care of that too.
Another alternative is the OWASP HTMLSanitizer project. It is faster than AntiSamy, has fewer dependencies, and is actively supported by the project lead as of this writing. I don't think it has had a GA/stable release yet, so you should take that into account when evaluating this library.
You need to parse the HTML text on the server as XML, then throw out any tags and attributes that aren't in a strict whitelist.
(And check the URLs in href and src attributes.)
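As a rough illustration of that approach, here is a minimal sketch that uses only the JDK's built-in XML parser (javax.xml), not any particular sanitizing library. The class name WhitelistSanitizer, the whitelist contents (p, b, i, a, img and their attributes) and the rule that URLs must start with http:// or https:// are all assumptions chosen for illustration. It wraps the input in a synthetic <root> element so fragments parse, drops any non-whitelisted element together with its content, strips non-whitelisted attributes and unsafe URLs, and rejects input that is not well-formed XML. Real-world HTML often isn't well-formed, which is one reason a dedicated HTML parser is usually the better choice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical whitelist sanitizer: a sketch, not a production-ready library.
final class WhitelistSanitizer {
    // Assumed whitelist: allowed tag -> allowed attributes on that tag.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "p", Set.of(), "b", Set.of(), "i", Set.of(),
        "a", Set.of("href"), "img", Set.of("src", "alt"));
    // Attributes whose values are URLs and must pass isSafeUrl().
    private static final Set<String> URL_ATTRS = Set.of("href", "src");

    static String sanitize(String untrusted) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations (defence against XXE / entity expansion).
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = dbf.newDocumentBuilder().parse(
                new InputSource(new StringReader("<root>" + untrusted + "</root>")));
        } catch (SAXException | IOException e) {
            // Not well-formed: reject instead of passing it through.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDocumentElement();
        clean(root);
        return serializeChildren(root);
    }

    private static void clean(Element parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // grab before child may be removed
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                Set<String> allowedAttrs = ALLOWED.get(el.getTagName().toLowerCase(Locale.ROOT));
                if (allowedAttrs == null) {
                    parent.removeChild(el); // not whitelisted: drop it and its content
                } else {
                    cleanAttributes(el, allowedAttrs);
                    clean(el); // recurse into whitelisted elements
                }
            } else if (child.getNodeType() != Node.TEXT_NODE) {
                parent.removeChild(child); // comments, processing instructions, CDATA
            }
            child = next;
        }
    }

    private static void cleanAttributes(Element el, Set<String> allowed) {
        NamedNodeMap attrs = el.getAttributes();
        // Walk backwards: removing an attribute shrinks the live map.
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            Attr a = (Attr) attrs.item(i);
            String name = a.getName().toLowerCase(Locale.ROOT);
            boolean keep = allowed.contains(name)
                && (!URL_ATTRS.contains(name) || isSafeUrl(a.getValue()));
            if (!keep) {
                el.removeAttributeNode(a);
            }
        }
    }

    // Assumed policy: only absolute http(s) URLs; javascript:, data:, etc. are rejected.
    private static boolean isSafeUrl(String url) {
        String u = url.trim().toLowerCase(Locale.ROOT);
        return u.startsWith("http://") || u.startsWith("https://");
    }

    private static String serializeChildren(Element root) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(root), new StreamResult(w));
            String s = w.toString();
            // Strip the synthetic <root> wrapper added before parsing.
            if (s.equals("<root/>")) {
                return "";
            }
            return s.substring("<root>".length(), s.length() - "</root>".length());
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Dropping unknown elements wholesale (instead of unwrapping them and keeping their text) errs on the side of losing content, which is the safer default for something like a script tag.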
I'd recommend using Jsoup for this. Here's the relevant extract from its site:
Sanitize untrusted HTML
Problem
You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.
Solution
Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.
String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
So all you basically need to do is the following when processing the submitted text:
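import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
String text = request.getParameter("text"); // request: the incoming HttpServletRequest
String safe = Jsoup.clean(text, Whitelist.basic());
// Persist 'safe' in DB instead.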
Jsoup offers more advantages than that as well. See also Pros and Cons of HTML parsers in Java.