I\'m looking for class/util etc. to sanitize HTML code i.e. remove dangerous tags, attributes and values to avoid XSS and similar attacks.
I get html code from rich
Thanks to @Saljack's answer. Just to elaborate more to OWASP Java HTML Sanitizer. It worked out really well (quick) for me. I just added the following to the pom.xml in my Maven project:
com.googlecode.owasp-java-html-sanitizer
owasp-java-html-sanitizer
20150501.1
Check here for latest release.
Then I added this function for sanitization:
private String sanitizeHTML(String untrustedHTML){
PolicyFactory policy = new HtmlPolicyBuilder()
.allowAttributes("src").onElements("img")
.allowAttributes("href").onElements("a")
.allowStandardUrlProtocols()
.allowElements(
"a", "img"
).toFactory();
return policy.sanitize(untrustedHTML);
}
More tags can be added by extending the comma delimited parameter in allowElements method.
Just add this line prior passing the bean off to save the data:
bean.setHtml(sanitizeHTML(bean.getHtml()));
That's it!
For more complex logic, this library is very flexible and it can handle more sophisticated sanitizing implementation.