Question
The following code turns <br /> in the input into <br> in the output:
String removeDisallowedTags(String textToEscape) {
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags("b", "br", "font");
    return Jsoup.clean(textToEscape, whitelist);
}
Why?
Answer 1:
Jsoup.clean() processes the document as HTML by default, and in HTML a <br> without a closing tag is allowed, so that is the form jsoup writes out. The same goes for <img>.
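To illustrate why the two forms are not interchangeable for an XML consumer, here is a minimal sketch that uses only the JDK's built-in XML parser, not jsoup. The class name BrWellFormedness and the helper isWellFormedXml are hypothetical names chosen for this example. It checks whether a fragment is well-formed XML with and without the closed <br />.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BrWellFormedness {

    // Hypothetical helper (not part of jsoup): returns true if the fragment parses as XML.
    static boolean isWellFormedXml(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler so parse failures are not printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. because an element was never closed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<p>a<br />b</p>")); // true
        System.out.println(isWellFormedXml("<p>a<br>b</p>"));   // false
    }
}
```

An HTML parser accepts both inputs, which is why the default HTML output is valid HTML even though a strict XML parser rejects it.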
You have to make jsoup write its output using XML syntax. That leaves the tags closed, and it will even close unclosed ones for you. A fixed method with some additional settings:
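String cleanXmlAndRemoveUnwantedTags(String textToEscape) {
    // allowedTags is not defined in the original answer; it is assumed to be
    // a String[] declared elsewhere, e.g. { "b", "br", "font" }.
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags(allowedTags);
    // OutputSettings is the nested class org.jsoup.nodes.Document.OutputSettings.
    Document.OutputSettings outputSettings = new Document.OutputSettings()
            .syntax(Document.OutputSettings.Syntax.xml) // write void tags in closed form
            .charset(StandardCharsets.UTF_8)
            .prettyPrint(false);
    return Jsoup.clean(textToEscape, "", whitelist, outputSettings);
}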
Source: https://stackoverflow.com/questions/34218225/jsoup-clean-leaves-unclosed-and-opens-tags