Jsoup.clean() leaves unclosed and opens tags

笑着哭i posted on 2020-01-30 06:24:50

Question


The following code replaces this text: <br /> with <br>:

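import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

String removeDisallowedTags(String textToEscape) {
    // Start from an empty whitelist and allow only these three tags.
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags("b", "br", "font");

    // Tags that are not on the whitelist are stripped; their text content is kept.
    return Jsoup.clean(textToEscape, whitelist);
}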

Why?


Answer 1:


Jsoup.clean() writes its output as HTML by default, and in HTML <br> is a void element: it needs neither a closing tag nor a trailing slash. The same goes for <img>.

You have to tell jsoup to write its output with XML syntax. That will leave the tags closed, and it will even close them for you. A fixed method with some additional settings:

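import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document.OutputSettings;
import org.jsoup.safety.Whitelist;

String cleanXmlAndRemoveUnwantedTags(String textToEscape) {
    Whitelist whitelist = Whitelist.none();
    // allowedTags is not defined in this snippet; it is assumed to be a String[] of permitted tag names.
    whitelist.addTags(allowedTags);

    // Serialize with XML syntax so that empty elements are written self-closed.
    OutputSettings outputSettings = new OutputSettings()
                    .syntax(OutputSettings.Syntax.xml)
                    .charset(StandardCharsets.UTF_8)
                    .prettyPrint(false);

    // The empty string is the base URI.
    return Jsoup.clean(textToEscape, "", whitelist, outputSettings);
}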
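
To see the difference between the two output syntaxes side by side, here is a minimal sketch. The class name BrSyntaxDemo, the helper methods, and the sample input are made up for illustration. It assumes the same Whitelist-based jsoup API used in the question; newer jsoup releases rename this class to Safelist, so the import may need adjusting.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document.OutputSettings;
import org.jsoup.safety.Whitelist;

// Hypothetical demo class, not part of jsoup or of the original answer.
public class BrSyntaxDemo {

    // Same tag list as in the question: only <b>, <br> and <font> survive.
    private static Whitelist allowed() {
        return Whitelist.none().addTags("b", "br", "font");
    }

    // Clean and serialize with HTML syntax (the default syntax).
    static String cleanAsHtml(String input) {
        OutputSettings settings = new OutputSettings()
                .syntax(OutputSettings.Syntax.html)
                .prettyPrint(false);
        return Jsoup.clean(input, "", allowed(), settings);
    }

    // Clean and serialize with XML syntax, so empty elements are self-closed.
    static String cleanAsXml(String input) {
        OutputSettings settings = new OutputSettings()
                .syntax(OutputSettings.Syntax.xml)
                .prettyPrint(false);
        return Jsoup.clean(input, "", allowed(), settings);
    }

    public static void main(String[] args) {
        String input = "Line one<br />Line two<i>!</i>";
        System.out.println(cleanAsHtml(input));
        System.out.println(cleanAsXml(input));
    }
}
```

With HTML syntax the line break should come out as a bare <br>, while with XML syntax it should keep its self-closing slash. In both cases the <i> tag is dropped because it is not on the whitelist, and its text is kept.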


Source: https://stackoverflow.com/questions/34218225/jsoup-clean-leaves-unclosed-and-opens-tags
