Jsoup - How to clean HTML by escaping, not deleting, the unwanted HTML?

Submitted by 时光毁灭记忆、已成空白 on 2020-01-01 04:54:07

Question


Is there a way to get jsoup to clean a string containing HTML by escaping the unwanted HTML rather than removing it completely? My example:

String dirty = "This is <b>REALLY</b> dirty code from <a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>";
String clean = Jsoup.clean(dirty, new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));

This gives a "clean" string of:

This is    REALLY    dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a>

What I want is for the "clean" string to be:

This is &lt;b&gt;REALLY&lt;/b&gt; dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a>

Answer 1:


Assuming you are parsing strings rather than whole HTML documents (as in your question), this method will work:

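import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;

public String escapeHtml(String source) {
    Document doc = Jsoup.parseBodyFragment(source);
    Elements elements = doc.select("b");
    for (Element element : elements) {
        // Swap the element for a text node holding its markup; jsoup escapes text nodes on output.
        element.replaceWith(new TextNode(element.toString(),""));
    }
    // Any tag still present that is not whitelisted is removed here, as in a normal clean.
    return Jsoup.clean(doc.body().toString(), new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
}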

You could turn the hard-coded "b" into a parameter so that callers pass in the list of tags they want escaped.
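As a rough sketch of that parameterised variant: the class name TagEscaper, the method name escapeTags, and its parameters are hypothetical and not part of jsoup. The sketch also assumes a recent jsoup release, where Whitelist is named Safelist and TextNode takes a single argument; on the older version used in the answer, substitute Whitelist and the two-argument TextNode constructor.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.safety.Safelist;

import java.util.List;

public class TagEscaper {

    /**
     * Escapes every element whose tag name is in tagsToEscape, then cleans
     * the result against the given safelist. Hypothetical helper, not a jsoup API.
     */
    public static String escapeTags(String source, List<String> tagsToEscape, Safelist allowed) {
        Document doc = Jsoup.parseBodyFragment(source);

        if (!tagsToEscape.isEmpty()) {
            // Tag names double as CSS selectors; "b, i" matches both <b> and <i>.
            // One combined query returns matches in document order, so an outer
            // element is turned into text before any element nested inside it.
            String selector = String.join(", ", tagsToEscape);
            for (Element element : doc.select(selector)) {
                // A text node is escaped on output, so the markup shows up literally.
                element.replaceWith(new TextNode(element.outerHtml()));
            }
        }

        // Tags that are neither escaped above nor in the safelist are still removed.
        return Jsoup.clean(doc.body().html(), allowed);
    }
}
```

A call for the question's example might look like TagEscaper.escapeTags(dirty, List.of("b"), new Safelist().addTags("a").addAttributes("a", "href", "name", "rel", "target")).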

The associated passing JUnit test:

@Test
public void testHtmlEscaping() throws Exception {
    String source = "This is <b>REALLY</b> dirty code from <a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>";
    String expected = "This is &lt;b&gt;REALLY&lt;/b&gt; dirty code from \n<a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>";
    String transformed = transformer.escapeHtml(source);
    assertEquals(expected, transformed);
}

Note that I added a line break ("\n") before the "a" tag in the test's "expected" string because jsoup pretty-prints its output.
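If the extra line break is unwanted, one option is to switch off pretty-printing, both on the parsed document and in the clean call. The following is a minimal sketch, not the answer's original code: the class name CompactEscaper and the method name escapeBold are hypothetical, and it assumes a recent jsoup release where Whitelist is named Safelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.safety.Safelist;

public class CompactEscaper {

    /** Escapes <b> elements and cleans the rest, without pretty-print whitespace. */
    public static String escapeBold(String source) {
        // Output settings with pretty-printing disabled, reused for both steps.
        Document.OutputSettings compact = new Document.OutputSettings().prettyPrint(false);

        Document doc = Jsoup.parseBodyFragment(source);
        doc.outputSettings(compact);

        for (Element element : doc.select("b")) {
            // Replace the element with its own markup as plain (escaped) text.
            element.replaceWith(new TextNode(element.outerHtml()));
        }

        Safelist allowed = new Safelist()
                .addTags("a")
                .addAttributes("a", "href", "name", "rel", "target");

        // The four-argument clean accepts output settings; the base URI is left empty.
        return Jsoup.clean(doc.body().html(), "", allowed, compact);
    }
}
```

With this variant the test's "expected" string should no longer need the "\n" before the "a" tag, though it is worth confirming against the jsoup version you actually use.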



Source: https://stackoverflow.com/questions/7756674/jsoup-howto-clean-html-by-escaping-not-deleting-the-unwanted-html
