jsoup: strip only HTML tags, not newline characters?

為{幸葍}努か submitted on 2019-12-01 12:46:26

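You get a single line because text() normalizes whitespace: every run of whitespace characters, including newlines, is collapsed into a single space. You can instead use a StringBuilder and append the text of each paragraph as its own line: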

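import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String html = "<p>test1 <b>test2</b> test 3 </p>"
                    + "<p>test4 </p>";

Document doc = Jsoup.parse(html);
StringBuilder sb = new StringBuilder();

// Append the text of each <p> element on its own line.
for( Element element : doc.select("p") )
{
    /*
     * element.text() returns the text of this element (= without tags),
     * with whitespace normalized and trimmed.
     */
    sb.append(element.text()).append('\n');
}

// trim() drops the trailing newline added by the last iteration.
System.out.println(sb.toString().trim());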

Output:

test1 test2 test 3
test4
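
To see why a single text() call on the whole document yields one line, it may help to look at what whitespace normalization does to a string. The following is a minimal sketch using only the Java standard library. It is not jsoup's actual implementation, and the class and method names (WhitespaceSketch, normalizeWhitespace) are made up for illustration.

```java
public class WhitespaceSketch {

    // Hypothetical helper (not part of jsoup): collapses every run of
    // whitespace, including '\n', into one space and trims the ends.
    static String normalizeWhitespace(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "test1 test2\ntest 3 \n";
        // The newline inside the string is replaced by a single space.
        System.out.println(normalizeWhitespace(raw)); // prints "test1 test2 test 3"
    }
}
```

Because the newline is treated like any other whitespace, the line structure has to be rebuilt explicitly, which is what appending '\n' after each paragraph does in the loop above.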

You can also do this:

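// Note: newer jsoup releases rename Whitelist to Safelist (Safelist.none()).
public static String cleanNoMarkup(String input) {
    // Disable pretty-printing so the output is not reformatted.
    final Document.OutputSettings outputSettings = new Document.OutputSettings().prettyPrint(false);
    // Whitelist.none() permits no tags, so only the text content survives.
    return Jsoup.clean(input, "", Whitelist.none(), outputSettings);
}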

The important things here are:

1. Whitelist.none() - so no markup is allowed
2. .prettyPrint(false) - so line breaks are not removed
