How to make jsoup strip only HTML tags, not newline characters?

荒凉一梦 posted on 2019-12-30 12:13:07

Question


I have the content below in Java, and I want to strip only the HTML tags, not the newline characters:

<p>test1 <b>test2</b> test 3 </p> //line 1
<p>test4 </p> //line 2

If I open the above content in a rich text editor, line 1 and line 2 are displayed on separate lines (without showing the </p> tags). In Notepad, however, the content is shown along with the </p> tags. To remove all HTML tags I used

Jsoup.parse(aboveContent).text()

It removes all HTML tags, but in Notepad line 1 and line 2 now appear on the same line. It seems jsoup also removes the newline character.

What I tried:

I also tried replacing </p> with \r\n and then removing the HTML tags:

 Jsoup.parse(aboveContent.replace("</p>", "\r\n")).text()

but jsoup still removes the end-of-line characters (in the debugger I can see line 1 and line 2 on the same line).

How can I make jsoup strip only the HTML tags but not the newline characters?


Answer 1:


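You get a single line because text() normalizes whitespace: every run of whitespace characters, newlines included, collapses into a single space. You can instead use a StringBuilder and append the text of each paragraph on its own line: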

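import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String html = "<p>test1 <b>test2</b> test 3 </p>"
                    + "<p>test4 </p>";

Document doc = Jsoup.parse(html);
StringBuilder sb = new StringBuilder();

// Append the text of each <p> element, followed by a newline.
for (Element element : doc.select("p")) {
    // element.text() returns the text of this element (without tags).
    sb.append(element.text()).append('\n');
}

// trim() drops the trailing newline added after the last paragraph.
System.out.println(sb.toString().trim());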

Output:

test1 test2 test 3
test4
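
To illustrate why the \r\n replacement in the question has no effect, here is a minimal sketch in plain Java that mimics the whitespace collapsing described above. It does not use jsoup and is not jsoup's actual implementation; the class WhitespaceSketch and the helpers normalize and normalizePerLine are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class WhitespaceSketch {
    // Matches any run of whitespace: spaces, tabs, \r, \n.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: collapses each whitespace run into one space
    // and trims the ends, roughly what happens to text returned by text().
    static String normalize(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    // Hypothetical helper: normalizes each line separately and re-joins
    // the lines with '\n', analogous to the per-paragraph loop above.
    static String normalizePerLine(String s) {
        StringBuilder sb = new StringBuilder();
        for (String line : s.split("\\R")) { // \R matches any line break
            sb.append(normalize(line)).append('\n');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String withBreaks = "test1 test2 test 3 \r\ntest4 ";
        // The line break is treated as ordinary whitespace and disappears.
        System.out.println(normalize(withBreaks));
        // Handling each line on its own keeps the break.
        System.out.println(normalizePerLine(withBreaks));
    }
}
```

The first call prints everything on one line, while the second keeps test4 on its own line. The line structure has to be reapplied outside the normalizing step, which is what the StringBuilder loop does.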



Answer 2:


You can also do this:

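public static String cleanNoMarkup(String input) {
    // Disable pretty-printing so jsoup does not reformat line breaks.
    final Document.OutputSettings outputSettings = new Document.OutputSettings().prettyPrint(false);
    // Whitelist.none() allows no tags, so all markup is removed.
    // (Newer jsoup releases rename Whitelist to Safelist.)
    return Jsoup.clean(input, "", Whitelist.none(), outputSettings);
}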

The important things here are:

1. Whitelist.none() - so no markup is allowed
2. .prettyPrint(false) - so line breaks are not removed



Source: https://stackoverflow.com/questions/14453047/jsoup-to-strip-only-html-tags-not-new-line-character
