Removing HTML entities while preserving line breaks with JSoup

情书的邮戳 2020-12-21 06:34

I have been using JSoup to parse lyrics, and it has worked well until now, but I have run into a problem.

I can use Node.html() to return the full HTML of t

2 answers
  •  無奈伤痛
    2020-12-21 06:53

    Based on another answer from Stack Overflow, I added a few fixes and came up with:

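        // Mark <br> tags and literal newlines with a placeholder that survives Jsoup's text().
        String text = Jsoup.parse(html.replaceAll("(?i)<br[^>]*>", "br2nl").replaceAll("\n", "br2nl")).text();
        // Turn the placeholders (with or without a trailing space) back into real newlines.
        text = text.replaceAll("br2nl ", "\n").replaceAll("br2nl", "\n").trim();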
    

    Hope this helps
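
For anyone who wants to try the idea in isolation, here is a minimal sketch that wraps the same placeholder approach in a helper method. The class name `LineBreakText`, the method `toPlainText`, and the `PLACEHOLDER` constant are made up for illustration and are not part of jsoup; it assumes the jsoup jar is on the classpath and that the lyrics never contain the literal string `br2nl`.

```java
import org.jsoup.Jsoup;

// Hypothetical helper, not part of jsoup: converts an HTML fragment to plain
// text, decoding entities while keeping <br> tags and newlines as "\n".
final class LineBreakText {

    // Assumed marker; it must not occur in the real text.
    private static final String PLACEHOLDER = "br2nl";

    static String toPlainText(String html) {
        // 1. Replace <br>, <br/>, <BR clear="all"> etc. and raw newlines with the marker,
        //    because Jsoup's text() would otherwise collapse them into spaces.
        String marked = html
                .replaceAll("(?i)<br[^>]*>", PLACEHOLDER)
                .replaceAll("\n", PLACEHOLDER);

        // 2. Let jsoup strip the remaining tags and decode entities such as &amp;.
        String text = Jsoup.parse(marked).text();

        // 3. Restore the line breaks, dropping a single space that may follow a marker.
        return text
                .replaceAll(PLACEHOLDER + " ", "\n")
                .replaceAll(PLACEHOLDER, "\n")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("Hello<br>World &amp; all"));
    }
}
```

Calling `LineBreakText.toPlainText(node.html())` on the element that holds the lyrics should give text with one line per `<br>`. If the marker could clash with real content, a longer, more unusual placeholder string is a safer choice.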
