Removing text enclosed between HTML tags using JSoup

Submitted by 自闭症网瘾萝莉.ら on 2019-11-29 21:03:16

Question


In some cases of HTML cleaning, I would like to retain the text enclosed between the tags (which is the default behaviour of Jsoup), and in other cases I would like to remove the text along with the HTML tags. Can someone please shed some light on how I can remove the text enclosed between HTML tags using Jsoup?


Answer 1:


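The Cleaner always drops tags that are not on the whitelist but preserves their text. If you need to drop whole elements (i.e. the tags together with their text and nested elements), you can pre-parse the HTML, remove the elements using either remove() (which deletes the elements themselves) or empty() (which keeps the elements but deletes their contents), then run the result through the cleaner.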

For example:

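import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // renamed to Safelist in newer jsoup releases

String html = "Clean <div>Text dropped</div>";
Document doc = Jsoup.parse(html);
doc.select("div").remove(); // removes each <div> together with its text
// if not removed, the cleaner will drop the <div> but leave the inner text
String clean = Jsoup.clean(doc.body().html(), Whitelist.basic());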
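
The difference between remove() and empty() mentioned above can be seen in a small sketch. The class and method names here (RemoveVsEmpty, withRemove, withEmpty) are made up for illustration and are not part of jsoup; only Jsoup.parseBodyFragment, select, remove, empty and html are library calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class RemoveVsEmpty {
    // Deletes every matching element together with its text and children.
    static String withRemove(String html, String cssQuery) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.select(cssQuery).remove();
        return doc.body().html();
    }

    // Keeps every matching element in the tree but deletes its child nodes.
    static String withEmpty(String html, String cssQuery) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.select(cssQuery).empty();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "Clean <div>Text dropped</div>";
        System.out.println(withRemove(html, "div")); // the <div> and its text are gone
        System.out.println(withEmpty(html, "div"));  // an empty <div> is left behind
    }
}
```

If the output is passed through the cleaner with a whitelist that does not allow div, both variants should end up with the same cleaned text, since the leftover empty tag is dropped as well. The distinction matters mainly when the element itself is allowed and has to stay in place.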



Answer 2:


String html = "<!DOCTYPE html><html><head><title></title></head><body><p>hello there</p></body></html>";
Document d = Jsoup.parse(html);
System.out.println(d);
System.out.println("************************************************");
d.getElementsByTag("p").remove();
System.out.println(d);

If you run into trouble when working with an Elements collection, you can perform the removal directly on the Document object d, as shown above. That works reliably.
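
Tying both answers back to the original question, here is a minimal sketch of selective cleaning: some elements are dropped together with their text, while other disallowed tags are stripped and their text is kept. The helper name cleanDropping and the sample markup are hypothetical. The sketch also assumes a newer jsoup release in which Whitelist has been renamed to Safelist; on an older release, Whitelist.basic() would take its place.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

class SelectiveCleaner {
    // Hypothetical helper: first removes elements matching dropQuery
    // (tags and text), then lets the Cleaner strip any remaining
    // disallowed tags while keeping their text.
    static String cleanDropping(String html, String dropQuery) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.select(dropQuery).remove();
        return Jsoup.clean(doc.body().html(), Safelist.basic());
    }

    public static void main(String[] args) {
        // <h1> is not in the basic safelist, so its tag is dropped but "Title" stays;
        // the div.ad element is removed before cleaning, so "Buy now" disappears.
        String html = "<h1>Title</h1><div class=\"ad\">Buy now</div><p>Body</p>";
        System.out.println(cleanDropping(html, "div.ad"));
    }
}
```

Passing a CSS query rather than a tag name keeps the decision about which text to drop in one place, so the same cleaner settings can be reused for inputs that need different elements removed.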



Source: https://stackoverflow.com/questions/6738762/removing-text-enclosed-between-html-tags-using-jsoup
