Question
I have the following HTML:
<div class="CustomClass">
Hi!<br/>
<br/>
Bla Bla bla<br/>
<br/>
<a href...></a>
bla bla bla
<iframe...></iframe>
Thank you!
</div>
I need a list with the children of the div, something like the following:
0->Hi!
2-><br/>
3->Bla Bla bla
4-><br/>
5-><a href...></a>
6->bla bla bla
7-><iframe...></iframe>
8->Thank you!
I tried getting the children of the div element, then iterating over them and converting each one to HTML, but this returns only the tag elements and ignores the text between them. Ideally the text would be wrapped in p tags, but that is not the case.
If I call element.ownText() on the div element, I get the text without the tags, but I need both, and in the right order.
Is there a way to achieve that?
Thanks!
Answer 1:
You can use childNodes() to obtain a List<Node>. Unlike children(), it includes the text nodes as well as the elements, in document order, which is exactly what you need:
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;

Document doc = Jsoup.parse("<div class=\"CustomClass\">Hi!<br/><br/>Bla Bla bla<br/><br/><a href...></a>bla bla bla<iframe></iframe>Thank you!</div>");
Element div = doc.selectFirst(".CustomClass");
// childNodes() returns text nodes and elements alike, in document order
List<Node> childNodes = div.childNodes();
for (int i = 0; i < childNodes.size(); i++) {
    Node node = childNodes.get(i);
    System.out.println(i + " -> " + node);
}
Output:
0 ->
Hi!
1 -> <br>
2 -> <br>
3 -> Bla Bla bla
4 -> <br>
5 -> <br>
6 -> <a href...></a>
7 -> bla bla bla
8 -> <iframe></iframe>
9 -> Thank you!
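If you want an actual list of strings rather than printed nodes, one possible approach is to check each node's type and skip text nodes that contain only whitespace (these appear when the HTML has line breaks between tags, as in the question). The following is only a sketch: the class name ChildNodeList, the helper childrenAsStrings, and the simplified sample markup (with a placeholder href="#") are made up for illustration and are not part of jsoup or of the original code.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ChildNodeList {

    // Hypothetical helper (not a jsoup API): turns the direct children of
    // an element into strings, preserving document order.
    static List<String> childrenAsStrings(Element parent) {
        List<String> items = new ArrayList<>();
        for (Node node : parent.childNodes()) {
            if (node instanceof TextNode) {
                TextNode text = (TextNode) node;
                // Skip whitespace-only text, e.g. newlines between tags
                if (!text.isBlank()) {
                    items.add(text.text().trim());
                }
            } else if (node instanceof Element) {
                // Keep the tag itself, including attributes and content
                items.add(node.outerHtml().trim());
            }
            // Other node types (comments and so on) are ignored here
        }
        return items;
    }

    public static void main(String[] args) {
        // Simplified sample markup, assumed for this example only
        String html = "<div class=\"CustomClass\">\nHi!<br/>\n<br/>\n"
                + "Bla Bla bla<br/>\n<a href=\"#\"></a>\nThank you!\n</div>";
        Document doc = Jsoup.parse(html);
        Element div = doc.selectFirst(".CustomClass");

        List<String> items = childrenAsStrings(div);
        for (int i = 0; i < items.size(); i++) {
            System.out.println(i + " -> " + items.get(i));
        }
    }
}
```

Whether to drop blank text nodes or trim the text depends on your use case; if whitespace matters, TextNode.getWholeText() returns the text without normalization.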
Source: https://stackoverflow.com/questions/59594261/how-to-extract-tags-and-text-between-tags-to-a-list-with-jsoup