Question
I want to get a list of values from an HTML document. I am using HTMLUnit.
There are many span elements with the class topic. I want to extract the content within the span tags:
<span class="topic">
<a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a>
</span>
My code looks like this:
List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");
However, whenever I try to iterate over the list I get a NoSuchElementException. Can anyone see an obvious mistake? Links to good tutorials would also be appreciated.
Answer 1:
If you know you'll always have an <a>, just add it to the XPath and get the text() from the a.
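To illustrate the difference between the two XPath expressions, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath on a well-formed copy of the snippet from the question rather than HtmlUnit itself, so it only demonstrates the XPath semantics. The class name TopicXPathDemo, the helper select, and the wrapping div are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicXPathDemo {
    // Well-formed stand-in for the markup in the question (the outer div is an assumption).
    static final String SAMPLE =
        "<div><span class=\"topic\">\n"
      + "<a href=\"http://website.com/page/2342\" "
      + "class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a>\n"
      + "</span></div>";

    // Evaluates the expression and returns the trimmed, non-empty text of each matching node.
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                if (!text.isEmpty()) {
                    values.add(text);
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Direct text children of the span are only whitespace, so nothing useful comes back.
        System.out.println(select(SAMPLE, "//span[@class='topic']/text()"));   // prints []
        // Stepping into the a element reaches the actual link text.
        System.out.println(select(SAMPLE, "//span[@class='topic']/a/text()")); // prints [Lean Startup]
    }
}
```

With HtmlUnit, the same expression string would be passed to page.getByXPath instead.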
If you don't know whether there will always be an a in there, I'd recommend using the .asText() method that HtmlElement and all of its subclasses have.
So first get each of the spans:
List<?> topics = (List)page.getByXPath("//span[@class='topic']");
And then, in the loop, get the text inside each of the spans:
topic.asText();
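The select-the-element-then-read-its-text pattern can be sketched as follows. This is an analogue built on the JDK's DOM and XPath classes, not HtmlUnit code: getTextContent().trim() stands in for asText() here, and the two behave similarly only for simple markup like this. The class name TopicSpanTextDemo, the helper topicTexts, and the second sample span are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicSpanTextDemo {
    // One span wraps a link, the other holds plain text (the second span is hypothetical).
    static final String SAMPLE =
        "<div>"
      + "<span class=\"topic\">\n<a href=\"http://website.com/page/2342\">Lean Startup</a>\n</span>"
      + "<span class=\"topic\">Plain text topic</span>"
      + "</div>";

    // Selects each topic span, then collects all text inside it, including text in child elements.
    static List<String> topicTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList spans = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//span[@class='topic']", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < spans.getLength(); i++) {
                // Plays the role of topic.asText() in the HtmlUnit loop.
                texts.add(spans.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract topic texts", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(topicTexts(SAMPLE)); // prints [Lean Startup, Plain text topic]
    }
}
```

Because the text is read from the element rather than selected in the XPath, the loop works whether or not a span contains an a.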
Answer 2:
text() will only extract the text from that element, and the example you've given has no text component, only a child element.
Try this instead:
List<?> topics = (List)page.getByXPath("//span[@class='topic']");
Source: https://stackoverflow.com/questions/17091091/get-content-of-list-of-span-elements-with-htmlunit-and-xpath