Get content of list of span elements with HTMLUnit and XPath

Question

I want to get a list of values from an HTML document. I am using HTMLUnit.

There are many span elements with the class topic. I want to extract the content within the span tags:

    <span class="topic">
      <a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a>
    </span>

My code looks like this:

    List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");

However, whenever I try to iterate over the list I get a NoSuchElementException. Can anyone see an obvious mistake? Links to good tutorials would also be appreciated.

Answer 1:

If you know each span will always contain an <a>, add it to the XPath and take the text() from the a, i.e. //span[@class='topic']/a/text().

If you can't be sure an <a> will always be there, I'd recommend the .asText() method, which HtmlElement and all of its subclasses provide.

So first get each of the spans:

    List<?> topics = (List)page.getByXPath("//span[@class='topic']");

And then, in the loop, get the text inside each of the spans:

    topic.asText();
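
Both options can be tried without HtmlUnit on the classpath. The sketch below is not HtmlUnit code: it uses the JDK's built-in javax.xml.xpath on the snippet from the question, wrapped in a made-up <div> root, and Node.getTextContent() plus trim() stands in for asText() as a rough analogue only. The class name TopicTextDemo and the helper select are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TopicTextDemo {
    // Markup from the question; the <div> root is an assumption added for parsing.
    static final String HTML =
        "<div><span class=\"topic\">\n"
        + "  <a href=\"http://website.com/page/2342\""
        + " class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a>\n"
        + " </span></div>";

    // Evaluates an XPath expression and returns the trimmed text content of each match.
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(HTML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Option 1: step into the <a> and select its text node.
        System.out.println(select("//span[@class='topic']/a/text()"));
        // Option 2: select the span itself and read all text beneath it.
        System.out.println(select("//span[@class='topic']"));
    }
}
```

Both calls yield the link text for this snippet. The second form keeps working if a span contains plain text instead of a link, which is the case the asText() suggestion is meant to cover.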

Answer 2:

text() selects only the text nodes that are direct children of the element. The span in your example has no meaningful text of its own, only a child element, so the expression never reaches the link text.
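
To see what text() actually matches here, the following sketch evaluates the question's expression with the JDK's built-in XPath engine rather than HtmlUnit, so treat it as an illustration of XPath semantics and not of HtmlUnit's exact behaviour. The <div> wrapper, the class name SpanTextNodes and the helper directText are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpanTextNodes {
    // Markup from the question; the <div> root is an assumption added for parsing.
    static final String HTML =
        "<div><span class=\"topic\">\n"
        + "  <a href=\"http://website.com/page/2342\""
        + " class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a>\n"
        + " </span></div>";

    // Returns the raw value of every text node that is a direct child of the span.
    static List<String> directText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(HTML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//span[@class='topic']/text()", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String value : directText()) {
            // Each match is just the whitespace around the <a>, never "Lean Startup".
            System.out.println("blank=" + value.trim().isEmpty() + " length=" + value.length());
        }
    }
}
```

With this parser the only matches are the line breaks and indentation around the <a>; the link text belongs to the a element, not to the span.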

Try this instead:

    List<?> topics = (List)page.getByXPath("//span[@class='topic']");

Source: https://stackoverflow.com/questions/17091091/get-content-of-list-of-span-elements-with-htmlunit-and-xpath

Tags

java

xpath

htmlunit
