XPath: "Exclude" tag in "InnerHtml" (<a href="">InnerHtml<span>excludeme</span></a>)

Question

I am using XPath to query HTML sites, which has worked pretty well so far, but now I've hit a (brick) wall and can't find a solution :-)

The html looks like this:

<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text2<span>AnotherText2</span></a></li>
<li><a href="">Text3<span>AnotherText3</span></a></li>
</ul>

I want to select the "TextX" part, but NOT the "AnotherTextX" part in the <span></span>. So far I couldn't come up with any (pure) XPath solution to do that (and in my setup I unfortunately need a pure XPath solution).

This selects kind of what I want, but it results in "TextXAnotherTextX" and I only need "TextX".

/ul/li/a

Any hints? :-)

Answer 1:

This gets you the first direct text node child of <a>:

/ul/li/a/text()[1]

and this would get you any direct text node child (separately):

/ul/li/a/text()

Both of the above return "TextX", but if you had:

<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>

then the latter would return: ["Text4", "TrailingText"], while the former would return "Text4" only.
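To make the difference between the two expressions concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree, whose limited XPath subset does not support text(), so the hypothetical helper direct_text_nodes only mimics what /ul/li/a/text() would select; it is not how a real XPath engine is implemented.

```python
import xml.etree.ElementTree as ET

# Sample markup taken from the question, plus the "Text4" variant above.
HTML = """<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>
</ul>"""


def direct_text_nodes(element):
    """Hypothetical helper: list the direct text node children of `element`,
    roughly what the XPath step text() selects."""
    nodes = []
    # Text before the first child element.
    if element.text:
        nodes.append(element.text)
    # Text following each child element (ElementTree stores it as `tail`).
    for child in element:
        if child.tail:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(HTML)          # root is the <ul> element
anchors = root.findall("./li/a")    # the <a> elements, as in /ul/li/a

# Mimics /ul/li/a/text(): every direct text node, per <a>.
all_text = [direct_text_nodes(a) for a in anchors]
# Mimics /ul/li/a/text()[1]: only the first direct text node, per <a>.
first_text = [nodes[0] for nodes in all_text if nodes]

print(all_text)    # [['Text1'], ['Text4', 'TrailingText']]
print(first_text)  # ['Text1', 'Text4']
```

With an engine that implements full XPath 1.0, the two expressions from the answer can be used directly and no such helper is needed.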

Your expression /ul/li/a selects the <a> elements themselves. The string value of an element is defined as the concatenation of the string values of all its descendant text nodes, in document order, so you get "TextXAnotherTextX".
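The string value behaviour can be reproduced outside of XPath as well. The following is a rough illustration, again assuming Python's standard xml.etree.ElementTree rather than the asker's actual setup: joining itertext() approximates the string value of <a>, while the text attribute holds only the text before the first child element.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a href="">Text1<span>AnotherText1</span></a>')

# Approximates the XPath string value: all text in <a> and its
# descendants, concatenated in document order.
string_value = "".join(a.itertext())

# Only the text that sits directly in <a> before the <span>.
own_text = a.text

print(string_value)  # Text1AnotherText1
print(own_text)      # Text1
```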

Source: https://stackoverflow.com/questions/1458459/xpath-exclude-tag-in-innerhtml-a-href-innerhtmlspanexcludeme-span

Tags

html

xpath

screen-scraping
