Question
I am using XPath to query HTML sites, which has worked pretty well so far, but now I have hit a (brick) wall and can't find a solution :-)
The HTML looks like this:
<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text2<span>AnotherText2</span></a></li>
<li><a href="">Text3<span>AnotherText3</span></a></li>
</ul>
I want to select the "TextX" part, but NOT the "AnotherTextX" part in the <span></span>.
So far I couldn't come up with any (pure) XPath solution to do that (and in my setup I unfortunately need a pure XPath solution).
This selects kind of what I want, but it results in "TextXAnotherTextX" and I only need "TextX".
/ul/li/a
Any hints? :-)
Answer 1:
This gets you the first direct text node child of <a>:
/ul/li/a/text()[1]
and this would get you any direct text node child (separately):
/ul/li/a/text()
Both of the above return "TextX", but if you had:
<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>
then the latter would return ["Text4", "TrailingText"], while the former would return "Text4" only.
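The difference between text() and text()[1] can be illustrated with a small Python sketch. Note the assumptions: the standard-library xml.etree.ElementTree module does not support the text() node test in its limited XPath subset, so the hypothetical helper direct_text_nodes below only emulates what /ul/li/a/text() selects, using each element's .text and its children's .tail. A full XPath 1.0 engine would evaluate the expressions above directly.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, plus the "TrailingText" case.
HTML = """<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text2<span>AnotherText2</span></a></li>
<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>
</ul>"""


def direct_text_nodes(elem):
    """Emulate XPath text(): collect the direct text-node children of elem."""
    parts = []
    if elem.text:                # text before the first child element
        parts.append(elem.text)
    for child in elem:
        if child.tail:           # text directly after each child element
            parts.append(child.tail)
    return parts


root = ET.fromstring(HTML)           # root is the <ul> element
anchors = root.findall("./li/a")     # same elements as /ul/li/a

# Emulates /ul/li/a/text(): every direct text node, per <a>.
all_text = [direct_text_nodes(a) for a in anchors]

# Emulates /ul/li/a/text()[1]: only the first direct text node, per <a>.
first_text = [nodes[0] for nodes in all_text if nodes]

print(all_text)
print(first_text)
```

The text inside <span> never shows up in either list, because it is a child of <span>, not a direct child of <a>.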
Your expression /ul/li/a gets the string value of <a>, which is defined as the concatenation of the string values of all text node descendants of <a> in document order, so you get "TextXAnotherTextX".
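As a rough illustration of that string value, the following sketch uses Python's standard-library ElementTree, whose itertext() method walks all text inside an element in document order. Using it as a stand-in for the XPath string value is an assumption made for demonstration, not something taken from the answer.

```python
import xml.etree.ElementTree as ET

# One <a> element from the question's markup.
a = ET.fromstring('<a href="">Text1<span>AnotherText1</span></a>')

# Join every descendant text node in document order,
# which mirrors how XPath computes an element's string value.
string_value = "".join(a.itertext())

print(string_value)  # Text1AnotherText1
```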
Source: https://stackoverflow.com/questions/1458459/xpath-exclude-tag-in-innerhtml-a-href-innerhtmlspanexcludeme-span