How to match a text node then follow parent nodes using XPath

Submitted by 感情迁移 on 2019-11-29 00:40:25

Question


I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the relevant content node.

<doc>
    <block>
        <title>Text 1</title>
        <content>Stuff I want</content>
    </block>

    <block>
        <title>Text 2</title>
        <content>Stuff I don't want</content>
    </block>
</doc>

My Python code throws a wobbly:

>>> from lxml import etree
>>>
>>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
>>>
>>> # get all titles
... tree.xpath('//title/text()')
['Text 1', 'Text 2']
>>>
>>> # match 'Text 1'
... tree.xpath('//title/text()="Text 1"')
True
>>>
>>> # Follow parent from selected nodes
... tree.xpath('//title/text()/../..//text()')
['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
>>>
>>> # Follow parent from selected node
... tree.xpath('//title/text()="Text 1"/../..//text()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
  File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
  File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
  File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
lxml.etree.XPathEvalError: Invalid type

Is this possible in XPath? Do I need to express what I want to do in a different way?


Answer 1:


Is this what you want?

//title[text()='Text 1']/../content/text()
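A minimal sketch of how this expression could be run with lxml, reusing the document from the question (the variable name `wanted` is only for illustration). Here the comparison sits inside a predicate, so it filters the `title` elements and the expression keeps yielding nodes that further steps can walk from. In the failing attempt, `=` turns the whole expression into a comparison and the trailing `/../..` steps end up applied to a non-node value, which XPath 1.0 does not allow.

```python
from lxml import etree

tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# Keep only <title> elements whose text is 'Text 1', step up to the
# enclosing <block>, then down to its <content> child's text node.
wanted = tree.xpath("//title[text()='Text 1']/../content/text()")
print(wanted)  # ['Stuff I want']
```

The result is a list because a location path returns every matching node; an empty list means no title matched.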



Answer 2:


Use:

string(/*/*/title[. = 'Text 1']/following-sibling::content)

This offers at least two improvements over the currently accepted solution by Johannes Weiß:

  1. The very expensive abbreviation "//" (which usually causes the whole XML document to be scanned) is avoided, as it should be whenever the structure of the XML document is known in advance.

  2. There is no step back up to the parent (the location step "/.." is avoided).
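As a rough sketch, assuming the same lxml setup as in the question, the expression above could be evaluated like this (the names `result` and `missing` are illustrative). Because the expression is wrapped in `string()`, lxml returns a single string rather than a list of nodes.

```python
from lxml import etree

tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# /*/*/title walks exactly two levels down from the document root
# (doc -> block -> title) instead of scanning the whole tree with "//".
# following-sibling::content then moves sideways from the matching title
# to its <content> sibling, so no "/.." step is needed.
result = tree.xpath(
    "string(/*/*/title[. = 'Text 1']/following-sibling::content)"
)
print(result)  # Stuff I want

# string() of an empty node-set is the empty string, so a title that
# does not exist yields '' rather than raising an error.
missing = tree.xpath(
    "string(/*/*/title[. = 'Text 3']/following-sibling::content)"
)
```

One trade-off of this form is that the fixed `/*/*/` prefix only works while `title` stays exactly two levels below the root; if the nesting depth changes, the path has to be updated.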



Source: https://stackoverflow.com/questions/598722/how-to-match-a-text-node-then-follow-parent-nodes-using-xpath
