lxml::etree::_ElementStringResult.getparent() works incorrectly

Question

I did not find anyone explaining this error...

I'm using lxml 3.1.0.

Given HTML/XML like this:

<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>

an _ElementStringResult for the string " XT 100 LV Ski Boots 2014" is returned when we run:

>>> elemstr = tree.xpath('//body//h1/text()')[0]

However, asking for the parent of that string gives the strong element rather than h1:

>>> parent = elemstr.getparent()
>>> tree.getpath(parent)
/html/body/therestofthepath/h1/strong

Has anyone else run into this? Is there any way to handle it other than manually checking whether the string equals the parent's text, and otherwise comparing it against the text of the parent's children?

Answer 1:

I think this is the correct behaviour for ElementTree (ET). The reason stems from the way ET represents text nodes: only a text node that is the first child of an element is stored in that element's text attribute.

Any other interleaved text node is stored as the tail of its preceding sibling, in this case the strong element. That is why getparent() returns strong here: the string is the tail of strong, not the text of h1.

import lxml.etree

xml = """<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>"""

tree = lxml.etree.fromstring(xml)
elemstr = tree.xpath('//h1/text()')[0]
# getparent() returns <strong>; the string is stored as that element's tail
print(elemstr.getparent().tail)
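
As for avoiding the manual text comparison: the string results that lxml returns from XPath carry is_text and is_tail flags, so the element that actually encloses the text can be found from those. The following is a minimal sketch under that assumption; the helper name containing_element is hypothetical and not part of lxml.

```python
from lxml import etree


def containing_element(text_result):
    """Return the element that encloses the text node in the document.

    Hypothetical helper (not an lxml API): for tail text, getparent()
    yields the preceding sibling, so we step up one more level.
    """
    owner = text_result.getparent()
    if text_result.is_tail:
        return owner.getparent()
    return owner


xml = '<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>'
root = etree.fromstring(xml)

tail_text = root.xpath('//h1/text()')[0]      # " XT 100 LV Ski Boots 2014"
lead_text = root.xpath('//strong/text()')[0]  # "Lange"

print(tail_text.is_tail, containing_element(tail_text).tag)
print(lead_text.is_text, containing_element(lead_text).tag)
```

With this, tree.getpath(containing_element(elemstr)) should point at h1 for the text in the question, without comparing strings by hand.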

Source: https://stackoverflow.com/questions/24570796/lxmletree-elementstringresult-getparent-works-incorrectly

Tags

python

lxml
