lxml::etree::_ElementStringResult.getparent() works incorrectly

对着背影说爱祢 submitted on 2019-12-11 16:34:15

Question


I did not find anyone explaining this error...

I'm using lxml 3.1.0.

Given HTML/XML like this:

<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>

running the following returns an _ElementStringResult holding the string " XT 100 LV Ski Boots 2014":

>> elemstr = tree.xpath('//body//h1/text()')[0]

However, running the following gives:

>> parent = elemstr.getparent()
>> tree.getpath(parent)
/html/body/therestofthepath/h1/strong
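A self-contained reproduction may help. This is a minimal sketch assuming Python 3 and a recent lxml; the <html><body> wrapper is a hypothetical stand-in for the rest of the asker's page. On Python 3 the text() result is an _ElementUnicodeResult rather than an _ElementStringResult, but it carries the same getparent() method and the is_tail flag.

```python
from lxml import etree

# Hypothetical minimal page: the <h1> from the question wrapped in <html><body>
doc = """<html><body>
<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>
</body></html>"""

root = etree.fromstring(doc)
tree = root.getroottree()

elemstr = tree.xpath('//body//h1/text()')[0]
parent = elemstr.getparent()

print(repr(str(elemstr)))    # ' XT 100 LV Ski Boots 2014'
print(tree.getpath(parent))  # /html/body/h1/strong
print(elemstr.is_tail)       # True: the string is stored as <strong>'s tail
```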

Has anyone run into this? Is there a way to find the real parent other than manually checking whether the string equals the parent's text and, if not, checking the text of the parent's children?


Answer 1:


I think this is the correct behaviour for ElementTree (ET). It stems from the way ET represents text nodes: only the text that comes before an element's first child element is stored in that element's text attribute.

Every other text node is stored as the tail of the element that precedes it, in this case the strong element, so getparent() returns that element rather than the h1.
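To make the text/tail model concrete, here is a small sketch (assuming Python 3 and lxml) that inspects where each piece of the heading's text is stored. It uses only the standard text, tail and is_tail attributes.

```python
from lxml import etree

h1 = etree.fromstring(
    '<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>'
)
strong = h1[0]

print(repr(h1.text))      # None: nothing sits between <h1> and <strong>
print(repr(strong.text))  # 'Lange': the text inside <strong>
print(repr(strong.tail))  # ' XT 100 LV Ski Boots 2014': the text after </strong>

# XPath text() sees both text nodes; getparent() reports the element each one
# is attached to in lxml's model, and is_tail says whether it is tail text
for s in h1.xpath('.//text()'):
    print(repr(str(s)), s.getparent().tag, 'tail' if s.is_tail else 'text')
```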

import lxml.etree

xml = """<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>"""

tree = lxml.etree.fromstring(xml)
elemstr = tree.xpath('//h1/text()')[0]
# The string is stored as the tail of <strong>, so getparent() returns <strong>
print(elemstr.getparent().tail)
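For the question of getting the element that actually contains the text, one option is to check the result's is_tail flag and step up one more level for tail text. The helper name containing_element below is hypothetical; this is a sketch assuming Python 3 and lxml, not an lxml API.

```python
from lxml import etree

def containing_element(smart_str):
    """Hypothetical helper: return the element whose content includes an
    XPath text() result.

    For tail text, lxml's getparent() returns the preceding sibling element,
    so we step up one more level. For the tail of the root element this
    returns None, since the root has no parent element.
    """
    parent = smart_str.getparent()
    if smart_str.is_tail:
        return parent.getparent()
    return parent

xml = """<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>"""
tree = etree.fromstring(xml).getroottree()

tail_text = tree.xpath('//h1/text()')[0]     # ' XT 100 LV Ski Boots 2014'
inner_text = tree.xpath('//strong/text()')[0]  # 'Lange'

print(tree.getpath(containing_element(tail_text)))   # /h1
print(tree.getpath(containing_element(inner_text)))  # /h1/strong
```

If you only need the element and not the string, an XPath parent step such as '//h1/text()/..' should also select the h1 directly, since in the XPath data model the text node's parent is the h1.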


Source: https://stackoverflow.com/questions/24570796/lxmletree-elementstringresult-getparent-works-incorrectly
