Question
I could not find an explanation of this behaviour anywhere.
I'm using lxml 3.1.0.
Given HTML/XML like this:
<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>
running the following returns an _ElementStringResult for the string " XT 100 LV Ski Boots 2014":
>> elemstr = tree.xpath('//body//h1/text()')[0]
However, asking for the parent of that result and then for the parent's path gives the strong element instead of h1:
>> parent = elemstr.getparent()
>> tree.getpath(parent)
/html/body/therestofthepath/h1/strong
Has anyone else run into this? Is there any way other than manually checking whether the text matches the parent's text, and otherwise checking the text of the parent's children?
Answer 1:
I think this is the correct behaviour for ElementTree (ET). It stems from the way ET represents text nodes: only a text node that is the first child of an element is stored in that element's text attribute.
Any other interleaved text node is stored as the tail of its preceding sibling, in this case the strong element. That is why getparent() returns strong here.
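import lxml.etree
xml = """<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>"""
tree = lxml.etree.fromstring(xml)
# The text after </strong> is stored as the tail of <strong>, not as text of <h1>.
elemstr = tree.xpath('//h1/text()')[0]
# getparent() therefore returns <strong>; its tail is the same string.
print(elemstr.getparent().tail)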
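If the goal is to recover the element that actually encloses the string, one possible approach is to check the is_tail flag that lxml sets on XPath string results and step up one extra level for tail text. The sketch below assumes a recent lxml with smart strings enabled (the default); containing_element is a hypothetical helper name, not part of lxml.

```python
import lxml.etree

xml = '<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>'
tree = lxml.etree.fromstring(xml)


def containing_element(text_result):
    """Hypothetical helper: return the element that encloses an XPath text() result."""
    owner = text_result.getparent()
    # Tail text hangs off the preceding sibling, so the enclosing
    # element is one level further up.
    return owner.getparent() if text_result.is_tail else owner


tail_text = tree.xpath('//h1/text()')[0]       # " XT 100 LV Ski Boots 2014"
inner_text = tree.xpath('//strong/text()')[0]  # "Lange"

tail_container = containing_element(tail_text)
text_container = containing_element(inner_text)

print(tail_text.is_tail, tail_container.tag)
print(inner_text.is_text, text_container.tag)
```

With this, the tail string resolves to h1 while the string inside strong still resolves to strong, so the path can be taken from the returned element instead of from getparent() directly.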
Source: https://stackoverflow.com/questions/24570796/lxmletree-elementstringresult-getparent-works-incorrectly