Question
I am running the following code in Python 2.7.3 on Mac OS X 10.6.8.
import StringIO
from lxml import etree
f = open('./foo', 'r')
doc = ""
while 1:
    line = f.readline()
    doc += line
    if line == "":
        break
tree = etree.parse(StringIO.StringIO(doc), etree.HTMLParser())
r = tree.xpath('//foo')
for i in r:
    for j in i.iter():
        print j.tag, j.text
The file foo contains
<foo> AAA <bar> BBB </bar> XXX </foo>
The output is
foo AAA
bar BBB
Why am I not getting the text XXX? How do I access it?
Thanks
Answer 1:
Try this:
from lxml import etree
tree = etree.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
foos = tree.xpath('//foo')
for foo in foos:
    for j in foo.iter():
        print j.tag, j.text, j.tail
Output:
foo AAA None
bar BBB XXX
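The tail attribute holds the text that follows an element's end tag, up to the next tag. Here XXX comes after </bar>, so it is stored as the tail of bar rather than as part of the text of foo; text only covers what sits between an element's start tag and its first child. The tail attribute is a peculiarity of lxml and ElementTree compared to other XML models, such as DOM. See http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html for more information.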
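To make the text/tail split concrete, here is a minimal sketch. It assumes Python 3 and the standard library's xml.etree.ElementTree instead of lxml, purely so that it runs without extra packages. lxml's etree exposes the same text and tail attributes and the same itertext() method.

```python
import xml.etree.ElementTree as ET

# Same document as in the question, parsed from a string.
root = ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
bar = root[0]  # the first (and only) child of <foo>

# text: what sits between an element's start tag and its first child (or end tag).
# tail: what sits between an element's end tag and the next tag.
print(repr(root.text))  # ' AAA '
print(repr(bar.text))   # ' BBB '
print(repr(bar.tail))   # ' XXX '
print(repr(root.tail))  # None, nothing follows </foo>

# itertext() walks text and tail values in document order,
# so it recovers all three chunks without touching tail by hand.
all_text = "".join(root.itertext())
print(all_text.split())  # ['AAA', 'BBB', 'XXX']
```

If only the concatenated text is needed, itertext() is usually the shortest route; reading tail directly is useful when the position of each chunk matters.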
Answer 2:
You also have to take node.tail into account (or check for it).
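One way to "check for it" is to treat a missing text or tail as an empty string while iterating. The sketch below again assumes Python 3 and the standard library's xml.etree.ElementTree; the helper name text_and_tail is hypothetical and not part of either library.

```python
import xml.etree.ElementTree as ET

def text_and_tail(root):
    """Return (tag, text, tail) for every element under root, in document order.

    Hypothetical helper: text and tail can each be None, so both are
    replaced by an empty string before stripping whitespace.
    """
    rows = []
    for node in root.iter():
        text = (node.text or "").strip()
        tail = (node.tail or "").strip()
        rows.append((node.tag, text, tail))
    return rows

rows = text_and_tail(ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>"))
for tag, text, tail in rows:
    print(tag, text, tail)
```

The `or ""` guard matters because an element with nothing after its end tag has tail set to None, and calling strip() on None would raise an AttributeError.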
Source: https://stackoverflow.com/questions/12412264/missing-some-text-when-iterating-xml-elements-in-python