Question
lxml.html.fromstring insists on wrapping everything in a tag (p by
default). From this tag tree,
<p>this is <b>the</b> good stuff</p>
I want to extract the string:
this is <b>the</b> good stuff
How do I do this?
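Answer 1:
That's usually called "inner XML" rather than "inner text". Here is one way to get the inner XML of an element: take the element's leading text, then append each child serialized together with the text that follows it.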
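import lxml.etree as etree
import lxml.html
html = "<p>this is <b>the</b> good stuff</p>"
tree = lxml.html.fromstring(html)
node = tree.xpath("//p")[0]
# node.text is the text before the first child (None if there is none).
# encoding="unicode" makes tostring() return str rather than bytes on Python 3;
# each child's tail text (" good stuff") is serialized along with it.
result = (node.text or '') + ''.join(etree.tostring(e, encoding="unicode") for e in node)
print(result)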
Output:
this is <b>the</b> good stuff
Source: https://stackoverflow.com/questions/30772943/get-inner-text-from-lxml