How do I get the whole content between two xml tags in Python?

我寻月下人不归 2020-12-15 09:13

I am trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in straightforward cases, such as title below, is easy

5 answers

醉酒成梦 (original poster)

2020-12-15 09:17
I like @Marcin's solution above. However, I found that his second option (converting a sub-node rather than the root of the tree) does not handle entities.

His code from above (modified to add an entity):
```
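from lxml import etree

# The enclosing element name (review) is assumed; the post only implies <title> and <text>.
t = etree.XML("""<review>
  <title>Some testing stuff</title>
  <text>this &amp; that.</text>
</review>""")
e = t.xpath('//text')[0]
# e.text holds the already-decoded text, so the '&' comes back unescaped
print((e.text + ''.join(etree.tostring(c, encoding='unicode') for c in e)).strip())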
```
returns:
```
this & that.
```
with a bare/unescaped '&' character instead of a proper entity ('&amp;').
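To see where the bare '&' comes from, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` in place of lxml (an assumption made only so it runs without extra packages; lxml behaves the same way on this point). The parser resolves entities when it builds the tree, so `.text` holds the decoded character, and only serialization escapes it again.

```python
import xml.etree.ElementTree as ET

# A single element whose text contains an escaped ampersand.
elem = ET.fromstring("<text>this &amp; that.</text>")

# The parser has already resolved the entity, so .text is decoded.
decoded = elem.text

# Serializing the element escapes the ampersand again.
serialized = ET.tostring(elem, encoding="unicode")

print(decoded)
print(serialized)
```

So any approach that concatenates `.text` directly has to re-escape it, or else serialize the node as a whole.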

My solution was to call etree.tostring at the node level (instead of on each child), then strip off the start and end tags using a regular expression:
```
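import re
from lxml import etree

# The enclosing element name (review) is assumed; the post only implies <title> and <text>.
t = etree.XML("""<review>
  <title>Some testing stuff</title>
  <text>this &amp; that.</text>
</review>""")

e = t.xpath('//text')[0]
# Serialize the node itself so that entities in its text are escaped
xml = etree.tostring(e, encoding='unicode')
# Drop the start tag and the end tag, keeping everything in between
inner = re.match(r'<[^>]*?>(.*)</[^>]*>\s*$', xml, flags=re.DOTALL).group(1)
print(inner)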
```
produces:
```
this &amp; that.
```
I used re.DOTALL to ensure this works for XML containing newlines.
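For comparison, a regex-free variant is also possible: escape the node's leading text by hand and serialize each child. This is only a sketch, not part of the solution above. It assumes the standard library's `xml.etree.ElementTree` and `xml.sax.saxutils.escape` rather than lxml, and `inner_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem):
    """Return the serialized content between elem's start and end tags.

    Hypothetical helper for illustration; not an lxml or ElementTree API.
    """
    # elem.text is stored decoded, so re-escape &, < and > by hand.
    parts = [escape(elem.text or "")]
    # tostring() on a child escapes the child's text and includes its tail.
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts).strip()


# Sample document; the element names are made up for this example.
root = ET.fromstring(
    "<review><title>Some testing stuff</title>"
    "<text>this &amp; that, with <b>bold</b> text.</text></review>"
)
result = inner_xml(root.find("text"))
print(result)
```

This avoids pattern-matching on the serialized markup, at the cost of handling the leading text separately from the children.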