Python strip XML tags from document

I am trying to strip XML tags from a document using Python, a language I am new to. Here is my first attempt using regex, which was really a hope-for-the-best idea.

3 Answers

星月不相逢

2020-12-19 01:12
An alternative to Jeremiah's answer without requiring the lxml external library:
```
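import xml.etree.ElementTree as ET
...
# Text is the XML document as a string
tree = ET.fromstring(Text)
# method='text' keeps only the text content; on Python 3 this returns bytes
notags = ET.tostring(tree, encoding='utf8', method='text')
print(notags)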
```
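The xml.etree.ElementTree module has shipped with Python since 2.5, although the method argument to tostring needs Python 2.7 or later.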
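For a version that can be run as-is, here is a minimal sketch of the same standard-library approach. The sample string and the strip_tags helper are made up for illustration, and it uses itertext() rather than tostring() so the result is a str on Python 3:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
sample = "<note><to>Tove</to><body>Fish &amp; chips</body></note>"


def strip_tags(xml_text):
    """Return the concatenated text content of an XML string."""
    root = ET.fromstring(xml_text)
    # itertext() yields every text and tail fragment in document order.
    return "".join(root.itertext())


print(strip_tags(sample))  # ToveFish & chips
```

Note that text from adjacent elements is joined with no separator, so you may want to join on a space or newline instead, depending on the document.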
时光取名叫无心

2020-12-19 01:13
Please note that using regular expressions for this is usually not recommended. See Jeremiah's answer.

Try this:
```
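import re

# Read the whole file and delete anything that looks like a tag.
with open("/path/to/file") as f:
    text = re.sub(r'<[^<]+>', "", f.read())
# Overwrite the original file with the stripped text.
with open("/path/to/file", "w") as f:
    f.write(text)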
```
小蘑菇

2020-12-19 01:23
The most reliable way to do this is probably with lxml.
```
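from lxml import etree
...
# Parse the file into an element tree
tree = etree.parse('somefile.xml')
# method='text' serializes only the text content, dropping all tags
notags = etree.tostring(tree, encoding='utf8', method='text')
print(notags)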
```
It avoids the problems of "parsing" XML with regular expressions, and should correctly handle escaped entities and similar details.
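To illustrate the escaping point, here is a small comparison sketch. It uses the standard-library xml.etree.ElementTree parser as a stand-in so it runs without lxml installed; the input string and both helper names are made up for the example:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input containing escaped entities.
doc = "<p>Fish &amp; chips, 1 &lt; 2</p>"


def strip_with_regex(xml_text):
    # Deletes tag-like spans but leaves entities exactly as written.
    return re.sub(r"<[^<]+>", "", xml_text)


def strip_with_parser(xml_text):
    # A real XML parser decodes entities while extracting the text.
    return "".join(ET.fromstring(xml_text).itertext())


print(strip_with_regex(doc))   # Fish &amp; chips, 1 &lt; 2
print(strip_with_parser(doc))  # Fish & chips, 1 < 2
```

The regex version leaves &amp; and &lt; in the output, while the parser returns the characters they stand for.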