How to convert &lt; into < in lxml, Python?

Submitted by 隐身守侯 on 2020-01-03 21:01:23

Question


There's an XML file:

<body>
    <entry>
         I go to <hw>to</hw> to school.
    </entry>
</body>

For some reason, I changed <hw> to &lt;hw&gt; and </hw> to &lt;/hw&gt; before parsing it with the lxml parser.

<body>
    <entry>
         I go to &lt;hw&gt;to&lt;/hw&gt; to school.
    </entry>
</body>

But after modifying the parsed XML data, I want to get a real <hw> element back, not the escaped &lt;hw&gt; text. How can I do that?


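Answer 1:


First, import an unescape function and lxml's etree module (aliased as e here), then pick the entry element:

from xml.sax.saxutils import unescape
from lxml import etree as e

entry = body[0]  # body is the parsed <body> element

Then serialize the entry, unescape it, re-parse it, and replace the original with the result:

body.replace(entry, e.fromstring(unescape(e.tounicode(entry))))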



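Answer 2:


If you know which element contains the wrongly escaped markup:

from lxml import etree

# parse the whole document as usual..
# find the entry element..
# parse the fragment; entry.text mixes text and markup, so wrap it in a temporary root
wrapper = etree.fromstring("<wrapper>" + entry.text + "</wrapper>")
# (optionally) add the parsed content to the tree
entry.text = wrapper.text
entry.extend(list(wrapper))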
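
A self-contained sketch of this approach on the question's sample document might look like the following. It assumes the entry's text is mixed content (plain text plus markup), which cannot be parsed as a single element on its own, so the text is wrapped in a temporary root first. The SOURCE string and the wrapper element name are hypothetical, chosen only for illustration.

```python
from lxml import etree

# Sample document from the question, with the <hw> tags escaped.
SOURCE = (
    "<body>"
    "<entry>I go to &lt;hw&gt;to&lt;/hw&gt; to school.</entry>"
    "</body>"
)

body = etree.fromstring(SOURCE)
entry = body.find("entry")

# After parsing, entry.text is the already-unescaped string
# "I go to <hw>to</hw> to school.". Wrap it in a temporary root
# (hypothetical name) so it forms a well-formed document.
wrapper = etree.fromstring("<wrapper>" + entry.text + "</wrapper>")

# Move the parsed content into the original entry: leading text first,
# then the child elements (their tails carry the text that follows them).
entry.text = wrapper.text
entry.extend(list(wrapper))

result = etree.tostring(body, encoding="unicode")
print(result)
```

Because the original entry element is kept, any attributes it carries survive, and only its content changes.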


Source: https://stackoverflow.com/questions/14659423/how-to-convert-lt-into-in-lxml-python
