How to remove xml header in beautifulsoup?

安稳与你 posted on 2021-02-11 10:36:10

Question


I have imported and modified some XML, but when I write it out using test.prettify(), the top line of the XML changes from

<?xml version="1.0"?>

to

<?xml version="1.0" encoding="utf-8"?>

I don't want this change. How can I just keep the first line unchanged? What is the easiest way to do this?

If it matters, I'm using the xml parser.

soup = BeautifulSoup(r.text,'xml')

Answer 1:


I'm sure there's a more elegant way to do this using BeautifulSoup's built-ins, but based on your comment, I'll give you the "strip it out" version:

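xml_string = '<?xml version="1.0" encoding="utf-8"?>'
# Keep everything before " encoding" (the -1 drops the preceding space), then re-close the declaration.
print(xml_string[:xml_string.find("encoding") - 1] + "?>")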

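This is general enough to strip out any encoding from the header (not just utf-8). Note that it assumes the string holds only the declaration and that encoding is its last attribute, since everything after that point is dropped.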
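
If the string holds the whole prettified document rather than just the header, the same "strip it out" idea can be applied to the declaration alone. The following is only a sketch using the standard re module; the helper name strip_declared_encoding is made up for illustration and is not part of BeautifulSoup.

```python
import re

# Matches an encoding pseudo-attribute such as encoding="utf-8" or encoding='ISO-8859-1'.
_ENCODING_ATTR = re.compile(r'''\s+encoding\s*=\s*(["'])[^"']*\1''')


def strip_declared_encoding(xml_text: str) -> str:
    """Remove the encoding attribute from a leading XML declaration, if present.

    Hypothetical helper: only the text before the first "?>" is edited,
    so the document body is returned unchanged.
    """
    if not xml_text.startswith("<?xml"):
        return xml_text  # no declaration at the start, nothing to do
    end = xml_text.find("?>")
    if end == -1:
        return xml_text  # malformed declaration, leave the text alone
    declaration, rest = xml_text[:end], xml_text[end:]
    return _ENCODING_ATTR.sub("", declaration, count=1) + rest
```

With the soup object from the question, this could presumably be used as strip_declared_encoding(soup.prettify()) just before writing the result to a file.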




Answer 2:


You could find the XML declaration in the parsed document and use replaceWith() (spelled replace_with() in current BeautifulSoup 4 releases) to replace it with the value you want.



Source: https://stackoverflow.com/questions/36503875/how-to-remove-xml-header-in-beautifulsoup
