Maintaining the indentation of an XML file when parsed with Beautifulsoup

Posted by こ雲淡風輕ζ on 2021-01-28 03:32:30

Question


I am using BS4 to parse an XML file and trying to write it back to a new XML file.

Input file:

<tag1>
  <tag2 attr1="a1"> example text </tag2>
  <tag3>
    <tag4 attr2="a2"> example text </tag4>
    <tag5>
      <tag6 attr3="a3"> example text </tag6>
    </tag5>
  </tag3>
</tag1>

Script:

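from bs4 import BeautifulSoup

with open("input.xml") as infile:
    soup = BeautifulSoup(infile, "xml")  # the "xml" parser requires lxml

# encode() returns bytes, so write in binary mode
with open("output.xml", "wb") as outfile:
    outfile.write(soup.encode(formatter="minimal"))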

Output:

<tag1>
<tag2 attr1="a1"> example text </tag2>
<tag3>
<tag4 attr2="a2"> example text </tag4>
<tag5>
<tag6 attr3="a3"> example text </tag6>
</tag5>
</tag3>
</tag1>

I want to retain the indentation of the input file. I tried using the prettify() option.

Output-Prettify:

<tag1>
  <tag2 attr1="a1"> 
    example text 
  </tag2>
  <tag3>
    <tag4 attr2="a2"> 
      example text 
    </tag4>
    <tag5>
      <tag6 attr3="a3"> 
        example text 
      </tag6>
    </tag5>
  </tag3>
</tag1>

But this is not what I wanted. I want to maintain the exact indentation of the tags as in the input file.


Answer 1:


Unfortunately, you cannot do it directly. Beautiful Soup parses its input and keeps no trace of the original formatting.

So, if you do not modify the XML, you can first read the whole file into memory as a string, feed that string to Beautiful Soup to parse it and run your checks, and then write the original string (not the parsed tree) to the new file.
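A minimal sketch of that read-only approach, assuming the file is only inspected and never changed. The helper name copy_xml_verbatim is hypothetical, and the "xml" parser assumes lxml is installed. Reading and writing bytes keeps the indentation and line endings exactly as they were:

```python
from pathlib import Path

from bs4 import BeautifulSoup


def copy_xml_verbatim(src_path, dst_path):
    """Parse src_path for inspection, then copy its bytes unchanged to dst_path.

    Hypothetical helper: it returns the parsed soup so the caller can run
    checks on it, but the parsed tree is never used for output.
    """
    raw = Path(src_path).read_bytes()

    # Parse a copy of the content; Beautiful Soup accepts bytes directly.
    soup = BeautifulSoup(raw, "xml")

    # Write the original bytes, not soup.encode(), so formatting is untouched.
    Path(dst_path).write_bytes(raw)
    return soup
```

Calling copy_xml_verbatim("input.xml", "output.xml") would then give a soup to query while output.xml stays byte-for-byte identical to the input.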

If you want to modify the XML and use a specific formatting, you will have to walk the Beautiful Soup tree and format it by hand.
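Formatting by hand could look roughly like the sketch below. The function render_tag is a hypothetical helper, not part of Beautiful Soup. It assumes every element holds either child elements or text (no mixed content), and it ignores comments, CDATA sections and processing instructions. Elements that contain only text stay on one line, which is the layout of the input file in the question:

```python
from xml.sax.saxutils import escape, quoteattr

from bs4 import BeautifulSoup
from bs4.element import Tag


def render_tag(tag, depth=0, step="  "):
    """Serialize a tag, indenting child elements by one `step` per level."""
    pad = step * depth
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in tag.attrs.items()
    )

    # Only element children count; whitespace strings between tags are skipped.
    children = [child for child in tag.children if isinstance(child, Tag)]

    if not children:
        # Text-only (or empty) element: keep it on a single line.
        return f"{pad}<{tag.name}{attrs}>{escape(tag.get_text())}</{tag.name}>"

    lines = [f"{pad}<{tag.name}{attrs}>"]
    lines.extend(render_tag(child, depth + 1, step) for child in children)
    lines.append(f"{pad}</{tag.name}>")
    return "\n".join(lines)
```

After modifying the tree, render_tag(soup.find("tag1")) would return a string to write to the output file. The step argument controls the indentation width, so a different layout only needs a different value there.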



Source: https://stackoverflow.com/questions/29827087/maintaining-the-indentation-of-an-xml-file-when-parsed-with-beautifulsoup
