Problem with newlines when I use toprettyxml()


I'm currently using the toprettyxml() function of the xml.dom module in a Python script and I'm having some trouble with the newlines. If don't

8 Answers

既然无缘

2020-11-28 15:36
This is a pretty old question, but I think I know what the problem is:

minidom's pretty print works in a very straightforward way: it just inserts the newline and indent strings that you pass as arguments. That means those characters get duplicated if they already exist in the document.

E.g. if you parse an XML file that looks like this:
```
<parent>
   <child>
      Some text
   </child>
</parent>
```
the DOM already contains newline characters and indentation. minidom treats them as text nodes, and they are still there after you parse the file into a DOM object.

If you now convert the DOM object back into an XML string, those text nodes are written out too, so the original newlines and indentation remain. Pretty printing on top of that just adds more newlines and more indentation. That's why, in this case, not using pretty print at all or specifying newl='' produces the wanted output.

However, if you generate the DOM in your script, those text nodes will not be there, so pretty printing with newl='\r\n' and/or indent='\t' will turn out quite pretty.
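The three cases described above can be compared side by side. This is only a minimal sketch, assuming a current Python 3 minidom; the helper names `has_blank_lines` and `strip_whitespace_nodes` are hypothetical and not part of minidom, and removing whitespace-only text nodes before pretty printing is one possible workaround, not something this answer prescribes.

```python
from xml.dom import minidom

SOURCE = """<parent>
   <child>
      Some text
   </child>
</parent>"""


def has_blank_lines(text):
    """Return True if any line of text is empty or whitespace-only."""
    return any(not line.strip() for line in text.splitlines())


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


# Case 1: parsed document, whitespace text nodes are still in the DOM.
parsed = minidom.parseString(SOURCE)
doubled = parsed.toprettyxml(indent="\t", newl="\n")

# Case 2: the same document after dropping whitespace-only text nodes.
strip_whitespace_nodes(parsed)
cleaned = parsed.toprettyxml(indent="\t", newl="\n")

# Case 3: a DOM built in the script, which never had such text nodes.
built = minidom.Document()
parent = built.createElement("parent")
child = built.createElement("child")
child.appendChild(built.createTextNode("Some text"))
parent.appendChild(child)
built.appendChild(parent)
generated = built.toprettyxml(indent="\t", newl="\n")

print(has_blank_lines(doubled), has_blank_lines(cleaned), has_blank_lines(generated))
```

Only the first output should contain whitespace-only lines. Note that the sketch leaves the text inside `<child>` untouched, so its original indentation is preserved as written in the source.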

TL;DR Indents and newlines remain from parsing and pretty print just adds more
孤街浪徒

2020-11-28 15:38

Are you viewing the resulting file on Windows? If so, try using toprettyxml(newl='\r\n').

北海茫月

2020-11-28 15:40

toprettyxml() is quite awful. It is not a matter of Windows and '\r\n'. Trying any string as the newl parameter shows that too many lines are being added. Not only that, but other blanks (which may cause problems when a machine reads the XML) are also added.

Some workarounds available at
http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace
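One way to see where the extra lines come from is to pass a visible marker instead of a real newline. The following is a rough sketch, assuming a current Python 3 minidom and a small parsed document; `MARKER` is a hypothetical stand-in chosen for illustration only.

```python
from xml.dom import minidom

SOURCE = "<parent>\n   <child>\n      Some text\n   </child>\n</parent>"

# Hypothetical visible stand-in for the newl string, so every insertion
# made by toprettyxml() can be told apart from newlines in the source.
MARKER = "|"

parsed = minidom.parseString(SOURCE)
marked = parsed.toprettyxml(indent="", newl=MARKER)

# Line breaks inserted by toprettyxml() versus those kept from the input.
marker_count = marked.count(MARKER)
original_newlines = marked.count("\n")

print(repr(marked))
print(marker_count, original_newlines)
```

Every marker is a line break added by toprettyxml(), while every remaining `\n` was carried over from the parsed input, so the two together account for the surplus lines.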

粉色の甜心

2020-11-28 15:40

toprettyxml(newl='') works for me on Windows.

臣服心动

2020-11-28 15:51
I found another great solution:
```
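# dom1 and filename are assumed to be defined earlier in the script.
# toprettyxml(encoding=...) returns bytes on Python 3, so decode before filtering.
dom_string = dom1.toprettyxml(encoding='UTF-8').decode('UTF-8')
# Keep only the lines that contain something other than whitespace.
dom_string = '\n'.join(s for s in dom_string.splitlines() if s.strip())
# Text mode translates '\n' to the platform line separator on write.
with open(filename, 'w', encoding='UTF-8') as f:
    f.write(dom_string)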
```
The solution above removes the unwanted blank lines that toprettyxml() generates from dom_string.

Adapted from: "What's a quick one-liner to remove empty lines from a python string?"
庸人自扰

2020-11-28 15:52

If you don't mind installing new packages, try beautifulsoup. I had very good experiences with its XML prettifier.
