Python pretty print an XML given an XML string

予麋鹿 2020-12-17 17:56

I generated a long and ugly XML string with Python, and I need to filter it through a pretty printer to make it look nicer.

I found this post for python pretty printers, but I

3 Answers
  • 2020-12-17 18:32

    Here's how to parse a text string into an lxml element tree and pretty-print it.

    Python 2:

    from lxml import etree
    xml_str = "<parent><child>text</child><child>other text</child></parent>"
    root = etree.fromstring(xml_str)
    print etree.tostring(root, pretty_print=True)
    

    Python 3:

    from lxml import etree
    xml_str = "<parent><child>text</child><child>other text</child></parent>"
    root = etree.fromstring(xml_str)
    print(etree.tostring(root, pretty_print=True).decode())
    

    Outputs:

    <parent>
      <child>text</child>
      <child>other text</child>
    </parent>
    
  • 2020-12-17 18:35

    Here's a Python 3 solution that gets rid of the ugly newline issue (tons of blank, whitespace-only lines) and uses only the standard library, unlike most other implementations. You mention that you already have an XML string, so I am going to assume you parsed it with xml.dom.minidom.parseString().

    With the following solution you can avoid writing to a file first:

    import xml.dom.minidom
    
    def pretty_print_xml_given_string(input_string, output_xml):
        """
        Useful for when you are editing xml data on the fly.

        input_string is the Document returned by xml.dom.minidom.parseString().
        """
        xml_string = input_string.toprettyxml()
        # Drop whitespace-only lines (the weird newline issue). Join with "\n":
        # a file opened in text mode already translates it to the platform line ending.
        xml_string = "\n".join(s for s in xml_string.splitlines() if s.strip())
        with open(output_xml, "w") as file_out:
            file_out.write(xml_string)
    

    I found how to fix the common newline issue here.
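
    If you only need the pretty-printed text back and not a file, the same idea can be sketched as a function that takes the raw XML string, parses it, and returns the cleaned-up result. The name pretty_xml_from_string and the two-space default indent are my own choices for illustration, not part of the answer above.

```python
import xml.dom.minidom


def pretty_xml_from_string(xml_str, indent="  "):
    """Return an indented copy of xml_str (hypothetical helper name)."""
    # Parse the raw string into a minidom Document.
    dom = xml.dom.minidom.parseString(xml_str)
    pretty = dom.toprettyxml(indent=indent)
    # toprettyxml() can emit whitespace-only lines, especially when the
    # input was already indented; drop them.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    xml_str = "<parent><child>text</child><child>other text</child></parent>"
    print(pretty_xml_from_string(xml_str))
```

    Note that minidom puts an XML declaration on the first line of the result.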

  • 2020-12-17 18:56

    I use the lxml library, and there it's as simple as

    >>> print(etree.tostring(root, pretty_print=True).decode())
    

    You can do that operation using any etree, which you can either generate programmatically, or read from a file.
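
    As a rough illustration of the "generate programmatically" case without installing lxml, here is a sketch using the standard library's xml.etree.ElementTree instead. This is a swapped-in technique, not the lxml call above: the standard library has no pretty_print argument, so the sketch relies on ET.indent(), which I am assuming is available (it was added in Python 3.9).

```python
import xml.etree.ElementTree as ET

# Build a small tree programmatically.
root = ET.Element("parent")
ET.SubElement(root, "child").text = "text"
ET.SubElement(root, "child").text = "other text"

# ET.indent() rewrites the whitespace between elements in place
# (requires Python 3.9 or newer).
ET.indent(root, space="  ")

# encoding="unicode" makes tostring() return str instead of bytes.
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

    To start from a string or a file instead, ET.fromstring(xml_str) or ET.parse(path).getroot() gives you a root element that can be indented the same way.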

    If you're using the DOM from PyXML (a legacy package that only runs on Python 2), it's

    import xml.dom.ext
    xml.dom.ext.PrettyPrint(doc)
    

    That prints to the standard output, unless you specify an alternate stream.

    http://pyxml.sourceforge.net/topics/howto/node19.html

    To use minidom directly, call the toprettyxml() method on the parsed document.

    http://docs.python.org/library/xml.dom.minidom.html#xml.dom.minidom.Node.toprettyxml
