XML parsing - ElementTree vs SAX and DOM

梦毁少年i 2020-12-12 11:47

Python has several ways to parse XML...

I understand the very basics of parsing with SAX. It functions as a stream parser, with an event-driven API.
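
To make the event-driven model concrete, here is a minimal sketch using the standard-library `xml.sax` module. The sample document, its element and attribute names (`catalog`, `book`, `isdn`, `author`, `title`) and the `BookHandler` class are assumptions made for illustration, not something taken from the question.

```python
import xml.sax

# Hypothetical sample data; the names and values are assumptions for illustration.
SAMPLE = b"""<catalog>
  <book isdn="id-1"><author>A1</author><title>T1</title></book>
  <book isdn="id-2"><author>A2</author><title>T2</title></book>
</catalog>"""


class BookHandler(xml.sax.ContentHandler):
    """Builds one dict per <book> as parser events arrive."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None  # the book currently being built
        self._field = None    # the child element whose text is being read

    def startElement(self, name, attrs):
        if name == "book":
            self._current = {"isdn": attrs.get("isdn"), "author": "", "title": ""}
        elif self._current is not None and name in ("author", "title"):
            self._field = name

    def characters(self, content):
        # Text may be delivered in several chunks, so append rather than assign.
        if self._field is not None:
            self._current[self._field] += content

    def endElement(self, name):
        if name == self._field:
            self._field = None
        elif name == "book":
            self.books.append(self._current)
            self._current = None


handler = BookHandler()
xml.sax.parseString(SAMPLE, handler)
for book in handler.books:
    print(book["isdn"], book["author"], book["title"])
```

The parser never holds the whole document; the handler keeps only the state it needs, which is why this style suits large inputs but takes more bookkeeping than a tree API.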

4 Answers
  •  感情败类
    2020-12-12 12:04

    Minimal DOM implementation:

    Python's xml.dom package defines the W3C-standard DOM API, and xml.dom.minidom is a minimal implementation of it. The latter is simpler and smaller than a full DOM implementation. However, from a parsing perspective, it has all the pros and cons of the standard DOM, i.e. it loads the whole document into memory.

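    Considering a basic XML file:

    <?xml version="1.0"?>
    <catalog>
        <book isdn="...">
          <author>A1</author>
          <title>T1</title>
        </book>
        <book isdn="...">
          <author>A2</author>
          <title>T2</title>
        </book>
    </catalog>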

    A possible Python parser using minidom is:

    import os
    from xml.dom import minidom
    from xml.parsers.expat import ExpatError
    
    #-------- Select the XML file: --------#
    #Current file name and directory:
    curpath = os.path.dirname( os.path.realpath(__file__) )
    filename = os.path.join(curpath, "sample.xml")
    #print "Filename: %s" % (filename)
    
    #-------- Parse the XML file: --------#
    try:
        #Parse the given XML file:
        xmldoc = minidom.parse(filename)
    except ExpatError as e:
        print "[XML] Error (line %d): %d" % (e.lineno, e.code)
        print "[XML] Offset: %d" % (e.offset)
        raise e
    except IOError as e:
        print "[IO] I/O Error %d: %s" % (e.errno, e.strerror)
        raise e
    else:
        catalog = xmldoc.documentElement
        books = catalog.getElementsByTagName("book")
    
        for book in books:
            print book.getAttribute('isdn')
            print book.getElementsByTagName('author')[0].firstChild.data
            print book.getElementsByTagName('title')[0].firstChild.data
    

    Note that xml.parsers.expat is a Python interface to the Expat non-validating XML parser (docs.python.org/2/library/pyexpat.html).

    The xml.dom package also supplies the exception class DOMException, but it is not supported in minidom!
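
As a hedged illustration of how parse errors surface from minidom, the sketch below feeds it a deliberately malformed string and reports the resulting ExpatError. It is written for Python 3, and the `try_parse` helper and the `BROKEN` input are hypothetical names introduced only for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError, ErrorString

# Deliberately malformed input: <author> is never closed.
BROKEN = "<catalog><book><author>A1</book></catalog>"


def try_parse(text):
    """Return (document, None) on success or (None, message) on a parse error."""
    try:
        return minidom.parseString(text), None
    except ExpatError as e:
        # e.code is Expat's numeric error code; ErrorString() maps it to text.
        message = "line %d, offset %d: %s" % (e.lineno, e.offset, ErrorString(e.code))
        return None, message


doc, error = try_parse(BROKEN)
print(error)
```

Returning the message instead of re-raising is only a design choice for the demo; real code may prefer to let the exception propagate, as the parser above does.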

    The ElementTree XML API:

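    ElementTree is much easier to use and requires less memory than XML DOM. Furthermore, a C implementation is available (xml.etree.cElementTree). Note that cElementTree was deprecated in Python 3.3, where xml.etree.ElementTree uses the C accelerator automatically, and removed in Python 3.9.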

    A possible Python parser using ElementTree is:

    import os
    from xml.etree import cElementTree  # C implementation of xml.etree.ElementTree
    from xml.parsers.expat import ExpatError  # XML formatting errors
    
    #-------- Select the XML file: --------#
    #Current file name and directory:
    curpath = os.path.dirname( os.path.realpath(__file__) )
    filename = os.path.join(curpath, "sample.xml")
    #print "Filename: %s" % (filename)
    
    #-------- Parse the XML file: --------#
    try:
        #Parse the given XML file:
        tree = cElementTree.parse(filename)
    except ExpatError as e:
        print "[XML] Error (line %d): %d" % (e.lineno, e.code)
        print "[XML] Offset: %d" % (e.offset)
        raise e
    except IOError as e:
        print "[IO] I/O Error %d: %s" % (e.errno, e.strerror)
        raise e
    else:
        catalogue = tree.getroot()
    
        for book in catalogue:
            print book.attrib.get("isdn")
            print book.find('author').text
            print book.find('title').text
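
For readers on Python 3, a rough equivalent of the ElementTree parser might look like the sketch below. It assumes the plain `xml.etree.ElementTree` module and catches `ParseError`, which is the exception that module raises for malformed input in Python 3. The inline sample, its `isdn` values and the `read_books` helper are hypothetical and used only so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample; the attribute values are assumptions.
SAMPLE = """<catalog>
  <book isdn="id-1"><author>A1</author><title>T1</title></book>
  <book isdn="id-2"><author>A2</author><title>T2</title></book>
</catalog>"""


def read_books(text):
    """Return a list of (isdn, author, title) tuples parsed from an XML string."""
    try:
        catalogue = ET.fromstring(text)
    except ET.ParseError as e:
        # ParseError carries the Expat error code and a (line, column) position.
        line, column = e.position
        raise ValueError(
            "[XML] Error at line %d, column %d (code %d)" % (line, column, e.code)
        ) from e
    return [
        (book.get("isdn"), book.findtext("author"), book.findtext("title"))
        for book in catalogue.iter("book")
    ]


for isdn, author, title in read_books(SAMPLE):
    print(isdn, author, title)
```

`findtext` returns None when a child element is missing, which avoids the AttributeError that `book.find('author').text` would raise in that case.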
    
