Parsing a large XML file with Python - etree.parse error

长情又很酷 2021-02-20 14:15

I'm trying to parse the following XML file using the lxml.etree.iterparse function.

"sampleoutput.xml"


  Item 1
         


        
2 answers
  • 2021-02-20 14:31

    The problem is that an XML document isn't well-formed unless it has exactly one top-level element. You can fix your sample by wrapping the entire document in <items></items> tags. You also need to rename the <desc/> tags so that they match the name your query uses (description).

    The following document produces correct results with your existing code:

    <items>
      <item>
        <title>Item 1</title>
        <description>Description 1</description>
      </item>
      <item>
        <title>Item 2</title>
        <description>Description 2</description>
      </item>
    </items>
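
A minimal sketch of streaming over the wrapped document. The question's own code is not shown here, so this is a stand-in rather than the asker's method: it uses the standard library's xml.etree.ElementTree.iterparse instead of lxml, and the helper name read_items is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# The well-formed sample from the answer, held in memory for illustration.
XML_DOC = b"""<items>
  <item>
    <title>Item 1</title>
    <description>Description 1</description>
  </item>
  <item>
    <title>Item 2</title>
    <description>Description 2</description>
  </item>
</items>"""


def read_items(source):
    """Yield (title, description) for each <item> as its end tag is reached."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("title"), elem.findtext("description")
            # Drop the item's children once consumed to limit memory use.
            elem.clear()


items = list(read_items(io.BytesIO(XML_DOC)))
print(items)
```

For a real file, pass the path (for example "sampleoutput.xml") or an open binary file object instead of the BytesIO wrapper.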
    
  • 2021-02-20 14:36

    As far as I know, xml.etree.ElementTree usually expects the XML file to contain one "root" element, i.e. one XML tag that encloses the complete document structure. From the error message you posted I would assume that this is the problem here as well:

    "Line 5" refers to the second <item> tag, so I guess Python is complaining that there is more data after the assumed root element (i.e. the first <item> element) was closed.
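
A small sketch that reproduces this kind of failure with the standard library parser. The input string is a hypothetical reconstruction of a file with two top-level <item> elements, not the asker's actual file, and lxml words its error message differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item input with no enclosing root element.
FRAGMENT = (
    "<item>\n"
    "  <title>Item 1</title>\n"
    "  <description>Description 1</description>\n"
    "</item>\n"
    "<item>\n"
    "  <title>Item 2</title>\n"
    "  <description>Description 2</description>\n"
    "</item>\n"
)

try:
    ET.fromstring(FRAGMENT)
    error_line = None
except ET.ParseError as err:
    # err.position is a (line, column) tuple for the offending token.
    error_line = err.position[0]
    print(f"parse failed: {err}")

# Wrapping the same text in a single root element makes it well-formed.
root = ET.fromstring(f"<items>{FRAGMENT}</items>")
titles = [item.findtext("title") for item in root.iter("item")]
print(error_line, titles)
```

The parser stops at the second <item> because the first one has already closed the document's only permitted root. Wrapping in memory like this is only practical for small inputs; for a large file the root element would have to be added to the file itself.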
