How to convert an XML file to a nice pandas DataFrame?

一向 2020-11-22 16:30

Let's assume that I have an XML like this:



        
4 Answers
  •  攒了一身酷
    2020-11-22 17:20

    You can use xml.etree.ElementTree (from the Python standard library) to convert the XML to a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or a file object):

    import pandas as pd
    import xml.etree.ElementTree as ET
    import io
    
    def iter_docs(author):
        author_attr = author.attrib
        for doc in author.iter('document'):
            doc_dict = author_attr.copy()
            doc_dict.update(doc.attrib)
            doc_dict['data'] = doc.text
            yield doc_dict
    
    xml_data = io.StringIO(u'''\
    
        
            
            
            
            
            
            
            
            
            
            
            
            
            
            
            
            
        
    
    ''')
    
    etree = ET.parse(xml_data)  # create an ElementTree object
    doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))
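
The snippet above can be tried end to end with a small stand-in document. This is only a sketch: the sample XML from the question is not reproduced here, so the attribute names (`type`, `language`, `name`, `KEY`, `date`) and all values below are hypothetical. The only structure assumed is the one the code relies on, namely a root `author` element with attributes and nested `document` elements that carry attributes and text.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: attribute names and values are assumptions,
# chosen only to match the shape that iter_docs() expects.
SAMPLE_XML = """\
<author type="XXX" language="EN" name="Jane Doe">
    <documents>
        <document KEY="1" date="2020-01-01">first text</document>
        <document KEY="2" date="2020-01-02">second text</document>
    </documents>
</author>
"""


def iter_docs(author):
    """Yield one dict per <document>, merged with the author's attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()   # author-level columns
        doc_dict.update(doc.attrib)     # document-level columns
        doc_dict["data"] = doc.text     # element text
        yield doc_dict


# ET.parse accepts a filename or a file-like object such as StringIO.
root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()
doc_df = pd.DataFrame(list(iter_docs(root)))
print(doc_df)
```

Each row is one `document`; the author's attributes are repeated on every row, so the frame is flat and ready for filtering or grouping.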
    

    If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

    def iter_author(etree):
        for author in etree.iter('author'):
            for row in iter_docs(author):
                yield row
    

    and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree))).
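
For the multi-author case, a minimal sketch might look like the following. The wrapping `authors` root element and every name and value in the sample are hypothetical; the point is only that `iter('author')` finds every author wherever it sits in the tree, and each author's attributes are repeated on that author's rows.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample: a non-author root wrapping two authors.
SAMPLE_XML = """\
<authors>
    <author name="Jane Doe">
        <document KEY="1">alpha</document>
    </author>
    <author name="John Roe">
        <document KEY="2">beta</document>
        <document KEY="3">gamma</document>
    </author>
</authors>
"""


def iter_docs(author):
    """Yield one dict per <document>, merged with the author's attributes."""
    for doc in author.iter("document"):
        doc_dict = author.attrib.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(tree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in tree.iter("author"):
        yield from iter_docs(author)


# Both Element and ElementTree objects provide .iter(), so either works here.
root = ET.fromstring(SAMPLE_XML)
doc_df = pd.DataFrame(list(iter_author(root)))
print(doc_df)
```

`yield from` is equivalent to the explicit inner loop in the generator above; it is used here only for brevity.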

    Have a look at the ElementTree tutorial provided in the xml library documentation.
