Read XML file to Pandas DataFrame

臣服心动 2020-12-11 20:30

Can someone please help me convert the following XML file to a Pandas DataFrame:

3 answers
  • 2020-12-11 21:05

    Hello all, I found another really easy way to solve this question. Reference: https://www.youtube.com/watch?v=WVrg5-cjr5k

    import xml.etree.ElementTree as ET
    import pandas as pd
    import codecs
    
    # Save your XML file as text.xml first, then read it in as a string
    with codecs.open('text.xml', 'r', encoding='utf8') as f:
        tt = f.read()
    
    
    def xml2df(xml_data):
        root = ET.XML(xml_data)  # parse the XML string into an Element
        all_records = []
        for child in root:  # one record per top-level child element
            record = {}
            for sub_child in child:
                record[sub_child.tag] = sub_child.text
            all_records.append(record)
        return pd.DataFrame(all_records)
    
    
    df_xml1 = xml2df(tt)
    print(df_xml1)
    

    For a better understanding of ET, you can use the code below to see what is inside your XML:

    import xml.etree.ElementTree as ET
    import pandas as pd
    import codecs
    with codecs.open('text.xml', 'r', encoding='utf8') as f:
        tt = f.read()
    
    root = ET.XML(tt)
    
    print(type(root))
    print(root[0])
    for ele in root[0]:
        print(ele.tag + '////' + ele.text)
    
    print(root[0][0].tag)
    

    Once the program finishes running, you will see the output below:

    C:\Users\username\Documents\pycode\Scripts\python.exe C:/Users/username/PycharmProjects/DestinationLight/try.py
          n35237      n32238     n44699
    0        1.0         3.0        nan
    1  7020000.0  10000000.0  4128000.0
    2    35237.0     32238.0    44699.0
    
    <class 'xml.etree.ElementTree.Element'>
    <Element 'bathrooms' at 0x00000285006B6180>
    n35237////1.0
    n32238////3.0
    n44699////nan
    n35237
    
    Process finished with exit code 0
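
The output above has one row per top-level element (bathrooms, price, property_id) and one column per property, and every value is still text. A minimal sketch of how the same idea could be turned into one row per property with numeric columns follows. The question's XML was not preserved, so `SAMPLE_XML` is an assumed layout inferred from the printed output, and `xml2df_labeled` is a hypothetical variant of `xml2df`, not the answer's own function.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample: layout inferred from the printed output, not the original file.
SAMPLE_XML = """<root>
  <bathrooms><n35237>1.0</n35237><n32238>3.0</n32238><n44699>nan</n44699></bathrooms>
  <price><n35237>7020000.0</n35237><n32238>10000000.0</n32238><n44699>4128000.0</n44699></price>
  <property_id><n35237>35237.0</n35237><n32238>32238.0</n32238><n44699>44699.0</n44699></property_id>
</root>"""


def xml2df_labeled(xml_data):
    """Build one row per top-level element, labeled with that element's tag."""
    root = ET.fromstring(xml_data)
    records = [{sub.tag: sub.text for sub in child} for child in root]
    return pd.DataFrame(records, index=[child.tag for child in root])


wide = xml2df_labeled(SAMPLE_XML)
# Transpose so each property becomes a row, then convert the text values to
# numbers; anything that cannot be parsed becomes NaN.
tidy = wide.T.apply(pd.to_numeric, errors="coerce")
print(tidy)
```

Keeping the tag names as the index before transposing is what preserves the column names; a plain `xml2df(tt).T` would leave the columns labeled 0, 1, 2.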
    
  • 2020-12-11 21:13

    If the data is simple, like this, then you can do something like:

    import pandas as pd
    from lxml import objectify
    
    # Parse the file and grab the root element
    xml = objectify.parse('Document1.xml')
    root = xml.getroot()
    
    bathrooms = [child.text for child in root['bathrooms'].getchildren()]
    price = [child.text for child in root['price'].getchildren()]
    property_id = [child.text for child in root['property_id'].getchildren()]
    
    data = [bathrooms, price, property_id]
    df = pd.DataFrame(data).T
    df.columns = ['bathrooms', 'price', 'property_id']
    
        bathrooms   price      property_id
    0   1.0        7020000.0    35237.0
    1   3.0        10000000.0   32238.0
    2   nan        4128000.0    44699.0
    

    If it is more complex, then a loop is better. You can do something like:

    import pandas as pd
    from lxml import objectify
    
    xml = objectify.parse('Document1.xml')
    root = xml.getroot()
    
    # Collect the text of every sub-element, one list per top-level child
    data = []
    for child in root.getchildren():
        data.append([sub_child.text for sub_child in child.getchildren()])
    
    df = pd.DataFrame(data).T
    df.columns = ['bathrooms', 'price', 'property_id']
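
The loop above matches values by position and hard-codes the column names, so a missing entry under one element would shift the rows. A minimal sketch of one way around that is to key every value by its tag and let pandas align on those keys. Note the swap: this uses the standard library's `xml.etree.ElementTree` in place of `lxml.objectify`, and `SAMPLE_XML` is a hypothetical input in which the `n32238` entry is missing under `bathrooms`.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical input: <bathrooms> has no n32238 entry.
SAMPLE_XML = """<root>
  <bathrooms><n35237>1.0</n35237><n44699>2.0</n44699></bathrooms>
  <price><n35237>7020000.0</n35237><n32238>10000000.0</n32238><n44699>4128000.0</n44699></price>
</root>"""

root = ET.fromstring(SAMPLE_XML)

# Outer keys become column names; inner keys become the row index, so values
# are matched by tag name and not by position.
columns = {child.tag: {sub.tag: sub.text for sub in child} for child in root}
df = pd.DataFrame(columns)
print(df)
```

The missing entry shows up as NaN in the `bathrooms` column, and the column names come from the XML itself.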
    
  • 2020-12-11 21:25

    I have had success using this function from the xmltodict package:

    import pandas as pd
    import xmltodict
    
    # xmlData holds the XML document, for example as a string
    xmlDict = xmltodict.parse(xmlData)
    df = pd.DataFrame.from_dict(xmlDict)
    

    What I like about this is that I can easily do some dictionary manipulation between parsing the XML and making my df. Also, it helps to explore the data as a dict if the structure is wily.
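
As a rough illustration of that in-between dictionary manipulation, the sketch below unwraps the root element and drops one field before building the DataFrame. `XML_DATA` is a hypothetical document (the question's XML was not preserved), and the specific manipulation is only an example of what might be done at this step.

```python
import pandas as pd
import xmltodict

# Hypothetical document; the real structure may differ.
XML_DATA = """<root>
  <bathrooms><n35237>1.0</n35237><n32238>3.0</n32238></bathrooms>
  <price><n35237>7020000.0</n35237><n32238>10000000.0</n32238></price>
  <property_id><n35237>35237.0</n35237><n32238>32238.0</n32238></property_id>
</root>"""

parsed = xmltodict.parse(XML_DATA)

# Explore and reshape the plain dict before handing it to pandas.
tables = dict(parsed["root"])    # unwrap the single root element
tables.pop("property_id", None)  # drop a field that is not needed

# Outer keys become columns, inner keys become the row index.
df = pd.DataFrame.from_dict(tables)
print(df)
```

Without the unwrapping step, the DataFrame would have a single `root` column whose cells hold nested dicts.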
