Trouble parsing multiple XML files and processing them as a DataFrame in Python

守給你的承諾、 submitted on 2020-01-16 14:11:10

Question


I want to parse multiple XML files into a DataFrame. All the files share the same XPath structure.

I have used the ElementTree and os Python libraries. The code parses all the files, but it prints out an empty DataFrame. However, the same code works properly when it is run on a single file rather than multiple files.

```python
import numpy as np
import pandas as pd
import xml.etree.ElementTree as et
from os import listdir, path

mypath = r'C:\Users\testFile'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

for file in files:
    xtree = et.parse(file)
    xroot = xtree.getroot()
    # Collect values in a list (DataFrame.append was removed in pandas 2.0)
    values = []
    for node in xroot.findall(r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'):
        values.append(node.find('Value').text)
    out_xml = pd.DataFrame(values, columns=['value'])
    # df is reassigned on every iteration, so after the loop it holds only the last file's rows
    df = pd.DataFrame(np.reshape(out_xml.values, (-1, 4)))
```
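One likely reason for the empty result: `df` is reassigned on every pass through `for file in files`, so after the loop it holds only the last file's rows, and if that file has no matching `Field` nodes, `df` is empty. A minimal illustration, using hypothetical in-memory values in place of the parsed files (not the asker's data):

```python
import pandas as pd

# Hypothetical per-file results; the last "file" matched no Field nodes
per_file_values = [["a", "b", "c", "d"], ["e", "f", "g", "h"], []]

# Pattern from the question: df is overwritten on each iteration
for values in per_file_values:
    df = pd.DataFrame({"value": values})
print(len(df))  # 0, only the last (empty) file survives

# Accumulating instead: keep every per-file frame, then concatenate
frames = [pd.DataFrame({"value": values}) for values in per_file_values]
combined = pd.concat(frames, ignore_index=True)
print(len(combined))  # 8
```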

Answer 1:


If you need a single DataFrame with all the data, you need to concatenate each file's DataFrame into one main DataFrame instead of overwriting `df` on every iteration:

```python
import numpy as np
import pandas as pd
import xml.etree.ElementTree as et
from os import listdir, path

mypath = r'C:\testFile'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

mainDF = pd.DataFrame()
for file in files:
    xtree = et.parse(file)
    xroot = xtree.getroot()
    # Collect the Value text of every matching Field in this file
    # (DataFrame.append was removed in pandas 2.0, so build a list instead)
    values = [node.find('Value').text
              for node in xroot.findall(r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field')]
    # Reshape the flat list of values into rows of 4 columns
    df = pd.DataFrame(np.reshape(values, (-1, 4)))
    # Add this file's rows to the main DataFrame instead of discarding them
    mainDF = pd.concat([mainDF, df])

mainDF.to_csv("filename.csv")
```
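The same idea can also be sketched as two small functions that collect the per-file frames in a list and call `pd.concat` once at the end. This is a minimal sketch, not the answerer's code: the names `frame_from_root`, `xml_dir_to_dataframe` and `FIELD_XPATH` are hypothetical, and it assumes every file uses the question's XPath and that the number of values per file is a multiple of four (otherwise `np.reshape` raises a `ValueError`).

```python
import xml.etree.ElementTree as et
from pathlib import Path

import numpy as np
import pandas as pd

# XPath copied from the question; adjust it to your own XML layout
FIELD_XPATH = './Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'


def frame_from_root(root, xpath=FIELD_XPATH, n_cols=4):
    """Hypothetical helper: reshape the Value texts found under `xpath` into rows of `n_cols`."""
    # findtext returns None when a Field has no Value child
    values = [field.findtext("Value") for field in root.findall(xpath)]
    return pd.DataFrame(np.reshape(np.array(values, dtype=object), (-1, n_cols)))


def xml_dir_to_dataframe(directory, xpath=FIELD_XPATH, n_cols=4):
    """Hypothetical helper: parse every *.xml file in `directory` into one DataFrame."""
    frames = [frame_from_root(et.parse(file).getroot(), xpath, n_cols)
              for file in sorted(Path(directory).glob("*.xml"))]
    # pd.concat raises on an empty list, so return an empty frame when no files match
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the directory from the answer, usage would be `mainDF = xml_dir_to_dataframe(r'C:\testFile')` followed by `mainDF.to_csv("filename.csv")`. Concatenating once avoids copying the accumulated rows on every iteration, which can matter when there are many files, and `ignore_index=True` gives the combined frame a clean 0..n-1 index instead of repeating each file's index.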


Source: https://stackoverflow.com/questions/58811237/i-trouble-in-how-do-parse-multiple-xml-file-and-process-it-as-dataframe-in-pytho
