Python XML Parsing [duplicate]

Submitted by 一世执手 on 2019-12-04 09:26:51

Question


*Note: lxml will not run on my system. I was hoping to find a solution that does not involve lxml.

I have already gone through some of the documentation here, but I am having difficulty getting this to work the way I would like. I would like to parse an XML file that looks like this:

<dict>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dict>

In the file I am trying to manipulate, more 'dict' elements follow this one. I would like to read through the XML and output a text/dat file that looks like this:

1375, "Some String", "Another string", "Yet another string", "Strings anyone?"

...

Eof

** Originally, I tried to use lxml, but after many attempts to get it working on my system, I moved on to DOM. More recently, I tried using ElementTree for this task. Please, for the love of all that is good, would somebody help me along with this? I am relatively new to Python and would like to learn how this works. Thank you in advance.


Answer 1:


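You can use xml.etree.ElementTree, which is included with Python. On Python 2 there is also a companion C implementation, xml.etree.cElementTree, which is much faster. (On Python 3.3 and later, xml.etree.ElementTree uses the C accelerator automatically, and cElementTree was removed in Python 3.9.) lxml.etree offers a superset of the functionality, but it is not needed for what you want to do.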

The code provided by @Acorn works identically for me (Python 2.7, Windows 7) with each of the following imports:

import xml.etree.ElementTree as et
import xml.etree.cElementTree as et
import lxml.etree as et
...
tree = et.fromstring(xmltext)
...
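A minimal sketch of picking one of these imports at runtime, so the same script runs whether or not cElementTree is available. The sample XML string and the name first_integer are illustrative assumptions, not part of the original answer.

```python
try:
    # Python 2 (and Python 3 up to 3.8) expose the C implementation under this name.
    import xml.etree.cElementTree as et
except ImportError:
    # Python 3.9+ no longer has cElementTree; ElementTree is already C-accelerated.
    import xml.etree.ElementTree as et

# Hypothetical sample input, modelled on the XML in the question.
xmltext = "<dicts><dict><key>Key 1</key><integer>1375</integer></dict></dicts>"

tree = et.fromstring(xmltext)

# find() accepts a simple path relative to the root element.
first_integer = tree.find('dict/integer').text
print(first_integer)
```

The rest of the script only touches the alias et, so it does not need to know which module was actually imported.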

What OS are you using and what installation problems have you had with lxml?




Answer 2:


import xml.etree.ElementTree as et
import csv

xmltext = """
<dicts>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dicts>
"""

# Python 3: newline='' stops the csv module from writing blank lines on Windows,
# and the with block closes the file even if parsing fails
with open('output.txt', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

    tree = et.fromstring(xmltext)

    # iterate over the dict elements
    for dict_el in tree.iterfind('dict'):
        data = []
        # get the text contents of each non-key element
        for el in dict_el:
            if el.tag == 'string':
                data.append(el.text)
            # if it's an integer element, convert to int so csv won't quote it
            elif el.tag == 'integer':
                data.append(int(el.text))
        writer.writerow(data)
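The loop above drops the key names. If specific columns are needed by name, one possible variation is to pair each key element with the value element that follows it. This is only a sketch: it assumes the children of every inner dict strictly alternate key, value, key, value, and the helper name dict_to_mapping, the second sample record, and the in-memory buffer are illustrative assumptions, not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as et


def dict_to_mapping(dict_el):
    """Pair each <key> child with the value element that follows it."""
    children = list(dict_el)
    mapping = {}
    # Assumes children alternate: key, value, key, value, ...
    for key_el, value_el in zip(children[0::2], children[1::2]):
        if value_el.tag == 'integer':
            mapping[key_el.text] = int(value_el.text)
        else:
            mapping[key_el.text] = value_el.text
    return mapping


# Hypothetical input with two records, modelled on the question's XML.
xmltext = """<dicts>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
    </dict>
    <key>1376</key>
    <dict>
        <key>Key 1</key><integer>1376</integer>
        <key>Key 2</key><string>Other String</string>
    </dict>
</dicts>"""

root = et.fromstring(xmltext)
rows = [dict_to_mapping(d) for d in root.iterfind('dict')]

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
for row in rows:
    writer.writerow([row['Key 1'], row['Key 2']])
output = buffer.getvalue()
```

Each entry in rows is a plain dictionary, so columns can be selected or reordered by key name before they are written out.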


Source: https://stackoverflow.com/questions/7939954/python-xml-parsing
