Question
I have the following function, which does a basic job of mapping an lxml object to a dictionary:
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d = {}
    if el.text:
        print('***write tag as string')
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    # If there are children, replace the entry with a list of their dicts.
    children = list(el)
    if children:
        d[el.tag] = list(map(xml_to_dict, children))
    return d

v = xml_to_dict(root)
At the moment it gives me:
>>> print(v)
{'root': [{'a': '1'}, {'a': [{'b': '2'}, {'b': '2'}]}, {'aa': '1a'}]}
But I would like:
>>> print(v)
{'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}
How do I rewrite the function xml_to_dict(el) so that I get the required output?
Here's the XML I'm parsing, for clarity:
<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>
Thanks :)
Answer 1:
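One detail explains why "***write tag as string" is printed for every element, including <root> and the second <a>: with the default parser, the indentation between tags is kept as the element's text, so el.text is a truthy whitespace string rather than None. The sketch below shows this with the sample inlined as a string instead of read from file.xml. As an assumption for portability, it falls back to the standard library's xml.etree.ElementTree when lxml isn't installed, since both provide the calls used here; has_text is a hypothetical helper.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls used below

# The sample document from the question, inlined instead of read from file.xml.
XML = b"""<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""

root = etree.fromstring(XML)

# Print each element's raw text: container elements hold indentation, not None.
for el in root.iter():
    print(el.tag, repr(el.text))

def has_text(el):
    """Hypothetical helper: True only if the text is more than whitespace."""
    return bool(el.text and el.text.strip())

print([el.tag for el in root.iter() if has_text(el)])  # ['a', 'b', 'b', 'aa']
```

Checking stripped text rather than plain truthiness is what separates real leaf values from layout whitespace.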
Well, map() will always give you a list (in Python 3, an iterator), so the easy answer is "don't use map()". Instead, build a dictionary like you already are, by looping over the children and assigning the result of xml_to_dict(child) to the dictionary key you want to use. It looks like you want to use the tag as the key and have the value be a list of items with that tag, so it would become something like:
import collections
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d = {}
    if el.text:
        print('***write tag as string')
        d[el.tag] = el.text
    # Group each child's dict under the child's tag; repeated tags share a list.
    child_dicts = collections.defaultdict(list)
    for child in el:
        child_dicts[child.tag].append(xml_to_dict(child))
    if child_dicts:
        d[el.tag] = child_dicts
    return d

xml_to_dict(root)
This leaves the tag entry in the dict as a defaultdict; if you want a normal dict for some reason, use d[el.tag] = dict(child_dicts). Note that, like before, if a tag has both text and children, the text won't appear in the dict. You may want to think about a different layout for your dict to cope with that.
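As a sketch of one such layout (an assumption, not something the answer specifies): keep repeated tags as lists as above, and store an element's text under a separate key when it also has children. The "#text" key and the function names are hypothetical choices for this example. Whitespace-only text is treated as absent, and tail text after child elements is ignored. The import falls back to the standard library's ElementTree if lxml is missing, which is also an assumption for portability.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same calls

TEXT_KEY = "#text"  # hypothetical key name chosen for this sketch

def element_to_value(el):
    """Return a leaf's text (or None), or a dict for an element with children."""
    # Whitespace-only text (indentation) counts as no text at all.
    text = (el.text or "").strip() or None
    children = {}
    for child in el:
        # Every child tag maps to a list, so repeated tags are never lost.
        children.setdefault(child.tag, []).append(element_to_value(child))
    if not children:
        return text
    if text is not None:
        children[TEXT_KEY] = text  # keep mixed-content text next to the children
    return children

def xml_to_dict(el):
    return {el.tag: element_to_value(el)}

MIXED = b"<root>hello<a>1</a><a>2</a></root>"
print(xml_to_dict(etree.fromstring(MIXED)))
# {'root': {'a': ['1', '2'], '#text': 'hello'}}
```

The separate key avoids the collision in the original code, where the children silently overwrite the text.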
EDIT:
Code that would produce the output in your rephrased question wouldn't recurse in xml_to_dict, because you only want a dict for the outer element, not for all child tags. So you'd use something like:
import collections
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_item(el):
    item = el.text  # None if the element has no text at all
    if el.text:
        print('***write tag as string')
    # Group the converted children by tag; repeated tags share one list.
    child_dicts = collections.defaultdict(list)
    for child in el:
        child_dicts[child.tag].append(xml_to_item(child))
    # An element with children becomes a plain dict; a leaf becomes its text.
    return dict(child_dicts) or item

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

print(xml_to_dict(root))
This still doesn't handle tags with both text and children sanely, and it turns the collections.defaultdict(list) into a normal dict, so the output is (almost) what you expect:
***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
{'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}
(If you really want integers instead of strings for the text data in the b tags, you'll have to explicitly turn them into integers somehow.)
Answer 2:
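A minimal sketch covering both remaining differences, under stated assumptions: convert_text is a hypothetical helper that turns any text int() accepts into an integer (so it also turns the first <a>'s '1' into 1, unlike the question's expected output), and a tag that appears only once maps to its single value instead of a one-item list, which gives 'aa': '1a'. Whitespace-only text is treated as absent, and the stdlib ElementTree fallback is an assumption for environments without lxml.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same calls

XML = b"""<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""

def convert_text(text):
    """Hypothetical helper: an int if the text parses as one, else the text."""
    try:
        return int(text)
    except ValueError:
        return text

def xml_to_item(el):
    grouped = {}
    for child in el:
        grouped.setdefault(child.tag, []).append(xml_to_item(child))
    if not grouped:
        # Leaf element: convert its stripped text; empty text becomes None.
        text = (el.text or "").strip()
        return convert_text(text) if text else None
    # Only repeated tags keep a list; a tag seen once maps to its single value.
    return {tag: items if len(items) > 1 else items[0]
            for tag, items in grouped.items()}

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

print(xml_to_dict(etree.fromstring(XML)))
# {'root': {'a': [1, {'b': [2, 2]}], 'aa': '1a'}}
```

The trade-off of collapsing one-item lists is that the shape now depends on the data: a document with a single <a> yields a scalar where another yields a list, so code reading the result has to handle both.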
Simpler:
from lxml import etree

def recursive_dict(element):
    return element.tag, dict(map(recursive_dict, element)) or element.text
To use it:
>>> tree = etree.parse(file_name)
>>> recursive_dict(tree.getroot())
('root', {'tag1': text, 'tag2': {'subtag21': {'tag211': text}}})
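Because dict() keeps only the last value for a repeated key, this version silently drops repeated sibling tags, which matters for the question's XML. A quick check with the sample inlined as a compact string (the stdlib ElementTree fallback is an assumption for environments without lxml). The function itself runs unchanged on Python 3, since dict() consumes the lazy map() iterator:

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same calls

def recursive_dict(element):
    # (tag, dict of children) for containers, (tag, text) for leaves.
    return element.tag, dict(map(recursive_dict, element)) or element.text

XML = b"<root><a>1</a><a><b>2</b><b>2</b></a><aa>1a</aa></root>"
print(recursive_dict(etree.fromstring(XML)))
# ('root', {'a': {'b': '2'}, 'aa': '1a'})
```

The first <a> and one of the two <b> values are gone, so this shortcut fits documents whose sibling tags are unique.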
Source: https://stackoverflow.com/questions/4112787/how-do-i-map-to-a-dictionary-rather-than-a-list