Generating nested lists from XML doc

Submitted by 独自空忆成欢 on 2021-02-17 06:41:27

Question


Working in Python, my goal is to parse an XML doc I made and build a nested list of lists that I can access later to parse the feeds. The XML doc resembles the following snippet:

<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>

I would like something like nested lists, where the innermost list holds the content between the <f></f> tags and the list above it is named after the source, e.g. source="reuters" would become reuters. Retrieving the info from the XML doc isn't a problem; I'm doing it with ElementTree, looping and retrieving values with node.get('source') etc. The problem is generating lists with the desired names and the different lengths that the different sources require. I have tried appending, but I'm unsure how to append to a list named after a value I retrieved. Would a dictionary be better? What would be the best practice in this situation, and how might I make this work? If any more info is required, just post a comment and I'll be sure to add it.


Answer 1:


From your description, a dictionary keyed by source name, with the list of feeds as each value, might do the trick.

Here is one way to construct such a beast:

from lxml import etree
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')}

pprint(news_sources)

Another sample, without lxml or xpath:

import xml.etree.ElementTree as ET
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()}

pprint(news_sources)
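
If you would rather not depend on every child element being a feed, here is a sketch of the same idea that selects elements by tag name with findall. It parses an inline, shortened copy of the sample document (SAMPLE_XML is just an illustrative name), so it runs without x.xml on disk:

```python
import xml.etree.ElementTree as ET

# Inline, shortened copy of the sample document (assumption: same layout
# as x.xml), so this sketch does not need a file on disk.
SAMPLE_XML = """<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>
"""

root = ET.fromstring(SAMPLE_XML)

# findall() matches direct children by tag name, so anything that is not
# a <sourceList> (or not an <f>) is skipped rather than picked up by accident.
news_sources = {
    source.get('source'): [feed.text for feed in source.findall('f')]
    for source in root.findall('sourceList')
}

print(news_sources['bbc'])
```

Once the dictionary exists, news_sources['bbc'] gives you the list of BBC feeds, which is the "list with the desired name" the question asks for.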

Finally, if you are allergic to list comprehensions:

import xml.etree.ElementTree as ET
from pprint import pprint

xml = ET.parse('x.xml')
root = xml.getroot()
news_sources = {}
for sourceList in root:
    sourceListName = sourceList.attrib['source']
    news_sources[sourceListName] = []
    for feed in sourceList:
        feedName = feed.text
        news_sources[sourceListName].append(feedName)

pprint(news_sources)
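
To access the feeds later, you just loop over the dictionary. The sketch below wraps the loop version in a helper (build_sources is a hypothetical name, not part of any library) and uses setdefault so that two sourceList elements with the same source name are merged instead of the second overwriting the first. Whether your real file ever repeats a source name is an assumption on my part:

```python
import xml.etree.ElementTree as ET


def build_sources(xml_text):
    """Map each source name to a list of its feed URLs (hypothetical helper)."""
    sources = {}
    for source_list in ET.fromstring(xml_text).findall('sourceList'):
        name = source_list.get('source')
        # Skip empty <f/> elements and trim stray whitespace around URLs.
        feeds = [f.text.strip() for f in source_list.findall('f') if f.text]
        # setdefault creates the list on first sight of a name and
        # reuses it afterwards, so repeated names are merged.
        sources.setdefault(name, []).extend(feeds)
    return sources


# Small inline document where "bbc" deliberately appears twice.
demo = build_sources(
    '<sources>'
    '<sourceList source="bbc"><f>http://feeds.bbci.co.uk/news/rss.xml</f></sourceList>'
    '<sourceList source="bbc"><f>http://feeds.bbci.co.uk/news/uk/rss.xml</f></sourceList>'
    '<sourceList source="cbc"><f>http://rss.cbc.ca/lineup/topstories.xml</f></sourceList>'
    '</sources>'
)

# Walk the nested structure: outer level is the source, inner level its feeds.
for name, feeds in demo.items():
    print(name, len(feeds))
    for url in feeds:
        print('   ', url)
```

The inner loop is where you would hand each URL to whatever feed parser you are using.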


Source: https://stackoverflow.com/questions/25007042/generating-nested-lists-from-xml-doc
