elementtree | 易学教程

Parsing compressed xml feed into ElementTree

阅读更多关于 Parsing compressed xml feed into ElementTree

问题 I'm trying to parse the following feed into ElementTree in python: "http://smarkets.s3.amazonaws.com/oddsfeed.xml" (warning large file) Here is what I have tried so far: feed = urllib.urlopen("http://smarkets.s3.amazonaws.com/oddsfeed.xml") # feed is compressed compressed_data = feed.read() import StringIO compressedstream = StringIO.StringIO(compressed_data) import gzip gzipper = gzip.GzipFile(fileobj=compressedstream) data = gzipper.read() # Parse XML tree = ET.parse(data) but it seems to

Empty list returned from ElementTree findall

阅读更多关于 Empty list returned from ElementTree findall

问题 I'm new to xml parsing and Python so bear with me. I'm using lxml to parse a wiki dump, but I just want for each page, its title and text. For now I've got this: from xml.etree import ElementTree as etree def parser(file_name): document = etree.parse(file_name) titles = document.findall('.//title') print titles At the moment titles isn't returning anything. I've looked at previous answers like this one: ElementTree findall() returning empty list and the lxml documentation, but most things

How to create <!DOCTYPE> with Python's cElementTree

阅读更多关于 How to create with Python's cElementTree

问题 I have tried to use the answer in this question, but can\'t make it work: How to create "virtual root" with Python's ElementTree? Here\'s my code: import xml.etree.cElementTree as ElementTree from StringIO import StringIO s = \'<?xml version=\\\"1.0\\\" encoding=\\\"UTF-8\\\" ?><!DOCTYPE tmx SYSTEM \\\"tmx14a.dtd\\\" ><tmx version=\\\"1.4a\\\" />\' tree = ElementTree.parse(StringIO(s)).getroot() header = ElementTree.SubElement(tree,\'header\',{\'adminlang\': \'EN\',}) body = ElementTree

How to write XML declaration using xml.etree.ElementTree

阅读更多关于 How to write XML declaration using xml.etree.ElementTree

问题 I am generating an XML document in Python using an ElementTree, but the tostring function doesn\'t include an XML declaration when converting to plaintext. from xml.etree.ElementTree import Element, tostring document = Element(\'outer\') node = SubElement(document, \'inner\') node.NewValue = 1 print tostring(document) # Outputs \"<outer><inner /></outer>\" I need my string to include the following XML declaration: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?> However, there

Using XPath in ElementTree

阅读更多关于 Using XPath in ElementTree

问题 My XML file looks like the following: <?xml version=\"1.0\"?> <ItemSearchResponse xmlns=\"http://webservices.amazon.com/AWSECommerceService/2008-08-19\"> <Items> <Item> <ItemAttributes> <ListPrice> <Amount>2260</Amount> </ListPrice> </ItemAttributes> <Offers> <Offer> <OfferListing> <Price> <Amount>1853</Amount> </Price> </OfferListing> </Offer> </Offers> </Item> </Items> </ItemSearchResponse> All I want to do is extract the ListPrice. This is the code I am using: >> from elementtree import

lxml etree xmlparser remove unwanted namespace

阅读更多关于 lxml etree xmlparser remove unwanted namespace

问题 I have an xml doc that I am trying to parse using Etree.lxml <Envelope xmlns=\"http://www.example.com/zzz/yyy\"> <Header> <Version>1</Version> </Header> <Body> some stuff <Body> <Envelope> My code is: path = \"path to xml file\" from lxml import etree as ET parser = ET.XMLParser(ns_clean=True) dom = ET.parse(path, parser) dom.getroot() When I try to get dom.getroot() I get: <Element {http://www.example.com/zzz/yyy}Envelope at 28adacac> However I only want: <Element Envelope at 28adacac> When

access ElementTree node parent node

阅读更多关于 access ElementTree node parent node

I am using the builtin Python ElementTree module. It is straightforward to access children, but what about parent or sibling nodes? - can this be done efficiently without traversing the entire tree? There's no direct support in the form of a parent attribute, but you can perhaps use the patterns described here to achieve the desired effect. The following one-liner is suggested (from the linked-to post) to create a child-to-parent mapping for a whole tree: parent_map = dict((c, p) for p in tree.getiterator() for c in p) supergra Vinay's answer should still work, but for Python 2.7+ and 3.2+ the

parsing xml containing default namespace to get an element value using lxml

阅读更多关于 parsing xml containing default namespace to get an element value using lxml

问题 I have a xml string like this str1 = \"\"\"<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"> <sitemap> <loc> http://www.example.org/sitemap_1.xml.gz </loc> <lastmod>2015-07-01</lastmod> </sitemap> </sitemapindex> \"\"\" I want to extract all the urls present inside <loc> node i.e http://www.example.org/sitemap_1.xml.gz I tried this code but it didn\'t word from lxml import etree root = etree.fromstring(str1) urls = root.xpath(\"//loc/text()\") print urls [] I tried to check

How do I get Python's ElementTree to pretty print to an XML file?

阅读更多关于 How do I get Python's ElementTree to pretty print to an XML file?

问题 Background I am using SQLite to access a database and retrieve the desired information. I\'m using ElementTree in Python version 2.6 to create an XML file with that information. Code import sqlite3 import xml.etree.ElementTree as ET # NOTE: Omitted code where I acccess the database, # pull data, and add elements to the tree tree = ET.ElementTree(root) # Pretty printing to Python shell for testing purposes from xml.dom import minidom print minidom.parseString(ET.tostring(root)).toprettyxml

Remove whitespaces in XML string

阅读更多关于 Remove whitespaces in XML string

问题 How can I remove the whitespaces and line breaks in an XML string in Python 2.6? I tried the following packages: etree: This snippet keeps the original whitespaces: xmlStr = \'\'\'<root> <head></head> <content></content> </root>\'\'\' xmlElement = xml.etree.ElementTree.XML(xmlStr) xmlStr = xml.etree.ElementTree.tostring(xmlElement, \'UTF-8\') print xmlStr I can not use Python 2.7 which would provide the method parameter. minidom: just the same: xmlDocument = xml.dom.minidom.parseString(xmlStr