elementtree

Parsing compressed xml feed into ElementTree

試著忘記壹切 submitted on 2019-11-26 14:47:07
Question: I'm trying to parse the following feed into ElementTree in Python: "http://smarkets.s3.amazonaws.com/oddsfeed.xml" (warning: large file). Here is what I have tried so far:

    feed = urllib.urlopen("http://smarkets.s3.amazonaws.com/oddsfeed.xml")
    # feed is compressed
    compressed_data = feed.read()
    import StringIO
    compressedstream = StringIO.StringIO(compressed_data)
    import gzip
    gzipper = gzip.GzipFile(fileobj=compressedstream)
    data = gzipper.read()
    # Parse XML
    tree = ET.parse(data)

but it seems to
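A minimal Python 3 sketch of the decompress-then-parse step described in the question above. It assumes the payload is plain gzip; `parse_gzipped_xml` is a hypothetical helper name, and a small in-memory sample stands in for the real feed so that nothing is downloaded. The point it illustrates is that `ET.parse` takes a file name or file object, so the `GzipFile` itself can be handed over instead of the decompressed string.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def parse_gzipped_xml(compressed_data):
    """Decompress gzip bytes and return the parsed ElementTree (hypothetical helper)."""
    # GzipFile wraps a file-like object; BytesIO provides one over the raw bytes.
    with gzip.GzipFile(fileobj=io.BytesIO(compressed_data)) as gz:
        # ET.parse expects a file name or file object, not the XML text itself.
        return ET.parse(gz)


# Stand-in payload so the sketch runs without fetching the real feed.
sample = gzip.compress(b"<odds><event id='1'/></odds>")
tree = parse_gzipped_xml(sample)
print(tree.getroot().tag)
```

If the decompressed data is already held as a string or bytes object, `ET.fromstring(data)` is the call that accepts it directly.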

Empty list returned from ElementTree findall

荒凉一梦 submitted on 2019-11-26 13:59:28
Question: I'm new to XML parsing and Python, so bear with me. I'm using lxml to parse a wiki dump, but for each page I just want its title and text. For now I've got this:

    from xml.etree import ElementTree as etree

    def parser(file_name):
        document = etree.parse(file_name)
        titles = document.findall('.//title')
        print titles

At the moment titles isn't returning anything. I've looked at previous answers like this one: ElementTree findall() returning empty list and the lxml documentation, but most things
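The excerpt is cut off before any diagnosis, so the following is only a guess at one common cause: the dump declares a default namespace, in which case every tag is stored as `{uri}title` and an unqualified search finds nothing. The document and the namespace URI below are made-up placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the namespace URI is a placeholder, not the real dump's.
xml_text = """<mediawiki xmlns="http://example.org/export">
  <page><title>First</title><text>aaa</text></page>
  <page><title>Second</title><text>bbb</text></page>
</mediawiki>"""

root = ET.fromstring(xml_text)

# A namespaced root tag looks like "{uri}mediawiki"; keep the "{uri}" part.
ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""

unqualified = root.findall(".//title")          # misses namespaced elements
qualified = root.findall(".//" + ns + "title")  # matches "{uri}title"

print(len(unqualified), [t.text for t in qualified])
```

Printing `root.tag` is a quick way to check whether a namespace is involved at all.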

How to create <!DOCTYPE> with Python's cElementTree

落爺英雄遲暮 submitted on 2019-11-26 12:48:26
Question: I have tried to use the answer in this question, but can't make it work: How to create "virtual root" with Python's ElementTree? Here's my code:

    import xml.etree.cElementTree as ElementTree
    from StringIO import StringIO
    s = '<?xml version=\"1.0\" encoding=\"UTF-8\" ?><!DOCTYPE tmx SYSTEM \"tmx14a.dtd\" ><tmx version=\"1.4a\" />'
    tree = ElementTree.parse(StringIO(s)).getroot()
    header = ElementTree.SubElement(tree,'header',{'adminlang': 'EN',})
    body = ElementTree
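The standard-library ElementTree does not keep the DOCTYPE of a parsed document, so one workaround is to serialize the element and write the DOCTYPE line by hand. The sketch below takes that approach; it is not necessarily what the linked answer proposes. `serialize_with_doctype` is a hypothetical helper, and the `tmx` element names are taken from the question.

```python
import xml.etree.ElementTree as ET


def serialize_with_doctype(root, doctype):
    """Return XML text with a declaration and a hand-written DOCTYPE line."""
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + doctype + "\n" + body


tmx = ET.Element("tmx", {"version": "1.4a"})
ET.SubElement(tmx, "header", {"adminlang": "EN"})
ET.SubElement(tmx, "body")

document = serialize_with_doctype(tmx, '<!DOCTYPE tmx SYSTEM "tmx14a.dtd">')
print(document)
```

Because the declaration names UTF-8, the string should be encoded as UTF-8 when it is written to a file.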

How to write XML declaration using xml.etree.ElementTree

泪湿孤枕 submitted on 2019-11-26 12:43:11
Question: I am generating an XML document in Python using an ElementTree, but the tostring function doesn't include an XML declaration when converting to plaintext.

    from xml.etree.ElementTree import Element, SubElement, tostring
    document = Element('outer')
    node = SubElement(document, 'inner')
    node.NewValue = 1
    print tostring(document)  # Outputs "<outer><inner /></outer>"

I need my string to include the following XML declaration: <?xml version="1.0" encoding="UTF-8" standalone="yes" ?> However, there
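A sketch of two ways to get a declaration in Python 3, under the assumption that the standard-library `xml.etree.ElementTree` is in use. `ElementTree.write` accepts `xml_declaration=True`; it has no option for `standalone`, so the second variant simply prepends a hand-written declaration to the serialized element.

```python
import io
import xml.etree.ElementTree as ET

document = ET.Element("outer")
ET.SubElement(document, "inner")

# Variant 1: let ElementTree.write emit the declaration.
buffer = io.BytesIO()
ET.ElementTree(document).write(buffer, encoding="UTF-8", xml_declaration=True)
with_declaration = buffer.getvalue()

# Variant 2: write() cannot add standalone="yes", so prepend the line by hand.
# ET.tostring with its default encoding returns bytes without a declaration.
standalone = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    + ET.tostring(document)
)

print(with_declaration.decode("utf-8"))
print(standalone.decode("utf-8"))
```

Note that the default `tostring` encoding is ASCII with character references for anything else, which is still valid under a UTF-8 declaration.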

Using XPath in ElementTree

拜拜、爱过 submitted on 2019-11-26 12:39:41
Question: My XML file looks like the following:

    <?xml version="1.0"?>
    <ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
      <Items>
        <Item>
          <ItemAttributes>
            <ListPrice>
              <Amount>2260</Amount>
            </ListPrice>
          </ItemAttributes>
          <Offers>
            <Offer>
              <OfferListing>
                <Price>
                  <Amount>1853</Amount>
                </Price>
              </OfferListing>
            </Offer>
          </Offers>
        </Item>
      </Items>
    </ItemSearchResponse>

All I want to do is extract the ListPrice. This is the code I am using: >> from elementtree import
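One way to reach the ListPrice amount with the standard-library ElementTree is to pass a prefix-to-URI mapping to `find`. This is a sketch rather than the accepted answer; the prefix `aws` is an arbitrary name chosen here, and only the URI has to match the document.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>"""

root = ET.fromstring(xml_text)

# "aws" is an arbitrary local prefix chosen for this sketch.
ns = {"aws": "http://webservices.amazon.com/AWSECommerceService/2008-08-19"}

# Every step of the path needs the prefix, because the default namespace
# applies to all descendants.
amount = root.find(".//aws:ListPrice/aws:Amount", ns)
list_price = amount.text
print(list_price)
```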

lxml etree xmlparser remove unwanted namespace

无人久伴 submitted on 2019-11-26 12:23:17
Question: I have an XML doc that I am trying to parse using lxml.etree:

    <Envelope xmlns="http://www.example.com/zzz/yyy">
      <Header>
        <Version>1</Version>
      </Header>
      <Body> some stuff </Body>
    </Envelope>

My code is:

    path = "path to xml file"
    from lxml import etree as ET
    parser = ET.XMLParser(ns_clean=True)
    dom = ET.parse(path, parser)
    dom.getroot()

When I try to get dom.getroot() I get: <Element {http://www.example.com/zzz/yyy}Envelope at 28adacac> However I only want: <Element Envelope at 28adacac> When
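As far as I know, `ns_clean` only tidies up redundant namespace declarations and does not change tag names. A common workaround is to rewrite the tags after parsing. The sketch below uses the standard-library ElementTree instead of lxml so that it runs without extra packages (an assumption made for this example), and `strip_namespaces` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite every '{uri}local' tag in place as plain 'local'."""
    for element in root.iter():
        # Guard against nodes whose tag is not a string (comments, PIs).
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


root = ET.fromstring(
    '<Envelope xmlns="http://www.example.com/zzz/yyy">'
    "<Header><Version>1</Version></Header><Body>some stuff</Body>"
    "</Envelope>"
)
strip_namespaces(root)
print(root.tag, root.find("Header/Version").text)
```

Dropping namespaces loses information, so this only makes sense when no two namespaces in the document share a local name.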

access ElementTree node parent node

亡梦爱人 submitted on 2019-11-26 10:32:24
Question: I am using the builtin Python ElementTree module. It is straightforward to access children, but what about parent or sibling nodes? Can this be done efficiently without traversing the entire tree?

Answer: There's no direct support in the form of a parent attribute, but you can perhaps use the patterns described here to achieve the desired effect. The following one-liner is suggested (from the linked-to post) to create a child-to-parent mapping for a whole tree:

    parent_map = dict((c, p) for p in tree.getiterator() for c in p)

Answer (supergra): Vinay's answer should still work, but for Python 2.7+ and 3.2+ the
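The child-to-parent mapping from the answer above can be sketched for current Python 3 with `iter()` and a dict comprehension. The tiny tree is made up for illustration; building the map still costs one full traversal, after which parent and sibling lookups are cheap.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/><d/></b><e/></a>")

# One full traversal builds the child -> parent lookup table.
parent_map = {child: parent for parent in root.iter() for child in parent}

c = root.find(".//c")
parent = parent_map[c]
# Siblings are the parent's other children.
siblings = [node.tag for node in parent if node is not c]
print(parent.tag, siblings)
```

The root element has no entry in the map, and the map goes stale if the tree is modified afterwards.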

parsing xml containing default namespace to get an element value using lxml

谁说胖子不能爱 submitted on 2019-11-26 10:00:02
Question: I have an XML string like this:

    str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
    <loc> http://www.example.org/sitemap_1.xml.gz </loc>
    <lastmod>2015-07-01</lastmod>
    </sitemap>
    </sitemapindex> """

I want to extract all the URLs present inside the <loc> node, i.e. http://www.example.org/sitemap_1.xml.gz. I tried this code but it didn't work:

    from lxml import etree
    root = etree.fromstring(str1)
    urls = root.xpath("//loc/text()")
    print urls
    []

I tried to check
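The unprefixed `//loc` finds nothing because `loc` lives in the sitemap namespace. As an alternative to a prefix mapping, the standard-library ElementTree in Python 3.8+ accepts a `{*}` wildcard that matches a tag in any namespace. The sketch below uses that stdlib route rather than lxml, which is an assumption made so the example has no third-party dependency.

```python
import xml.etree.ElementTree as ET

str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc> http://www.example.org/sitemap_1.xml.gz </loc>
<lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>"""

root = ET.fromstring(str1)

# "{*}loc" matches an element named loc in any namespace (Python 3.8+).
urls = [loc.text.strip() for loc in root.findall(".//{*}loc")]
print(urls)
```

With lxml itself, the usual route is to bind a prefix, for example `root.xpath("//sm:loc/text()", namespaces={"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"})`, where `sm` is an arbitrary prefix.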

How do I get Python's ElementTree to pretty print to an XML file?

一曲冷凌霜 submitted on 2019-11-26 09:44:28
Question:

Background: I am using SQLite to access a database and retrieve the desired information. I'm using ElementTree in Python version 2.6 to create an XML file with that information.

Code:

    import sqlite3
    import xml.etree.ElementTree as ET
    # NOTE: Omitted code where I access the database,
    # pull data, and add elements to the tree
    tree = ET.ElementTree(root)
    # Pretty printing to Python shell for testing purposes
    from xml.dom import minidom
    print minidom.parseString(ET.tostring(root)).toprettyxml
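The question targets Python 2.6, where the minidom round trip is the usual approach. On Python 3.9 or later, `ET.indent` offers a shorter route; the sketch below assumes that newer version, and the element names and the commented-out file name are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder data standing in for rows pulled from the database.
root = ET.Element("records")
row = ET.SubElement(root, "row", {"id": "1"})
ET.SubElement(row, "name").text = "example"

tree = ET.ElementTree(root)
ET.indent(tree, space="  ")  # adds newlines and indentation in place (3.9+)

pretty = ET.tostring(root, encoding="unicode")
print(pretty)

# Writing to disk would look like this ("output.xml" is a hypothetical name):
# tree.write("output.xml", encoding="utf-8", xml_declaration=True)
```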

Remove whitespaces in XML string

走远了吗. submitted on 2019-11-26 09:37:24
Question: How can I remove the whitespace and line breaks in an XML string in Python 2.6? I tried the following packages:

etree: This snippet keeps the original whitespace:

    xmlStr = '''<root> <head></head> <content></content> </root>'''
    xmlElement = xml.etree.ElementTree.XML(xmlStr)
    xmlStr = xml.etree.ElementTree.tostring(xmlElement, 'UTF-8')
    print xmlStr

I cannot use Python 2.7, which would provide the method parameter.

minidom: just the same:

    xmlDocument = xml.dom.minidom.parseString(xmlStr
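ElementTree stores inter-element whitespace in each element's `text` and `tail`, so one approach is to clear those values when they contain nothing but whitespace. This is a Python 3 sketch, not the 2.6 solution the question asks for; `strip_whitespace` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(root):
    """Drop text and tail values that contain only whitespace."""
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return root


xml_str = """<root>
    <head></head>
    <content></content>
</root>"""

compact = ET.tostring(strip_whitespace(ET.fromstring(xml_str)), encoding="unicode")
print(compact)
```

Text that mixes whitespace with real content is left untouched, so significant character data is preserved.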