elementtree

Testing Equivalence of xml.etree.ElementTree

China☆狼群 submitted on 2019-11-27 13:10:49
I'm interested in equivalence of two xml elements; and I've found that testing the tostring of the elements works; however, that seems hacky. Is there a better way to test equivalence of two etree Elements? Comparing Elements directly: import xml.etree.ElementTree as etree h1 = etree.Element('hat',{'color':'red'}) h2 = etree.Element('hat',{'color':'red'}) h1 == h2 # False Comparing Elements as strings: etree.tostring(h1) == etree.tostring(h2) # True This compare function works for me: def elements_equal(e1, e2): if e1.tag != e2.tag: return False if e1.text != e2.text: return False if e1.tail !
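The compare function in the excerpt is cut off after the tail check. The following is a minimal sketch of one plausible way such a recursive comparison could be finished, not necessarily the poster's original code: it assumes that two elements are equal when tag, text, tail, attributes, and all children (in order) match. The third element `h3` is a hypothetical addition for illustration.

```python
import xml.etree.ElementTree as etree


def elements_equal(e1, e2):
    """Recursively compare two Elements by tag, text, tail, attributes, children."""
    if e1.tag != e2.tag:
        return False
    if e1.text != e2.text:
        return False
    if e1.tail != e2.tail:
        return False
    if e1.attrib != e2.attrib:
        return False
    # Different child counts can never match; checking first keeps zip() honest.
    if len(e1) != len(e2):
        return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))


h1 = etree.Element('hat', {'color': 'red'})
h2 = etree.Element('hat', {'color': 'red'})
h3 = etree.Element('hat', {'color': 'blue'})  # hypothetical counter-example
```

Note that this treats whitespace-only differences in text and tail as real differences; whether that is desirable depends on the use case.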

How can I check the existence of attributes and tags in XML before parsing?

南楼画角 submitted on 2019-11-27 12:00:51
Question: I'm parsing an XML file via ElementTree in Python and writing the content to a cpp file. The content of the child tags varies between tags. For example, the first event tag has a party tag as a child but the second event tag doesn't. --> How can I check whether a tag exists or not before parsing? --> Children has a value attribute in the 1st event tag but not in the second. How can I check whether an attribute exists or not before taking its value? --> Currently my code throws an error for
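As a hedged illustration of the two checks asked about: in `xml.etree.ElementTree`, `Element.find()` returns `None` when a child tag is missing, and `Element.get()` returns `None` (or a supplied default) when an attribute is missing. The sample document and the helper name `party_values` below are assumptions made up for this sketch; only the tag names `event`, `party`, and the attribute `value` come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the structure is an illustrative assumption.
SAMPLE = """
<events>
  <event><party value="birthday"/></event>
  <event><children/></event>
</events>
"""


def party_values(xml_text):
    """Return the 'value' attribute of each event's party tag, or None if absent."""
    root = ET.fromstring(xml_text)
    values = []
    for event in root.findall("event"):
        party = event.find("party")  # None when the tag does not exist
        if party is None:
            values.append(None)
            continue
        values.append(party.get("value"))  # None when the attribute does not exist
    return values


result = party_values(SAMPLE)
```

The explicit `is None` test matters: an element without children is falsy, so `if not party:` would wrongly treat an existing but empty tag as missing.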

Parsing text from XML node in Python

浪子不回头ぞ submitted on 2019-11-27 09:52:26
I'm trying to extract URLs from a sitemap like this: https://www.bestbuy.com/sitemap_c_0.xml.gz I've unzipped and saved the .xml.gz file as an .xml file. The structure looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> <url> <loc>https://www.bestbuy.com/</loc> <priority>0.0</priority> </url> <url>
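Because the `urlset` root declares a default namespace, every `url` and `loc` element lives in that namespace and has to be looked up with it. A minimal sketch, assuming the standard library parser and a shortened copy of the structure shown above (the prefix `sm` and the helper name `extract_urls` are arbitrary choices for this example):

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the sitemap excerpt; the "sm" prefix is arbitrary.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Shortened sample based on the structure quoted in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.bestbuy.com/</loc><priority>0.0</priority></url>
</urlset>"""


def extract_urls(xml_bytes):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


urls = extract_urls(SAMPLE)
```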

Parsing compressed xml feed into ElementTree

梦想的初衷 submitted on 2019-11-27 09:36:30
I'm trying to parse the following feed into ElementTree in python: " http://smarkets.s3.amazonaws.com/oddsfeed.xml " (warning large file) Here is what I have tried so far: feed = urllib.urlopen("http://smarkets.s3.amazonaws.com/oddsfeed.xml") # feed is compressed compressed_data = feed.read() import StringIO compressedstream = StringIO.StringIO(compressed_data) import gzip gzipper = gzip.GzipFile(fileobj=compressedstream) data = gzipper.read() # Parse XML tree = ET.parse(data) but it seems to just hang on compressed_data = feed.read() , infinitely maybe?? (I know it's a big file, but seems too
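One detail in the attempted code is independent of the download problem: `ET.parse()` expects a filename or a file object, not the decompressed data itself. The sketch below covers only the decompress-and-parse step, using Python 3 and a small in-memory payload as a hypothetical stand-in for the feed; it does not address why the download appears to hang.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def parse_gzipped_xml(compressed_bytes):
    """Decompress gzip bytes and parse them, returning the root Element."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes)) as gz:
        # Pass the file object itself so the parser reads it in chunks.
        return ET.parse(gz).getroot()


# Hypothetical stand-in for the downloaded feed bytes.
payload = gzip.compress(b"<odds><event id='1'/></odds>")
root = parse_gzipped_xml(payload)
```

If the decompressed data is already held as a string or bytes object, `ET.fromstring(data)` is the matching call.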

XML parsing - ElementTree vs SAX and DOM

我怕爱的太早我们不能终老 submitted on 2019-11-27 09:20:54
Question: Python has several ways to parse XML... I understand the very basics of parsing with SAX. It functions as a stream parser, with an event-driven API. I understand the DOM parser also. It reads the XML into memory and converts it to objects that can be accessed with Python. Generally speaking, it was easy to choose between the two depending on what you needed to do, memory constraints, performance, etc. (Hopefully I'm correct so far.) Since Python 2.5, we also have ElementTree. How does this
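To make the difference in style concrete, here is a small sketch that extracts the same attribute values once with an event-driven SAX handler and once with ElementTree. The document, the `book` tag, and the `TitleHandler` class are hypothetical names chosen for this example; it illustrates the two APIs and makes no claim about relative performance.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = b"<library><book title='A'/><book title='B'/></library>"


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to start-element events as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "book":
            self.titles.append(attrs.get("title"))


handler = TitleHandler()
xml.sax.parseString(DOC, handler)
sax_titles = handler.titles

# ElementTree: build the tree in memory, then query it.
et_titles = [book.get("title") for book in ET.fromstring(DOC).iter("book")]
```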

How to insert namespace and prefixes into an XML string with Python?

℡╲_俬逩灬. submitted on 2019-11-27 08:16:49
Question: Suppose I have an XML string: <A> <B foo="123"> <C>thing</C> <D>stuff</D> </B> </A> and I want to insert a namespace of the type used by XML Schema, putting a prefix in front of all the element names. <A xmlns:ns1="www.example.com"> <ns1:B foo="123"> <ns1:C>thing</ns1:C> <ns1:D>stuff</ns1:D> </ns1:B> </A> Is there a way to do this (aside from brute-force find-replace or regex) using lxml.etree or a similar library? Answer 1: I don't think this can be done with just ElementTree. Manipulating

Empty list returned from ElementTree findall

百般思念 submitted on 2019-11-27 08:05:47
I'm new to xml parsing and Python so bear with me. I'm using lxml to parse a wiki dump, but I just want for each page, its title and text. For now I've got this: from xml.etree import ElementTree as etree def parser(file_name): document = etree.parse(file_name) titles = document.findall('.//title') print titles At the moment titles isn't returning anything. I've looked at previous answers like this one: ElementTree findall() returning empty list and the lxml documentation, but most things seemed to be tailored towards parsing HTML. This is a section of my XML: <mediawiki xmlns="http://www
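A common cause of an empty `findall()` result is a default namespace on the root element, as the `xmlns` attribute on `mediawiki` suggests here. Since the namespace URI is cut off in the excerpt, the sketch below uses a hypothetical URI (`http://example.com/mediawiki-export`) and an arbitrary prefix `mw` purely to show the mechanism; the real URI from the dump would have to be substituted.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the real one is truncated in the excerpt.
NS = {"mw": "http://example.com/mediawiki-export"}

SAMPLE = """<mediawiki xmlns="http://example.com/mediawiki-export">
  <page><title>First</title><revision><text>body</text></revision></page>
</mediawiki>"""

root = ET.fromstring(SAMPLE)

# Without the namespace, nothing matches: the tag is really "{uri}title".
unqualified = root.findall(".//title")

# With a prefix mapping, the same path finds the elements.
qualified = root.findall(".//mw:title", NS)
titles = [t.text for t in qualified]
```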

Parse several XML declarations in a single file by means of lxml.etree.iterparse

亡梦爱人 submitted on 2019-11-27 07:05:41
Question: I need to parse a file that contains various XML files, i.e., <xml></xml> <xml></xml> .. and so forth. While using etree.iterparse, I get the following (correct) error: lxml.etree.XMLSyntaxError: XML declaration allowed only at the start of the document. Now, I can preprocess the input file and produce a separate file for each contained XML file. This might be the easiest solution. But I wonder if a proper solution for this 'problem' exists. Thanks! Answer 1: The sample data you've provided
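The preprocessing route mentioned in the question can be done in memory rather than by writing separate files. This is a rough sketch under two assumptions that are not confirmed by the excerpt: every contained document starts with its own `<?xml ...?>` declaration, and that character sequence never occurs inside comments or CDATA sections. The helper name `split_documents` is made up for the example, which uses the standard library parser and needs Python 3.7+ for splitting on a zero-width match.

```python
import re
import xml.etree.ElementTree as ET


def split_documents(text):
    """Split concatenated XML documents at each declaration and parse each part."""
    # The lookahead keeps the declaration attached to the document it starts.
    parts = re.split(r"(?=<\?xml\s)", text)
    return [ET.fromstring(part) for part in parts if part.strip()]


# Hypothetical input with two concatenated documents.
SAMPLE = (
    '<?xml version="1.0"?><a>1</a>\n'
    '<?xml version="1.0"?><b>2</b>\n'
)
roots = split_documents(SAMPLE)
```

For very large inputs this loads everything into memory at once, so a streaming splitter would be the better fit there.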

Saving XML using ETree in Python. It's not retaining namespaces, and adding ns0, ns1 and removing xmlns tags

拟墨画扇 submitted on 2019-11-27 06:52:00
Question: I see there are similar questions here, but nothing that has totally helped me. I've also looked at the official documentation on namespaces but can't find anything that really helps me; perhaps I'm just too new at XML formatting. I understand that perhaps I need to create my own namespace dictionary? Either way, here is my situation: I am getting a result from an API call, which gives me an XML document stored as a string in my Python application. What I'm trying to accomplish is just grab
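The `ns0`/`ns1` prefixes named in the title appear when ElementTree serializes a namespace it has no registered prefix for. A minimal sketch of the usual remedy, `ET.register_namespace()`, follows; the URIs, the `ex` prefix, and the sample document are hypothetical, since the actual API response is not shown in the excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs and prefix: substitute the ones used in the real document.
ET.register_namespace("", "http://example.com/default")   # default namespace
ET.register_namespace("ex", "http://example.com/extra")   # prefixed namespace

SAMPLE = (
    '<root xmlns="http://example.com/default" xmlns:ex="http://example.com/extra">'
    '<ex:item>1</ex:item></root>'
)

root = ET.fromstring(SAMPLE)
output = ET.tostring(root, encoding="unicode")
```

`register_namespace()` changes a module-wide registry, so it affects every later serialization in the same process and should be called before writing.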

ParseError: not well-formed (invalid token) using cElementTree

混江龙づ霸主 submitted on 2019-11-27 06:35:52
Question: I receive XML strings from an external source that can contain unsanitized user-contributed content. The following XML string gave a ParseError in cElementTree: >>> print repr(s) '<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>' >>> import xml.etree.cElementTree as ET >>> ET.XML(s) Traceback (most recent call last): File "<pyshell#4>", line 1, in <module> ET.XML(s) File "<string>", line 106, in XML ParseError: not well-formed (invalid token): line 1, column 17 Is there a way to
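The `\x08` bytes in the string are control characters that XML 1.0 does not allow, which is what the parser reports at column 17. One possible approach, sketched here as an assumption rather than as the accepted answer, is to strip such characters before parsing. The pattern covers only the C0 control range (everything below U+0020 except tab, line feed, and carriage return); the helper name is made up, and the example imports `xml.etree.ElementTree` because `cElementTree` was removed in Python 3.9.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_chars(s):
    """Remove control characters that make the input not well-formed."""
    return _INVALID_XML_CHARS.sub("", s)


s = '<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>'
elem = ET.XML(strip_invalid_xml_chars(s))
```

Silently dropping characters changes the user's content, so replacing them with a visible placeholder may be preferable in some applications.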