elementtree

Comparing XML in a unit test in Python

久未见 posted on 2019-11-28 18:13:16
I have an object that can build itself from an XML string and write itself out to an XML string. I'd like to write a unit test that round-trips through XML, but I'm having trouble comparing the two XML versions. Whitespace and attribute order seem to be the issues. Any suggestions for how to do this? This is in Python, and I'm using ElementTree (not that it really matters here, since I'm just dealing with XML in strings at this level).
Kozyarchuk: First normalize the two XML documents, then you can compare them. I've used the following with lxml:
    obj1 = objectify.fromstring(expect)
    expect = etree
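The normalize-then-compare idea can also be sketched with the standard library alone. This is a minimal illustration, not the lxml approach from the answer: `elements_equal` and `xml_equal` are hypothetical helper names, and the sketch assumes that leading and trailing whitespace in text nodes is insignificant for the documents under test.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    # Treat None and whitespace-only text as equal to the empty string.
    return (text or "").strip()


def elements_equal(a, b):
    """Compare two elements, ignoring attribute order and surrounding whitespace."""
    # attrib is a dict, so this comparison does not depend on attribute order.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    # Child order is still significant; only whitespace and attribute order are relaxed.
    return all(elements_equal(x, y) for x, y in zip(a, b))


def xml_equal(s1, s2):
    """Parse both strings and compare the resulting trees."""
    return elements_equal(ET.fromstring(s1), ET.fromstring(s2))
```

In a unit test this could be used as `self.assertTrue(xml_equal(original, round_tripped))`. If whitespace inside text nodes matters for your format, drop the `strip()` call.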

XML parsing - ElementTree vs SAX and DOM

橙三吉。 posted on 2019-11-28 15:19:10
Python has several ways to parse XML... I understand the very basics of parsing with SAX. It functions as a stream parser, with an event-driven API. I understand the DOM parser also. It reads the XML into memory and converts it to objects that can be accessed with Python. Generally speaking, it was easy to choose between the two depending on what you needed to do, memory constraints, performance, etc. (Hopefully I'm correct so far.) Since Python 2.5, we also have ElementTree. How does this compare to DOM and SAX? Which is it more similar to? Why is it better than the previous parsers?

How to insert namespace and prefixes into an XML string with Python?

笑着哭i posted on 2019-11-28 14:04:25
Suppose I have an XML string:
    <A>
      <B foo="123">
        <C>thing</C>
        <D>stuff</D>
      </B>
    </A>
and I want to insert a namespace of the type used by XML Schema, putting a prefix in front of all the element names:
    <A xmlns:ns1="www.example.com">
      <ns1:B foo="123">
        <ns1:C>thing</ns1:C>
        <ns1:D>stuff</ns1:D>
      </ns1:B>
    </A>
Is there a way to do this (aside from brute-force find-replace or regex) using lxml.etree or a similar library?
mzjn: I don't think this can be done with just ElementTree. Manipulating namespaces is sometimes surprisingly tricky. There are many questions about it here on SO. Even with the more

Parse several XML declarations in a single file by means of lxml.etree.iterparse

岁酱吖の posted on 2019-11-28 12:39:05
I need to parse a file that contains various XML files, i.e., <xml></xml> <xml></xml> .. and so forth. While using etree.iterparse, I get the following (correct) error:
    lxml.etree.XMLSyntaxError: XML declaration allowed only at the start of the document
Now, I can preprocess the input file and write each contained XML document to a separate file. This might be the easiest solution. But I wonder if a proper solution for this 'problem' exists. Thanks!
Answer: The sample data you've provided suggests one problem, while the question and the exception you've provided suggest another. Do you have multiple
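The preprocessing route mentioned in the question can be done in memory instead of writing separate files. The sketch below is one possible approach under stated assumptions: the concatenated documents each start with an `<?xml ...?>` declaration, that literal never appears inside element content or CDATA, and the whole file fits in memory. `split_documents` is a hypothetical helper name, and the standard-library parser is used here rather than lxml.

```python
import re
import xml.etree.ElementTree as ET


def split_documents(text):
    """Split concatenated XML documents at each XML declaration."""
    # Zero-width lookahead keeps the declaration with the document that follows it.
    # Splitting on an empty match requires Python 3.7 or later.
    parts = re.split(r"(?=<\?xml\s)", text)
    return [part.strip() for part in parts if part.strip()]


# Illustrative input: two documents, each with its own declaration.
data = '<?xml version="1.0"?><a>1</a>\n<?xml version="1.0"?><b>2</b>\n'
roots = [ET.fromstring(doc) for doc in split_documents(data)]
print([root.tag for root in roots])
```

For files too large to hold in memory, the same boundary detection would have to be applied while streaming, which this sketch does not attempt.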

ParseError: not well-formed (invalid token) using cElementTree

匆匆过客 posted on 2019-11-28 11:55:21
I receive XML strings from an external source that can contain unsanitized user-contributed content. The following XML string gave a ParseError in cElementTree:
    >>> print repr(s)
    '<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>'
    >>> import xml.etree.cElementTree as ET
    >>> ET.XML(s)
    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        ET.XML(s)
      File "<string>", line 106, in XML
    ParseError: not well-formed (invalid token): line 1, column 17
Is there a way to make cElementTree not complain?
Answer: It seems to complain about \x08; you will need to escape that. Edit: Or you
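One way to deal with the `\x08` characters is to remove them before parsing, since XML 1.0 does not allow C0 control characters other than tab, line feed, and carriage return. The sketch below is written for Python 3 with `xml.etree.ElementTree` (the session above is Python 2 with cElementTree). `strip_invalid_xml_chars` is a hypothetical helper name, and silently dropping the characters is an assumption about what the application wants.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (everything below 0x20
# except tab \x09, line feed \x0a, and carriage return \x0d).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_chars(text):
    """Remove control characters that would make the parser reject the input."""
    return _INVALID_XML_CHARS.sub("", text)


s = "<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>"
elem = ET.XML(strip_invalid_xml_chars(s))
print(repr(elem.text))
```

This only covers the control-character range seen in the traceback; other code points excluded by the XML 1.0 `Char` production are not handled here.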

Keep Existing Namespaces when overwriting XML file with ElementTree and Python

↘锁芯ラ posted on 2019-11-28 11:30:32
I have an XML file in the following format:
    <?xml version="1.0" encoding="utf-8"?>
    <foo>
      <bar>
        <bat>1</bat>
      </bar>
      <a>
        <b xmlns="urn:schemas-microsoft-com:asm.v1">
          <c>1</c>
        </b>
      </a>
    </foo>
I want to change the value of bat to '2' and change the file to this:
    <?xml version="1.0" encoding="utf-8"?>
    <foo>
      <bar>
        <bat>2</bat>
      </bar>
      <a>
        <b xmlns="urn:schemas-microsoft-com:asm.v1">
          <c>1</c>
        </b>
      </a>
    </foo>
I open the file like this:
    tree = ET.parse(filePath)
    root = tree.getroot()
I then change the value of bat to '2' and save the file like this:
    tree.write(filePath, "utf-8", True, None, "xml")

Python element tree - extract text from element, stripping tags

对着背影说爱祢 posted on 2019-11-28 11:14:40
With ElementTree in Python, how can I extract all the text from a node, stripping any tags in that element and keeping only the text? For example, say I have the following:
    <tag> Some <a>example</a> text </tag>
I want to return "Some example text". How do I go about doing this? So far, the approaches I've taken have had fairly disastrous outcomes.
Benjamin Toueg: If you are running under Python 3.2+, you can use itertext. itertext creates a text iterator which loops over this element and all subelements, in document order, and returns all inner text:
    import xml.etree.ElementTree as ET
    xml = '
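A minimal sketch of the `itertext` approach on the question's sample input follows. The whitespace-collapsing step at the end is an addition for illustration, not part of the quoted answer.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<tag> Some <a>example</a> text </tag>")

# itertext() yields the text and tail of every node in document order.
text = "".join(elem.itertext())

# Optional: collapse runs of whitespace and trim the ends.
normalized = " ".join(text.split())
print(normalized)
```

Joining with an empty string preserves the original spacing between fragments; joining with a space instead could introduce spaces inside words that are split by inline tags.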

how to create a sub-element through variable in python 3.6.5

人走茶凉 posted on 2019-11-28 10:26:56
Question: My code is:
    import xml.etree.ElementTree as ET
    from lxml import etree
    var1 = '<name>This is my text</name>'
    page = etree.Element('first')
    doc = etree.ElementTree(page)
    second = etree.SubElement(page, 'second')
    second.text = var1
    doc.write('a.xml', xml_declaration=True, encoding='utf-8')
My output is:
    <?xml version='1.0' encoding='UTF-8'?>
    <first><second><name>This is my text</name></second></first>
My desired output is:
    <?xml version='1.0' encoding='UTF-8'?>
    <first><second><name>This is my
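Assigning markup to `.text` stores it as character data, so the serializer escapes the angle brackets. Assuming the goal is for the contents of `var1` to become a real child element, one option is to parse the string and append the resulting element. The sketch below uses the standard-library ElementTree rather than lxml, and the variable name `escaped` is introduced only for the comparison.

```python
import xml.etree.ElementTree as ET

var1 = "<name>This is my text</name>"

# Setting .text treats the string as plain character data, so it is escaped.
escaped = ET.Element("second")
escaped.text = var1
print(ET.tostring(escaped))

# Parsing the string first produces an element that can be appended as a child.
page = ET.Element("first")
second = ET.SubElement(page, "second")
second.append(ET.fromstring(var1))
print(ET.tostring(page))
```

`ET.fromstring` requires `var1` to be a well-formed fragment with a single root element; a string holding several sibling elements would need to be wrapped before parsing.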

How to get all sub-elements of an element tree with Python ElementTree?

蹲街弑〆低调 posted on 2019-11-28 09:13:24
I want to find a way to get all the sub-elements of an element tree, the way ElementTree.getchildren() does. Since getchildren() has been deprecated since Python 2.7, I don't want to use it anymore, though I can still use it currently. Thanks.
Answer: All sub-elements (descendants) of elem:
    all_descendants = list(elem.iter())
A more complete example:
    >>> import xml.etree.ElementTree as ET
    >>> a = ET.Element('a')
    >>> b = ET.SubElement(a, 'b')
    >>> c = ET.SubElement(a, 'c')
    >>> d = ET.SubElement(a, 'd')
    >>> e = ET.SubElement(b, 'e')
    >>> f = ET.SubElement(d, 'f')
    >>> g = ET.SubElement(d, 'g')
    >>>
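The difference between direct children and all descendants can be illustrated on the same tree. This is a small sketch, not the continuation of the truncated session above. Note that `iter()` yields the element it is called on first, so that entry is skipped when only descendants are wanted.

```python
import xml.etree.ElementTree as ET

# Same shape as the example tree: a -> (b -> e), c, (d -> f, g)
a = ET.Element("a")
b = ET.SubElement(a, "b")
c = ET.SubElement(a, "c")
d = ET.SubElement(a, "d")
e = ET.SubElement(b, "e")
f = ET.SubElement(d, "f")
g = ET.SubElement(d, "g")

# Direct children only: iterating an element replaces getchildren().
children = [child.tag for child in a]

# iter() walks the whole subtree in document order, starting with 'a' itself.
with_root = [el.tag for el in a.iter()]

# Drop the first entry to keep descendants only.
descendants = with_root[1:]

print(children)
print(descendants)
```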

Python running out of memory parsing XML using cElementTree.iterparse

偶尔善良 posted on 2019-11-28 09:04:12
A simplified version of my XML parsing function is here:
    import xml.etree.cElementTree as ET
    def analyze(xml):
        it = ET.iterparse(file(xml))
        count = 0
        for (ev, el) in it:
            count += 1
        print('count: {0}'.format(count))
This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this? See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an
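`iterparse` still builds the full tree in the background unless elements are released as they are processed, which is the usual reason memory grows even when the loop stores nothing. The sketch below is written for Python 3 and uses `xml.etree.ElementTree` with an in-memory sample. `count_elements` and the sample data are illustrative names, not from the question.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source):
    """Count elements while discarding each one once it has been fully parsed."""
    count = 0
    # "end" events fire when an element's closing tag has been seen.
    for event, elem in ET.iterparse(source, events=("end",)):
        count += 1
        # Drop the element's text, attributes, and children to free memory.
        elem.clear()
    return count


# Small in-memory sample standing in for a large file.
sample = io.BytesIO(b"<root><item>1</item><item>2</item><item>3</item></root>")
total = count_elements(sample)
print("count: {0}".format(total))
```

Cleared child elements remain attached to the root as empty shells until the root itself ends. For extremely large files with many top-level children, those shells may also need to be removed from the root, which this sketch does not do.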