elementtree

Python running out of memory parsing XML using cElementTree.iterparse

喜你入骨 submitted on 2019-11-27 02:37:54
Question: A simplified version of my XML parsing function is here:

    import xml.etree.cElementTree as ET

    def analyze(xml):
        it = ET.iterparse(file(xml))
        count = 0
        for (ev, el) in it:
            count += 1
        print('count: {0}'.format(count))

This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this? See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me
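
A note on the likely cause, offered as a sketch rather than the accepted answer: iterparse() still builds the complete tree in the background, so memory grows even when the loop only keeps a counter. Calling clear() on each finished element is a common mitigation. The sample below assumes Python 3 (xml.etree.ElementTree instead of cElementTree, and an in-memory buffer instead of file(xml)); the function name count_elements is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source):
    """Count elements, releasing each one's content once it is complete."""
    count = 0
    # Only "end" events are requested, so every element is fully parsed
    # by the time the loop body sees it.
    for event, elem in ET.iterparse(source, events=("end",)):
        count += 1
        # Drop the element's children, text and attributes.
        elem.clear()
    return count


sample = io.BytesIO(b"<root><a><b/></a><a/></root>")
print("count: {0}".format(count_elements(sample)))
```

clear() empties an element but leaves the emptied shell attached to its parent, so for very large inputs the parent may also need its processed children removed.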

parsing xml containing default namespace to get an element value using lxml

拈花ヽ惹草 submitted on 2019-11-27 02:11:32
I have an XML string like this:

    str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
    <loc> http://www.example.org/sitemap_1.xml.gz </loc>
    <lastmod>2015-07-01</lastmod>
    </sitemap>
    </sitemapindex>
    """

I want to extract all the URLs inside the <loc> nodes, i.e. http://www.example.org/sitemap_1.xml.gz. I tried this code but it didn't work:

    from lxml import etree
    root = etree.fromstring(str1)
    urls = root.xpath("//loc/text()")
    print urls
    []

I tried to check whether my root node is formed correctly. I tried this and got back the same string as str1:

    etree.tostring(root)
    '
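
One way to read the empty result: the default namespace applies to every element, so an unqualified loc matches nothing. The sketch below uses the standard library's ElementTree rather than lxml so that it runs without extra packages; the prefix "sm" is an arbitrary name chosen here, not something from the question. lxml's xpath() accepts a comparable namespaces= mapping.

```python
import xml.etree.ElementTree as ET

str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc> http://www.example.org/sitemap_1.xml.gz </loc>
    <lastmod>2015-07-01</lastmod>
  </sitemap>
</sitemapindex>"""

# Map a prefix of our choosing ("sm" is arbitrary) to the default namespace URI.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(str1)
# Qualify the element name with the prefix so the lookup matches.
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]
print(urls)
```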

How do I get Python's ElementTree to pretty print to an XML file?

旧城冷巷雨未停 submitted on 2019-11-27 01:42:16
Background

I am using SQLite to access a database and retrieve the desired information. I'm using ElementTree in Python version 2.6 to create an XML file with that information.

Code

    import sqlite3
    import xml.etree.ElementTree as ET

    # NOTE: Omitted code where I access the database,
    # pull data, and add elements to the tree

    tree = ET.ElementTree(root)

    # Pretty printing to Python shell for testing purposes
    from xml.dom import minidom
    print minidom.parseString(ET.tostring(root)).toprettyxml(indent = " ")

    ####### Here lies my problem #######
    tree.write("New_Database.xml")

Attempts

I've tried using
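
Since the question already pretty-prints to the shell through minidom, one possible approach is to write that same string to the file instead of calling tree.write(). This is a sketch under assumptions: it targets Python 3 rather than 2.6, the helper name write_pretty and the sample elements are invented for illustration, and the file goes to a temporary directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom


def write_pretty(root, path, indent="  "):
    """Serialize root, re-indent it with minidom, and write it to path."""
    rough = ET.tostring(root, encoding="utf-8")
    pretty = minidom.parseString(rough).toprettyxml(indent=indent)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(pretty)
    return pretty


# Stand-in for the tree built from the database (hypothetical content).
root = ET.Element("database")
ET.SubElement(root, "row").text = "value"

out_path = os.path.join(tempfile.mkdtemp(), "New_Database.xml")
pretty = write_pretty(root, out_path)
print(pretty)
```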

Accessing XMLNS attribute with Python ElementTree?

感情迁移 submitted on 2019-11-27 01:36:17
Question: How can one access NS attributes using ElementTree? With the following:

    <data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22">

When I try root.get('xmlns') I get back None. Category and Date are fine. Any help appreciated.

Answer 1: I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.

    >>> from xml.etree import ElementTree
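
The answer's example is cut off above, so here is a small stand-in sketch of the same idea (not the answerer's original code): ElementTree folds the namespace into the tag as {uri}name and does not expose xmlns declarations as attributes. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Same element as in the question, with the trailing slash added.
xml = ('<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" '
       'book="1" category="ABS" date="2009-12-22"/>')
root = ET.fromstring(xml)

# The namespace URI is part of the tag, in "{uri}localname" form.
tag = root.tag
namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else None

print(tag)
print(namespace)
print(root.get("xmlns"))     # xmlns declarations are not regular attributes
print(root.get("category"))  # ordinary attributes are available as usual
```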

Remove whitespaces in XML string

廉价感情. submitted on 2019-11-27 01:27:30
How can I remove the whitespace and line breaks in an XML string in Python 2.6? I tried the following packages:

etree: This snippet keeps the original whitespace:

    xmlStr = '''<root>
    <head></head>
    <content></content>
    </root>'''
    xmlElement = xml.etree.ElementTree.XML(xmlStr)
    xmlStr = xml.etree.ElementTree.tostring(xmlElement, 'UTF-8')
    print xmlStr

I cannot use Python 2.7, which would provide the method parameter.

minidom: just the same:

    xmlDocument = xml.dom.minidom.parseString(xmlStr)
    xmlStr = xmlDocument.toprettyxml(indent='', newl='', encoding='UTF-8')

Steven: The easiest solution is
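
The answer is truncated, so the following is not Steven's solution, only one possible sketch: walk the tree and drop text and tail values that contain nothing but whitespace. It is written for Python 3; iter() and encoding="unicode" are not available in Python 2.6, where getiterator() would be the nearest equivalent. The helper name strip_whitespace is made up here.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(elem):
    """Remove whitespace-only text and tail values from elem and its descendants."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return elem


xml_str = """<root>
    <head></head>
    <content></content>
</root>"""

root = strip_whitespace(ET.XML(xml_str))
compact = ET.tostring(root, encoding="unicode")
print(compact)
```

Text that mixes whitespace with real content is left untouched, which is usually what is wanted for data-oriented XML.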

Use xml.etree.elementtree to print nicely formatted xml files [duplicate]

放肆的年华 submitted on 2019-11-27 00:47:01
Question: This question already has an answer here: Pretty printing XML in Python (21 answers)

I am trying to use xml.etree.elementtree to write out XML files with Python. The issue is that they keep getting generated on a single line. I want to be able to easily reference them, so if it's possible I would really like to have them written out cleanly. This is what I am getting:

    <Language><En><Port>Port</Port><UserName>UserName</UserName></En><Ch><Port>IP地址</Port><UserName>用户名称</UserName></Ch><
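
On Python 3.9 or later, the standard library offers ET.indent(), which may be the shortest route; it did not exist when the question was asked. The sketch below assumes that version, and it closes the sample document with </Language> because the excerpt is cut off, so the closing tag is an assumption.

```python
import xml.etree.ElementTree as ET

# Single-line input similar to the question's output; the closing
# </Language> tag is assumed because the excerpt is truncated.
root = ET.fromstring(
    "<Language><En><Port>Port</Port><UserName>UserName</UserName></En>"
    "<Ch><Port>IP地址</Port><UserName>用户名称</UserName></Ch></Language>"
)

# Requires Python 3.9+: inserts newlines and indentation in place.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```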

Emitting namespace specifications with ElementTree in Python

强颜欢笑 submitted on 2019-11-27 00:28:19
Question: I am trying to emit an XML file with ElementTree that contains an XML declaration and namespaces. Here is my sample code:

    from xml.etree import ElementTree as ET
    ET.register_namespace('com',"http://www.company.com") #some name

    # build a tree structure
    root = ET.Element("STUFF")
    body = ET.SubElement(root, "MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"

    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)
    tree.write("page.xml", xml_declaration=True, method="xml" )
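
The excerpt ends before the problem is described, so this is only background that may be relevant: register_namespace() sets the prefix used during serialization, but a namespace declaration is written only when some element or attribute actually belongs to that namespace. The sketch below puts the elements in the namespace with the {uri}name form and writes to an in-memory buffer instead of page.xml.

```python
import io
import xml.etree.ElementTree as ET

COM = "http://www.company.com"
ET.register_namespace("com", COM)

# Elements must carry the namespace in their tag for it to be emitted.
root = ET.Element("{%s}STUFF" % COM)
body = ET.SubElement(root, "{%s}MORE_STUFF" % COM)
body.text = "STUFF EVERYWHERE!"

buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True, method="xml")
data = buffer.getvalue()
print(data.decode("utf-8"))
```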

lxml etree xmlparser remove unwanted namespace

丶灬走出姿态 submitted on 2019-11-27 00:20:57
I have an XML doc that I am trying to parse using lxml.etree:

    <Envelope xmlns="http://www.example.com/zzz/yyy">
    <Header>
    <Version>1</Version>
    </Header>
    <Body> some stuff </Body>
    </Envelope>

My code is:

    path = "path to xml file"
    from lxml import etree as ET
    parser = ET.XMLParser(ns_clean=True)
    dom = ET.parse(path, parser)
    dom.getroot()

When I call dom.getroot() I get:

    <Element {http://www.example.com/zzz/yyy}Envelope at 28adacac>

However, I only want:

    <Element Envelope at 28adacac>

When I do dom.getroot().find("Body") I get nothing returned. However, when I dom.getroot().find("{http://www
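
As far as the lxml documentation describes it, ns_clean only tidies redundant namespace declarations and does not remove the namespace from tags. Two options are sketched below with the standard library's ElementTree so the example runs without lxml; lxml elements expose the same .tag, .find() and .iter() members. The sample document is the question's, with the closing tags repaired.

```python
import xml.etree.ElementTree as ET

doc = """<Envelope xmlns="http://www.example.com/zzz/yyy">
  <Header><Version>1</Version></Header>
  <Body>some stuff</Body>
</Envelope>"""

root = ET.fromstring(doc)

# Option 1: keep the namespace and qualify the lookup.
NS = "{http://www.example.com/zzz/yyy}"
body_qualified = root.find(NS + "Body")

# Option 2: strip the "{uri}" part from every tag, then use bare names.
for elem in root.iter():
    # Comments and processing instructions have non-string tags; skip them.
    if isinstance(elem.tag, str) and elem.tag.startswith("{"):
        elem.tag = elem.tag.split("}", 1)[1]

body_plain = root.find("Body")
print(root.tag, body_plain.text)
```

Stripping namespaces is convenient for documents with a single vocabulary, but it loses information when two namespaces define elements with the same local name.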

ElementTree iterparse strategy

杀马特。学长 韩版系。学妹 submitted on 2019-11-27 00:18:12
Question: I have to handle XML documents that are fairly big (up to 1 GB) and parse them with Python. I am using the iterparse() function (SAX-style parsing). My concern is the following. Imagine you have an XML like this:

    <?xml version="1.0" encoding="UTF-8" ?>
    <families>
      <family>
        <name>Simpson</name>
        <members>
          <name>Homer</name>
          <name>Marge</name>
          <name>Bart</name>
        </members>
      </family>
      <family>
        <name>Griffin</name>
        <members>
          <name>Peter</name>
          <name>Brian</name>
          <name>Meg</name>
        </members>
      </family>
    <

Convert Python ElementTree to string

不想你离开。 submitted on 2019-11-27 00:11:39
Question: Whenever I call ElementTree.tostring(e), I get the following error message:

    AttributeError: 'Element' object has no attribute 'getroot'

Is there any other way to convert an ElementTree object into an XML string?

Traceback:

    Traceback (most recent call last):
      File "Development/Python/REObjectSort/REObjectResolver.py", line 145, in <module>
        cm = integrateDataWithCsv(cm, csvm)
      File "Development/Python/REObjectSort/REObjectResolver.py", line 137, in integrateDataWithCsv
        xmlstr = ElementTree
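
The traceback is cut off before the failing call is visible, so the exact cause cannot be confirmed from this excerpt. As general background, tostring() expects an Element, and getroot() exists only on ElementTree objects. The sketch below shows both starting points on Python 3; the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "child").text = "hi"
tree = ET.ElementTree(root)

# Starting from an Element: pass it to tostring() directly.
from_element = ET.tostring(root, encoding="unicode")

# Starting from an ElementTree: fetch its root element first.
from_tree = ET.tostring(tree.getroot(), encoding="unicode")

print(from_element)
print(hasattr(root, "getroot"))  # Element objects have no getroot()
```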