elementtree

Python: Convert XML to CSV file

匆匆过客 posted on 2019-11-29 09:39:14
Question: I have an XML file like this: <hierachy> <att> <Order>1</Order> <attval>Data</attval> <children> <att> <Order>1</Order> <attval>Studyval</attval> </att> <att> <Order>2</Order> <attval>Site</attval> </att> </children> </att> <att> <Order>2</Order> <attval>Info</attval> <children> <att> <Order>1</Order> <attval>age</attval> </att> <att> <Order>2</Order> <attval>gender</attval> </att> </children> </att> </hierachy> I'm trying to convert it to a CSV file like this: Data,Studyval Date,Site Info
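A minimal sketch of one way to approach this with the standard library. The target CSV in the excerpt is cut off, so the row layout here is an assumption: one row per pair of top-level `attval` and child `attval`. The function name `hierarchy_rows` is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = """<hierachy>
  <att><Order>1</Order><attval>Data</attval>
    <children>
      <att><Order>1</Order><attval>Studyval</attval></att>
      <att><Order>2</Order><attval>Site</attval></att>
    </children>
  </att>
  <att><Order>2</Order><attval>Info</attval>
    <children>
      <att><Order>1</Order><attval>age</attval></att>
      <att><Order>2</Order><attval>gender</attval></att>
    </children>
  </att>
</hierachy>"""


def hierarchy_rows(root):
    """Yield [parent_attval, child_attval] for each child of each top-level <att>.

    Assumed layout: the excerpt's expected output is truncated.
    """
    for parent in root.findall("att"):
        parent_name = parent.findtext("attval")
        for child in parent.findall("children/att"):
            yield [parent_name, child.findtext("attval")]


rows = list(hierarchy_rows(ET.fromstring(XML)))

# Write to an in-memory buffer; a real script would open a file with newline="".
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
```

Deeper nesting would need a recursive walk instead of the fixed two-level path used here.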

How to parse HTML with entities such as &nbsp; using builtin library ElementTree in Python 2 & Python 3?

一个人想着一个人 posted on 2019-11-29 08:55:05
There are times when you want to parse some reasonably well-formed HTML pages, but you are reluctant to introduce an extra library dependency such as BeautifulSoup or lxml. So you would probably like to try the builtin ElementTree first, because it is part of the standard library, it is fast (implemented in C), and it offers a much better interface (such as XPath support) than the basic HTMLParser. Not to mention, HTMLParser has its own limitations. ElementTree will work until it encounters entities such as &nbsp;, which are not handled by default. import xml.etree.ElementTree as ET html = '''<html>
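One possible workaround, sketched under assumptions: rather than configuring the parser, rewrite HTML named entities as numeric character references before parsing, since XML accepts those without a DTD. This is a swapped-in technique, not necessarily the approach the original post goes on to describe. The helper name `html_entities_to_numeric` and the sample markup are hypothetical. The sketch targets Python 3; in Python 2 the same table lives in `htmlentitydefs`.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML already understands are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def html_entities_to_numeric(markup):
    """Turn named HTML entities like &nbsp; into numeric references like &#160;."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML entities and unknown names alone
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


sample = "<html><body><p>a&nbsp;b &amp; c</p></body></html>"
root = ET.fromstring(html_entities_to_numeric(sample))
paragraph_text = root.find("body/p").text
```

Because the substitution is a plain regex pass, it would also rewrite entity-like text inside CDATA sections or comments, which may or may not matter for a given page.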

Cannot write XML file with default namespace [duplicate]

纵然是瞬间 posted on 2019-11-29 06:33:06
I'm writing a Python script to update Visual Studio project files. They look like this: <?xml version="1.0" encoding="utf-8"?> <Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <PropertyGroup> ... The following code reads and then writes the file: import xml.etree.ElementTree as ET tree = ET.parse(projectFile) root = tree.getroot() tree.write(projectFile, xml_declaration = True, encoding = 'utf-8', method = 'xml', default_namespace = "http:/
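A commonly used alternative to passing `default_namespace` is to register the namespace under an empty prefix before writing. The sketch below assumes Python 2.7 or later (where `ET.register_namespace` exists) and uses a shortened, hypothetical project document in place of the real file.

```python
import io
import xml.etree.ElementTree as ET

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"

# Map the MSBuild namespace to the empty prefix. Note that this registry is
# global to the process, not specific to one tree.
ET.register_namespace("", MSBUILD_NS)

# Hypothetical stand-in for the project file shown in the question.
project_xml = (
    '<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="%s">'
    "<PropertyGroup /></Project>" % MSBUILD_NS
)
tree = ET.ElementTree(ET.fromstring(project_xml))

# Write to memory here; tree.write() accepts a file path in the same way.
output = io.BytesIO()
tree.write(output, xml_declaration=True, encoding="utf-8", method="xml")
written = output.getvalue().decode("utf-8")
```

With the registration in place, elements are serialized without an `ns0:` prefix and the root carries a plain `xmlns="..."` attribute.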

How can one replace an element with text in lxml?

老子叫甜甜 posted on 2019-11-29 03:53:32
It's easy to completely remove a given element from an XML document with lxml's implementation of the ElementTree API, but I can't see an easy way of consistently replacing an element with some text. For example, given the following input: input = '''<everything> <m>Some text before <r/></m> <m><r/> and some text after.</m> <m><r/></m> <m>Text before <r/> and after</m> <m><b/> Text after a sibling <r/> Text before a sibling<b/></m> </everything> ''' ... you could easily remove every <r> element with: from lxml import etree f = etree.fromstring(input) for r in f.xpath('//r'): r.getparent()
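The difficulty is that in the ElementTree data model the text following an element lives in that element's `tail`, so the replacement has to be attached either to the parent's `text` or to the previous sibling's `tail`. Below is a sketch of that bookkeeping. It deliberately swaps in the standard library's `xml.etree.ElementTree` so it runs without dependencies; with lxml, `getparent()` and `getprevious()` would shorten the loop. The function name `replace_with_text` and the replacement string are hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_with_text(root, tag, replacement):
    """Replace every <tag> element under root with literal text, keeping its tail."""
    for parent in list(root.iter()):
        previous = None  # last surviving sibling seen so far
        for child in list(parent):
            if child.tag != tag:
                previous = child
                continue
            moved_text = replacement + (child.tail or "")
            if previous is None:
                # No earlier sibling: the text belongs to the parent itself.
                parent.text = (parent.text or "") + moved_text
            else:
                # Otherwise it follows the previous sibling.
                previous.tail = (previous.tail or "") + moved_text
            parent.remove(child)


data = """<everything>
<m>Some text before <r/></m>
<m><r/> and some text after.</m>
<m><r/></m>
<m>Text before <r/> and after</m>
<m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
</everything>"""

root = ET.fromstring(data)
replace_with_text(root, "r", "DELETED")
texts = ["".join(m.itertext()) for m in root.findall("m")]
```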

Python 2.5.4 - ImportError: No module named etree.ElementTree

强颜欢笑 posted on 2019-11-29 02:05:18
I'm running Python 2.5.4 on Windows and I keep getting an error when trying to import the ElementTree or cElementTree modules. The code is very simple (I'm following a tutorial): import xml.etree.ElementTree as xml root = xml.Element('root') child = xml.Element('child') root.append(child) child.attrib['name'] = "Charlie" file = open("test.xml", 'w') xml.ElementTree(root).write(file) file.close() I get the error message when I run it from cmd, but not when I try it directly in the Python interpreter. Traceback (most recent call last): File "C:\xml.py", line 31, in <module> import xml

Suppressing namespace prefixes in ElementTree 1.2

对着背影说爱祢 posted on 2019-11-29 02:02:32
Question: In Python 2.7 (with etree 1.3), I can suppress the XML prefixes on elements like this: Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import xml.etree.ElementTree as etree >>> etree.VERSION '1.3.0' >>> something = etree.Element('{http://some.namespace}token') >>> etree.tostring(something) '<ns0:token xmlns:ns0="http://some.namespace" />' >

Faithfully Preserve Comments in Parsed XML (Python 2.7)

流过昼夜 posted on 2019-11-29 01:38:28
I'd like to preserve comments as faithfully as possible while manipulating XML. I managed to preserve comments, but the contents are getting XML-escaped. #!/usr/bin/env python # add_host_to_tomcat.py import xml.etree.ElementTree as ET from CommentedTreeBuilder import CommentedTreeBuilder parser = CommentedTreeBuilder() if __name__ == '__main__': filename = "/opt/lucee/tomcat/conf/server.xml" # this is the important part: use the comment-preserving parser tree = ET.parse(filename, parser) # get the node to add a child to engine_node = tree.find("./Service/Engine") # add a node: Engine.Host host
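For comparison, newer interpreters can keep comments without a custom builder. The sketch below assumes Python 3.8 or later, where `TreeBuilder` accepts `insert_comments=True`; it does not apply to the Python 2.7 setup in the question, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a comment containing characters that would be escaped
# if they were treated as ordinary text.
source = "<Server><!-- keep <this> & that --><Service /></Server>"

# insert_comments=True makes the builder add comment nodes to the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(source, parser=parser)

# The comment's text was never escaped, so it round-trips as written.
serialized = ET.tostring(root, encoding="unicode")
```

Comments that appear outside the root element are still not part of the resulting tree with this approach.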

Python and ElementTree: return “inner XML” excluding parent element

心不动则不痛 posted on 2019-11-28 23:37:30
In Python 2.6 using ElementTree, what's a good way to fetch the XML (as a string) inside a particular element, like what you can do in HTML and JavaScript with innerHTML? Here's a simplified sample of the XML node I am starting with: <label attr="foo" attr2="bar">This is some text <a href="foo.htm">and a link</a> in embedded HTML</label> I'd like to end up with this string: This is some text <a href="foo.htm">and a link</a> in embedded HTML I've tried iterating over the parent node and concatenating the tostring() of the children, but that gave me only the subnodes: # returns only subnodes (e
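The piece that concatenating the children misses is the parent's own leading `text`; each child's `tostring()` output already includes that child's `tail`. A minimal sketch follows, assuming Python 3 (for `encoding="unicode"`) rather than the Python 2.6 mentioned in the question. The helper name `inner_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element):
    """Return the serialized content of element without its own start/end tags."""
    # Leading text is stored unescaped on the element, so re-escape it.
    parts = [escape(element.text or "")]
    # tostring() on a child emits the child plus its trailing text (tail).
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


label = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)
result = inner_xml(label)
```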

How can I check the existence of attributes and tags in XML before parsing?

时光怂恿深爱的人放手 posted on 2019-11-28 20:16:24
I'm parsing an XML file via ElementTree in Python and writing the content to a cpp file. The content of the children tags varies between tags. For example, the first event tag has a party tag as a child but the second event tag doesn't. --> How can I check whether a tag exists or not before parsing? --> children has a value attribute in the 1st event tag but not in the second. How can I check whether an attribute exists or not before taking its value? --> Currently my code throws an error for the non-existent party tag and sets a "None" attribute value for the second children tag. <main> <event>
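A short sketch of the usual checks: `find()` returns `None` for a missing tag, and `get()` or a membership test on `attrib` covers missing attributes. The sample document is hypothetical, since the XML in the excerpt is cut off after `<event>`, and `describe_event` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical data shaped like the description: only the first event has a
# <party> child, and only its <children> tag carries a value attribute.
SAMPLE = """<main>
  <event>
    <party>Alice</party>
    <children value="3" />
  </event>
  <event>
    <children />
  </event>
</main>"""


def describe_event(event):
    party = event.find("party")        # None when the tag is absent
    children = event.find("children")
    # Compare with None explicitly: an element without children is falsy.
    value = children.get("value") if children is not None else None
    return {
        "party": party.text if party is not None else None,
        "has_value": children is not None and "value" in children.attrib,
        "value": value,
    }


events = [describe_event(e) for e in ET.fromstring(SAMPLE).findall("event")]
```

`get()` also takes a second argument, so `children.get("value", "0")` would supply a default instead of `None`.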

How to find XML Elements via XPath in Python in a namespace-agnostic way?

送分小仙女□ posted on 2019-11-28 18:48:54
Since I have run into this annoying issue for the 2nd time, I thought that asking would help. Sometimes I have to get Elements from XML documents, but the ways to do this are awkward. I’d like to know a Python library that does what I want, an elegant way to formulate my XPaths, a way to register the namespaces in prefixes automatically, or a hidden preference in the builtin XML implementations or in lxml to strip namespaces completely. Clarification follows unless you already know what I want :) Example-doc: <root xmlns="http://really-long-namespace.uri" xmlns:other="http://with-ambivalent.end/#"> <other
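One option that postdates the question: since Python 3.8, the standard library's ElementTree accepts `{*}` as a namespace wildcard in `find()` and `findall()` paths. The sketch below assumes Python 3.8+ and uses a hypothetical document body, because the example document in the excerpt is truncated; only the two namespace URIs come from it.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<root xmlns="http://really-long-namespace.uri" '
    'xmlns:other="http://with-ambivalent.end/#">'
    "<other:elem/><elem/><child><elem/></child></root>"
)

# {*}elem matches elements with local name "elem" in any namespace (or none).
matches = doc.findall(".//{*}elem")

# Tags keep their "{uri}local" form; strip the namespace part when needed.
local_names = [el.tag.rpartition("}")[2] for el in matches]
```

The converse pattern `{uri}*` selects every tag within one namespace.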