elementtree

Check and remove duplicated child tags in XML

倖福魔咒の submitted on 2019-12-08 02:15:32
Question: I'm parsing an XML-like file via ElementTree in Python and writing the content to a pandas DataFrame. I'm currently facing the following problem: which child tags exist varies from one tag to another. That alone wouldn't be a problem with the solution mentioned here. However, the complicated part is that some tags have duplicated child tags while others don't. For example, the first product tag has two (different) article numbers and two equal product_types (duplicates), while the
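A minimal sketch of one way to handle this, assuming each record is a `<product>` element whose children (hypothetically named `article_number` and `product_type` here) hold plain text: group the children by tag, drop exact duplicate values, and keep genuinely different values as a list. The resulting list of dicts can then be passed to `pandas.DataFrame`.

```python
import xml.etree.ElementTree as ET


def product_records(xml_text, record_tag="product"):
    """Convert each record element into a dict of {child tag: value(s)}.

    Repeated child tags with identical text are collapsed; repeated tags
    with different text are kept as a list, in document order.
    """
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(record_tag):
        values = {}
        for child in record:
            text = (child.text or "").strip()
            bucket = values.setdefault(child.tag, [])
            if text not in bucket:  # skip exact duplicates, e.g. two equal product_types
                bucket.append(text)
        # Single-valued columns become scalars; multi-valued ones stay lists.
        records.append(
            {tag: vals[0] if len(vals) == 1 else vals for tag, vals in values.items()}
        )
    return records


if __name__ == "__main__":
    sample = """
    <products>
      <product>
        <article_number>A1</article_number>
        <article_number>A2</article_number>
        <product_type>shoe</product_type>
        <product_type>shoe</product_type>
      </product>
      <product>
        <article_number>B1</article_number>
        <product_type>shirt</product_type>
      </product>
    </products>
    """
    for rec in product_records(sample):
        print(rec)
    # With pandas installed: pandas.DataFrame(product_records(sample))
```

Whether multi-valued cells should stay lists, be joined into one string, or be split into extra columns depends on what the DataFrame is used for afterwards; the sketch only shows the deduplication step.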

The limits of ElementTree's XPath support

a 夏天 submitted on 2019-12-08 02:04:17
Question: I've used ElementTree for a while and I love it because of its simplicity, but I have doubts about its XPath implementation. This is the XML file:
<a> <b name="b1"></b> <b name="b2"><c/></b> <b name="b2"></b> <b name="b3"></b> </a>
The Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
root.findall("b[@name='b2' and c]")
The program shows the error "invalid predicate". But if I use root.findall("b[@name='b2']") or root.findall("b[c]"), it works.
Answer 1:
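ElementTree implements only a limited subset of XPath, and boolean operators such as `and` inside a predicate are not part of it, which is why the combined expression is rejected. A sketch of two standard-library workarounds: chain the two supported predicates one after the other (each one narrows the previous match), or select with one predicate and test the other condition in Python.

```python
import xml.etree.ElementTree as ET

XML = """<a> <b name="b1"></b> <b name="b2"><c/></b> <b name="b2"></b> <b name="b3"></b> </a>"""

root = ET.fromstring(XML)

# Workaround 1: chained predicates behave like "and" in ElementTree's XPath subset.
chained = root.findall("b[@name='b2'][c]")

# Workaround 2: use one predicate in findall() and check the other condition in Python.
filtered = [b for b in root.findall("b[@name='b2']") if b.find("c") is not None]

# Both select only the <b name="b2"> element that has a <c> child.
print(len(chained), len(filtered))
```

For anything beyond this subset (`or`, `not()`, functions), a full XPath engine such as lxml's `xpath()` method is the usual route.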

Python 3.4 : How to do xml validation

徘徊边缘 submitted on 2019-12-08 00:53:53
Question: I'm trying to validate XML against an XSD in Python. I was successful using the lxml package, but the problems started when I tried to port my code to Python 3.4. I tried to install lxml for 3.4, but it looks like my enterprise Linux doesn't play very well with lxml. pip installation:
pip install lxml
Collecting lxml
Downloading lxml-3.4.4.tar.gz (3.5MB)
100% |################################| 3.5MB 92kB/s
Installing collected packages: lxml
Running setup.py install for lxml
Successfully

Does python's xml.etree.ElementTree support DTD?

a 夏天 submitted on 2019-12-08 00:31:29
Question: Does xml.etree.ElementTree support DTD? If it does, can I force ElementTree to check an XML file against a DTD file, even if the XML file already has one (internal or external)?
Answer 1: I'm not sure about xml.etree, but lxml supports DTD validation: http://lxml.de/validation.html
Source: https://stackoverflow.com/questions/5396135/does-pythons-xml-etree-elementtree-support-dtd
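For the xml.etree part: the standard-library parser behind `xml.etree.ElementTree` (expat) is non-validating. It reads an internal DTD but does not check the document against it. A small sketch demonstrating this, using a made-up `note` document whose content breaks its own DTD yet still parses without error; actual DTD validation needs a validating parser such as lxml's.

```python
import xml.etree.ElementTree as ET

# The internal DTD says <note> may only contain a <to> element,
# but the document also contains <from>, so it is not valid.
INVALID_DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><to>Alice</to><from>Bob</from></note>"""

# ElementTree parses it anyway: well-formedness is checked, validity is not.
root = ET.fromstring(INVALID_DOC)
print(root.tag, [child.tag for child in root])
```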

Parsing very large HTML file with Python (ElementTree?)

*爱你&永不变心* submitted on 2019-12-08 00:10:41
Question: I asked about using BeautifulSoup to parse a very large (270 MB) HTML file, got a memory error, and was pointed toward ElementTree as a solution. I was trying to use its event-driven parsing, documented here. Testing it with the smaller settings file worked fine:
>>> settings = open('S:\\Documents\\FacebookData\\html\\settings.htm')
>>> for event, element in ET.iterparse(settings, events=("start", "end")):
...     print("%5s, %4s, %s" % (event, element.tag, element.text))
Successfully prints
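A common pattern for keeping memory use flat with `iterparse` is to listen only for `end` events, handle each record element once it is complete, and then call `clear()` on it so its children and text can be garbage-collected. A minimal sketch, assuming the records of interest are `<tr>` rows (a hypothetical choice) and that the input is well-formed XML/XHTML, since ElementTree's XML parser rejects HTML that is not well-formed.

```python
import io
import xml.etree.ElementTree as ET


def iter_row_texts(source, tag="tr"):
    """Yield the text content of each <tag> element, clearing it afterwards."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield "".join(elem.itertext()).strip()
            # Drop the processed element's children and text. The empty shell
            # stays attached to its parent, but its contents can be freed.
            elem.clear()


if __name__ == "__main__":
    page = io.StringIO(
        "<html><body><table>"
        "<tr><td>first</td></tr>"
        "<tr><td>second</td></tr>"
        "</table></body></html>"
    )
    for text in iter_row_texts(page):
        print(text)
```

Unlike the interactive test above, which prints every `start` and `end` event, this only does work when a whole row has been read, which is the point at which its text is complete.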

Is there a key for the default namespace when creating dictionary for use with xml.etree.ElementTree.findall() in Python?

雨燕双飞 submitted on 2019-12-07 23:38:31
Question: I'm trying to parse an XML document with a default namespace, i.e. the root node has an xmlns attribute. This is annoying if you want to find certain tags in the child nodes, because each tag is prefixed with the default namespace. xml.etree.ElementTree.findall() allows a namespaces dictionary to be passed in, but I can't seem to find what the default namespace is mapped to. I have tried using 'default', None and 'xmlns' with no success. One option that does seem to work is to prefix the
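The namespaces dictionary maps prefixes of your own choosing to URIs, so one portable workaround is to invent a prefix (here `d`, an arbitrary name) for the default namespace URI and use it in the path. Python 3.8 and later also accept an empty-string key, which applies the URI to unprefixed names in the path, and a `{*}` wildcard that matches a tag in any namespace. A sketch with a made-up namespace URI:

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose root declares a default namespace.
XML = """<root xmlns="http://example.com/ns">
  <item>1</item>
  <item>2</item>
</root>"""

root = ET.fromstring(XML)

# 1. Any Python 3: map a prefix of your choosing to the default namespace URI.
by_prefix = root.findall("d:item", {"d": "http://example.com/ns"})

# 2. Python 3.8+: the '' key applies to unprefixed names in the path.
by_default_key = root.findall("item", {"": "http://example.com/ns"})

# 3. Python 3.8+: "{*}" matches the tag in any namespace.
by_wildcard = root.findall("{*}item")

print([e.text for e in by_prefix], [e.text for e in by_default_key], [e.text for e in by_wildcard])
```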

Finding top-level xml comments using Python's ElementTree

余生长醉 submitted on 2019-12-07 17:53:41
Question: I'm parsing an XML file using Python's ElementTree, like this: et = ElementTree(file=file("test.xml")). test.xml starts with a few lines of XML comments. Is there a way to get those comments from et?
Answer 1: For ElementTree 1.2.X there is an article on Reading processing instructions and comments with ElementTree (http://effbot.org/zone/element-pi.htm). EDIT: The alternative would be using lxml.etree, which implements the ElementTree API. A quote from ElementTree compatibility of lxml.etree:
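A standard-library sketch for Python 3: `XMLParser` forwards comments to its target's `comment()` method when the target defines one, while the default `TreeBuilder` target discards them. A small custom target (the class and function names here are hypothetical) can therefore delegate tree building to `TreeBuilder` and record the comments that appear outside the root element.

```python
import xml.etree.ElementTree as ET


class CommentCollectingTarget:
    """Parser target that builds the tree normally and records top-level comments."""

    def __init__(self):
        self._builder = ET.TreeBuilder()
        self._depth = 0  # number of elements currently open
        self.top_level_comments = []

    def start(self, tag, attrib):
        self._depth += 1
        return self._builder.start(tag, attrib)

    def end(self, tag):
        self._depth -= 1
        return self._builder.end(tag)

    def data(self, text):
        self._builder.data(text)

    def comment(self, text):
        # Depth 0 means the comment sits before (or after) the root element.
        if self._depth == 0:
            self.top_level_comments.append(text)

    def close(self):
        return self._builder.close()


def parse_with_comments(xml_text):
    target = CommentCollectingTarget()
    parser = ET.XMLParser(target=target)
    parser.feed(xml_text)
    root = parser.close()  # returns target.close(), i.e. the root element
    return root, target.top_level_comments


if __name__ == "__main__":
    doc = "<!-- generated file -->\n<!-- do not edit -->\n<config><!-- inner --><item/></config>"
    root, comments = parse_with_comments(doc)
    print(root.tag, comments)
```

Comments inside the root element are ignored by this target; removing the depth check would collect those as well.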

Again: UnicodeEncodeError: ascii codec can't encode

烂漫一生 submitted on 2019-12-07 16:19:38
Question: I have a folder of XML files that I would like to parse. I need to get the text out of the elements of these files; it will be collected and written to a CSV file where the elements are listed in columns. I can actually do this right now for some of my files: for many of my XML files the process goes fine and I get the output I want. The code that does this is:
import os, re, csv, string, operator
import xml.etree.cElementTree as ET
import codecs
def parseEO(doc):
    #getting the basic
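In Python 2 this error typically appears when non-ASCII text is implicitly encoded with the default ASCII codec on output, which would fit the symptom that only some files fail. A sketch for Python 3 (where `xml.etree.cElementTree` is deprecated and was removed in 3.9, so plain `xml.etree.ElementTree` is used): open the CSV file with an explicit `encoding="utf-8"` and `newline=""`, and let `csv.writer` handle the text. The function name and the layout of one row per document with one column per tag are assumptions for illustration.

```python
import csv
import xml.etree.ElementTree as ET


def xml_texts_to_csv(xml_documents, tags, out_path):
    """Write one CSV row per XML document, one column per tag.

    Each cell holds the text of the first matching descendant element,
    or an empty string if the tag is missing.
    """
    # An explicit UTF-8 encoding avoids relying on the platform's default codec;
    # newline="" is what the csv module expects for files it writes.
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(tags)
        for document in xml_documents:
            root = ET.fromstring(document)
            writer.writerow([root.findtext(".//" + tag, default="") for tag in tags])
```

For a folder of files, `ET.parse(path).getroot()` would replace `ET.fromstring(document)`.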

I get an error when executing “from lxml import etree” in the Python command line after successfully installing lxml with pip

…衆ロ難τιáo~ submitted on 2019-12-07 12:18:15
Question:
bash-3.2$ pip install lxml-2.3.5.tgz
Unpacking ./lxml-2.3.5.tgz
Running setup.py egg_info for package from file:///Users/apple/workspace/pythonhome/misc/lxml-2.3.5.tgz
Building lxml version 2.3.5.
Building with Cython 0.17.
Using build configuration of libxslt 1.1.27
Building against libxml2/libxslt in the following directory: /usr/local/lib
warning: no previously-included files found matching '*.py'
Installing collected packages: lxml
Running setup.py install for lxml
Building lxml version 2

Forcing xml.etree to output “unused” namespaces

夙愿已清 submitted on 2019-12-07 09:01:41
Question: I'm trying to create Shibboleth configuration files using xml.etree in Python, and I'm having problems with it omitting namespace declarations when it outputs the finished document. I'm pretty sure it's the problem described in Outputting an “unused” XML namespace using ElementTree. I declare them... namespaces = { 'resolver': 'urn:mace:shibboleth:2.0:resolver', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'pc': 'urn:mace:shibboleth:2.0:resolver:pc', 'ad': 'urn:mace:shibboleth:2.0
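ElementTree writes declarations only for namespaces that some tag or attribute in the tree actually uses. A workaround sketch: call `ET.register_namespace()` so the used namespaces get the prefixes you want, and add explicit `xmlns:prefix` attributes on the root only for namespaces nothing uses (adding them for used ones as well would produce duplicate attributes). The helper names are hypothetical, and only the three fully spelled-out URIs from the question are included.

```python
import xml.etree.ElementTree as ET

# The three fully spelled-out namespaces from the question.
NAMESPACES = {
    "resolver": "urn:mace:shibboleth:2.0:resolver",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "pc": "urn:mace:shibboleth:2.0:resolver:pc",
}


def used_namespace_uris(root):
    """Collect the URIs that appear in any {uri}name tag or attribute key."""
    uris = set()
    for elem in root.iter():
        for name in [elem.tag, *elem.attrib]:
            # Comments and processing instructions have non-string tags; skip them.
            if isinstance(name, str) and name.startswith("{"):
                uris.add(name[1:].split("}", 1)[0])
    return uris


def to_string_with_all_namespaces(root, namespaces):
    """Serialize root, declaring every namespace in `namespaces`, used or not."""
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)  # used namespaces get these prefixes
    used = used_namespace_uris(root)
    for prefix, uri in namespaces.items():
        if uri not in used:
            # ElementTree would drop these, so declare them by hand (mutates root).
            root.set("xmlns:" + prefix, uri)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    resolver = ET.Element("{urn:mace:shibboleth:2.0:resolver}AttributeResolver")
    print(to_string_with_all_namespaces(resolver, NAMESPACES))
```

Because the helper adds attributes to the root, it is best called once, right before writing the file.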