elementtree

Using ElementTree to parse an XML string with a namespace

我与影子孤独终老i posted on 2019-12-06 03:10:01
I have Googled my pants off to no avail. What I am trying to do is very simple: I'd like to access the UniqueID value in the following XML contained in a string using ElementTree.

    from xml.etree.ElementTree import fromstring

    xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
      <Item>
        <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
      </Item>
    </ListObjectsResponse>"""

    NS = "http://www.example.com/dir/"
    tree = fromstring(xml_string)

I know that I should use the fromstring method to parse the XML string, but I can't seem to identify how to access the UniqueID. I'm not
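A minimal sketch of one way to reach UniqueID, reusing the question's own XML and NS value. Both lookups below use the standard library only; the prefix "d" in the second lookup is an arbitrary label chosen for illustration, not something from the original post.

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
  <Item>
    <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
  </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"
root = fromstring(xml_string)

# Option 1: Clark notation, where every tag is written as "{namespace-uri}localname".
unique_id = root.find("{%s}Item/{%s}UniqueID" % (NS, NS)).text

# Option 2: a prefix-to-URI map; "d" is an illustrative prefix, any name works.
unique_id_via_prefix = root.find("d:Item/d:UniqueID", {"d": NS}).text

print(unique_id)
```

The default namespace declared on the root applies to every descendant, so each step of the path has to be qualified, not just the last one.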

Converting an xml doc into a specific dot-expanded json structure

爱⌒轻易说出口 posted on 2019-12-06 02:19:43
I have the following XML document: <Item ID="288917"> <Main> <Platform>iTunes</Platform> <PlatformID>353736518</PlatformID> </Main> <Genres> <Genre FacebookID="6003161475030">Comedy</Genre> <Genre FacebookID="6003172932634">TV-Show</Genre> </Genres> <Products> <Product Country="CA"> <URL>https://itunes.apple.com/ca/tv-season/id353187108?i=353736518</URL> <Offers> <Offer Type="HDBUY"> <Price>3.49</Price> <Currency>CAD</Currency> </Offer> <Offer Type="SDBUY"> <Price>2.49</Price> <Currency>CAD</Currency> </Offer> </Offers> </Product> <Product Country="FR"> <URL>https://itunes.apple.com/fr/tv

Again: UnicodeEncodeError: ascii codec can't encode

一曲冷凌霜 posted on 2019-12-06 01:38:41
I have a folder of XML files that I would like to parse. I need to get text out of the elements of these files. They will be collected and printed to a CSV file where the elements are listed in columns. I can actually do this right now for some of my files. That is, for many of my XML files, the process goes fine, and I get the output I want. The code that does this is: import os, re, csv, string, operator import xml.etree.cElementTree as ET import codecs def parseEO(doc): #getting the basic structure tree = ET.ElementTree(file=doc) root = tree.getroot() agencycodes = [] rins = [] titles =[]

Python XpathEvaluator without namespace

北城余情 posted on 2019-12-06 00:47:26
I need to write a dynamic function that finds elements on a subtree of an ATOM xml document. To do so, I've written something like this:

    tree = etree.parse(xmlFileUrl)
    e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'})
    entries = e('//def:entry')
    for entry in entries:
        mypath = tree.getpath(entry) + "/category"
        category = e(mypath)

The code above fails to find category. The reason is that getpath returns an XPath without namespaces, whereas the XPathEvaluator e() requires namespaces. Is there a way to either make getpath return namespaces in the path, or allow

Can xml.etree.ElementTree.write() write integer values for a given Element?

人走茶凉 posted on 2019-12-06 00:34:41
Question: At the risk of getting yelled at for asking such a simple question: I have been trawling the internet for answers, this particular case seems to be widely avoided, and the docs are ambiguous. Is it possible to use xml.etree.ElementTree.write() to write non-string values in an element's attribute? I always get: TypeError: cannot serialize 0 (type int) when I try something like this: root = ET.Element('Tasks') d = {'priority': 1, 'status': 0, 'name': 'new task', 'index': 0} d = ET
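A hedged sketch of the usual workaround: ElementTree serializes attribute values as text, so converting each value with str() before building the element avoids the TypeError. The question's code is cut off after "d = ET", so the child element name 'Task' and the in-memory buffer below are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

d = {'priority': 1, 'status': 0, 'name': 'new task', 'index': 0}

root = ET.Element('Tasks')
# Attribute values must be strings, so convert each one explicitly.
# 'Task' is a hypothetical element name; the original snippet is truncated.
task = ET.SubElement(root, 'Task', {key: str(value) for key, value in d.items()})

# Write to an in-memory buffer instead of a file to keep the example self-contained.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer)
xml_bytes = buffer.getvalue()

# The type is not preserved in the XML, so convert back when reading.
priority = int(task.get('priority'))
```

XML attributes carry no type information, so the reading side has to know which attributes to convert back to int.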

I get an error when executing "from lxml import etree" in the Python command line after successfully installing lxml with pip

て烟熏妆下的殇ゞ posted on 2019-12-05 21:39:09
bash-3.2$ pip install lxml-2.3.5.tgz Unpacking ./lxml-2.3.5.tgz Running setup.py egg_info for package from file:///Users/apple/workspace/pythonhome/misc/lxml-2.3.5.tgz Building lxml version 2.3.5. Building with Cython 0.17. Using build configuration of libxslt 1.1.27 Building against libxml2/libxslt in the following directory: /usr/local/lib warning: no previously-included files found matching '*.py' Installing collected packages: lxml Running setup.py install for lxml Building lxml version 2.3.5. Building with Cython 0.17. Using build configuration of libxslt 1.1.27 Building against libxml2

Finding top-level xml comments using Python's ElementTree

巧了我就是萌 posted on 2019-12-05 21:00:56
I'm parsing an xml file using Python's ElementTree, like this: et = ElementTree(file=file("test.xml")) test.xml starts with a few lines of xml comments. Is there a way to get those comments from et? For ElementTree 1.2.X there is an article on Reading processing instructions and comments with ElementTree ( http://effbot.org/zone/element-pi.htm ). EDIT: The alternative would be using lxml.etree which implements the ElementTree API. A quote from ElementTree compatibility of lxml.etree : ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and
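One possible standard-library approach, sketched under the assumption of Python 3.8 or later: pass XMLParser a custom target object that defines a comment() method and tracks element depth, so that only comments outside the root element are collected. The class name TopLevelComments and the sample document are hypothetical, and this target collects comments only; it does not build a tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for test.xml.
xml_text = "<!-- first -->\n<!-- second -->\n<root><!-- inner --><a/></root>"

class TopLevelComments:
    """Parser target that records comments found outside the root element."""

    def __init__(self):
        self.depth = 0
        self.comments = []

    def start(self, tag, attrib):
        self.depth += 1

    def end(self, tag):
        self.depth -= 1

    def comment(self, text):
        # depth == 0 means we are before or after the root element.
        if self.depth == 0:
            self.comments.append(text)

    def close(self):
        return self.comments

parser = ET.XMLParser(target=TopLevelComments())
parser.feed(xml_text)
top_level_comments = parser.close()
```

If the tree itself is also needed, the document can be parsed a second time with ET.fromstring or ET.parse.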

UnicodeEncodeError: 'ascii' codec can't encode characters

此生再无相见时 posted on 2019-12-05 15:11:13
Question: I have a dict that is fed from a URL response, like: >>> d { 0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'} 1: {'data': u'<p>some other data</p>'} ... } When I use an xml.etree.ElementTree function on these data values ( d[0]['data'] ) I get the most famous error message: UnicodeEncodeError: 'ascii' codec can't encode characters... What should I do to this Unicode string to make it suitable for the ElementTree parser? PS. Please don't send me links with Unicode & Python explanation. I
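A minimal sketch of one common fix, assuming the error comes from the Unicode string being implicitly encoded as ASCII before it reaches the parser (typical on Python 2): encode the string to UTF-8 bytes explicitly and parse the bytes. On Python 3, fromstring also accepts str directly, so the explicit encode is harmless there.

```python
import xml.etree.ElementTree as ET

d = {
    0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
    1: {'data': u'<p>some other data</p>'},
}

# Encode explicitly rather than relying on an implicit ASCII conversion.
# Without an XML declaration, the parser treats the bytes as UTF-8.
xml_bytes = d[0]['data'].encode('utf-8')
root = ET.fromstring(xml_bytes)

# The parsed text comes back as a Unicode string.
text = root.text
```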

Python 2.6.1 : expected path separator ([)

折月煮酒 posted on 2019-12-05 13:54:32
Question: I am getting a path separator error in Python 2.6.1. I have not hit this issue with Python 2.7.2, but unfortunately I need this to work in 2.6.1. Is there another way to achieve the same thing? :( My code: import xml.etree.ElementTree as ET #version 1.2.6 import sys class usersDetail(object): def __init__(self, users=None): self.doc = ET.parse("test.xml") self.root = self.doc.getroot() def final_xml(self,username): r = self.root.find("user[@username='user1']") #not working in 2.6.1 :(
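A hedged workaround sketch: ElementTree 1.2.6, which ships with Python 2.6, does not understand attribute predicates such as [@username='user1'], so the filter can be applied in plain Python instead. The contents of test.xml are not shown in the question; the sample document and the helper name find_user below are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for test.xml, whose real contents are not shown.
xml_text = """<users>
  <user username="user1"><email>one@example.com</email></user>
  <user username="user2"><email>two@example.com</email></user>
</users>"""

def find_user(root, username):
    """Return the first direct <user> child with a matching username, or None."""
    # A plain tag path works in old and new ElementTree versions alike.
    for user in root.findall("user"):
        if user.get("username") == username:
            return user
    return None

root = ET.fromstring(xml_text)
match = find_user(root, "user1")
```

The loop uses only findall with a plain tag name and Element.get, both of which predate predicate support.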

Iterating multiple (parent,child) nodes using Python ElementTree

孤者浪人 posted on 2019-12-05 12:22:05
The standard implementation of ElementTree for Python (2.6) does not provide pointers to parents from child nodes. Therefore, if parents are needed, it is suggested to loop over parents rather than children. Consider my xml is of the form: <Content> <Para>first</Para> <Table><Para>second</Para></Table> <Para>third</Para> </Content> The following finds all "Para" nodes without considering parents: (1) paras = [p for p in page.getiterator("Para")] This (adapted from effbot) stores the parent by looping over them instead of the child nodes: (2) paras = [(c,p) for p in page.getiterator() for c in
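A sketch of the loop-over-parents idea on the question's sample document, written for current Python, where iter() replaces getiterator() (removed in Python 3.9). The filter on the "Para" tag and the parent_of dictionary are illustrative additions, since the original comprehension is cut off.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    "<Content><Para>first</Para>"
    "<Table><Para>second</Para></Table>"
    "<Para>third</Para></Content>"
)

# Walk every element as a potential parent and keep its "Para" children.
pairs = [
    (child, parent)
    for parent in page.iter()
    for child in parent
    if child.tag == "Para"
]

# Alternative: a child-to-parent lookup table for arbitrary elements.
parent_of = {child: parent for parent in page.iter() for child in parent}

for child, parent in pairs:
    print(child.text, parent.tag)
```

The pairs come out grouped by parent rather than in document order, so "third" (a child of Content) is listed before "second" (a child of Table).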