elementtree

XML header getting removed after processing with elementtree

↘锁芯ラ posted on 2019-12-22 07:13:04
Question: I have an XML file, and I used ElementTree to add a new tag to it. Before processing, the file looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <PackageInfo xmlns="http://someurlpackage">
    <data ID="http://someurldata1">data1</data >
    <data ID="http://someurldata2">data2</data >
    <data ID="http://someurldata3">data3</data >
    </PackageInfo>

I used the following Python code to add a new data tag and write it to my XML file:

    tree = ET.ElementTree(xmlFile)
    root = tree.getroot()
    elem= ET
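One commonly suggested way to keep the declaration, sketched here with the standard-library xml.etree.ElementTree on Python 3: pass an explicit encoding and xml_declaration=True to ElementTree.write(), and register the default namespace so the tags are not rewritten with ns0: prefixes. The in-memory document, the added data4 element, and its ID value are illustrative assumptions, not part of the original question.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, in-memory stand-in for the file in the question (assumption).
SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<PackageInfo xmlns="http://someurlpackage">
<data ID="http://someurldata1">data1</data>
</PackageInfo>"""

NS = "http://someurlpackage"

# Map the namespace to the empty prefix so it is written as xmlns="...".
ET.register_namespace("", NS)

tree = ET.parse(io.BytesIO(SOURCE))
root = tree.getroot()

# Hypothetical new element; the tag must carry the namespace explicitly.
elem = ET.SubElement(root, "{%s}data" % NS, {"ID": "http://someurldata4"})
elem.text = "data4"

# xml_declaration=True asks write() to emit the <?xml ...?> header.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```

With a real file, the same write() call would take a path instead of the BytesIO buffer.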

Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

一世执手 posted on 2019-12-21 07:58:14
Question: I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

    import sys
    from cStringIO import StringIO
    from xml.etree import ElementTree as ET
    from htmlentitydefs import entitydefs

    source = StringIO("""<html>
    <body>
    <p>Less than &lt;</p>
    <p>Non-breaking space &nbsp;</p>
    </body>
    </html>""")

    parser = ET.XMLParser()
    parser.parser.UseForeignDTD(True)
    parser.entity.update(entitydefs)
    etree = ET.ElementTree()

    tree = etree.parse(source, parser=parser)

    for p in tree.findall('.//p'):
        print ET
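A minimal Python 3 sketch of the output side of this problem, going by the title: with a UTF-8 target, ElementTree writes U+00A0 as raw UTF-8 bytes, because the encoding can represent it; with an ASCII target it falls back to a numeric character reference. The element is built by hand here (an assumption) so the example does not depend on the entity-parsing setup from the question.

```python
import xml.etree.ElementTree as ET

# Build the paragraph directly; \u00a0 is the non-breaking space.
p = ET.Element("p")
p.text = "Non-breaking space \u00a0"

# UTF-8 can encode U+00A0, so it is emitted as the two bytes C2 A0.
as_utf8 = ET.tostring(p, encoding="utf-8")

# ASCII cannot, so the serializer writes a character reference instead.
as_ascii = ET.tostring(p, encoding="us-ascii")

print(as_utf8)
print(as_ascii)
```

If the character reference form is what is wanted, serializing with an ASCII encoding is one option under these assumptions.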

Python: Ignore xmlns in elementtree.ElementTree

早过忘川 posted on 2019-12-21 07:04:48
Question: Is there a way to ignore the XML namespace in tag names in elementtree.ElementTree? I am trying to print all technicalContact tags:

    for item in root.getiterator(tag='{http://www.example.com}technicalContact'):
        print item.tag, item.text

And I get something like:

    {http://www.example.com}technicalContact blah@example.com

But what I really want is:

    technicalContact blah@example.com

Is there a way to display only the suffix (sans xmlns), or better - iterate over the elements without explicitly stating
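One possible approach, sketched for Python 3 with the standard library: ElementTree stores namespaced tags as "{uri}local", so the local part can be recovered by splitting on the closing brace, and elements can be filtered on that local name. The helper name local_name and the small sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]

# Hypothetical document using the namespace from the question.
root = ET.fromstring(
    '<root xmlns="http://www.example.com">'
    "<technicalContact>blah@example.com</technicalContact>"
    "<other>ignored</other>"
    "</root>"
)

# Walk every element and match on the local name only.
found = [
    (local_name(item.tag), item.text)
    for item in root.iter()
    if local_name(item.tag) == "technicalContact"
]

for name, text in found:
    print(name, text)
```

This avoids spelling out the namespace URI, at the cost of visiting every element in the tree.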

Getting non-contiguous text with lxml / ElementTree

邮差的信 posted on 2019-12-21 05:36:22
Question: Suppose I have this sort of HTML from which I need to select "text2" using lxml / ElementTree:

    <div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>

If I already have the div element as mydiv, then mydiv.text returns just "text1". Using itertext() seems problematic or cumbersome at best since it walks the entire tree under the div. Is there any simple/elegant way to extract a non-first text chunk from an element?

Answer 1: Well, lxml.etree provides full XPath support, which
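As a sketch of a non-XPath alternative: in the ElementTree data model, text that follows a child element is stored on that child's .tail attribute, so "text2" is the tail of the first span. The example uses the standard-library xml.etree.ElementTree so it runs without lxml; lxml.etree elements expose the same .text and .tail attributes.

```python
import xml.etree.ElementTree as ET

mydiv = ET.fromstring(
    "<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>"
)

# Text directly after the first child lives on that child's tail.
second_chunk = mydiv[0].tail

# All direct text chunks of the div, without descending into the spans.
direct_chunks = [mydiv.text] + [child.tail for child in mydiv]

print(second_chunk)
print(direct_chunks)
```

Unlike itertext(), this only looks at the div's own text and the tails of its direct children.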

how to remove attribute of a etree Element?

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-20 10:17:38
Question: I have an etree Element with some attributes. How can I delete an attribute of a particular etree Element?

Answer 1: The .attrib member of the element object contains the dict of attributes. You can use .pop("key") or del, as you would on any other dict, to remove a key-value pair.

Answer 2: Example:

    >>> from lxml import etree
    >>> from lxml.builder import E
    >>> otree = E.div()
    >>> otree.set("id","123")
    >>> otree.set("data","321")
    >>> etree.tostring(otree)
    '<div id="123" data="321"/>'
    >>> del otree
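The two removal styles mentioned in Answer 1 can be sketched as follows. This uses the standard-library xml.etree.ElementTree rather than lxml so that it runs without extra packages (an assumption of this example); the .attrib mapping supports del and .pop() in both.

```python
import xml.etree.ElementTree as ET

div = ET.Element("div", {"id": "123", "data": "321"})

# del raises KeyError if the attribute is missing.
del div.attrib["id"]

# pop() with a default is safe even when the attribute is absent.
removed = div.attrib.pop("data", None)
missing = div.attrib.pop("no-such-attribute", None)

print(div.attrib, removed, missing)
```

pop() with a default is the more forgiving choice when it is not known whether the attribute exists.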

Find parent element of a 'title' xml tag containing specific text using Python ElementTree

為{幸葍}努か posted on 2019-12-20 06:22:52
Question: I wish to parse an XML file and extract the parent <sec> that contains a <title> matching a specific text, using Python 3.7 and ElementTree:

    ...
    <sec id="s0010">
      <label>2</label>
      <title>Materials and methods</title>
    </sec>
    <sec id="s0015">
      <label>3</label>
      <title>Summary</title>
    </sec>
    ...

I was able to locate the title using ET:

    for title in parent.iter('title'):
        text = title.text
        if(text):
            if("methods" in text.lower()):
                print("**title: "+text+"****")

But how do I get the parent object ( <sec>
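One way this could be approached, as a sketch: standard-library ElementTree elements do not keep a reference to their parent, so a child-to-parent map can be built once and then used to step from each matching <title> to its enclosing <sec>. The <article> wrapper element and the names parent_of and matching_secs are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a hypothetical <article> root.
DOC = """<article>
<sec id="s0010"><label>2</label><title>Materials and methods</title></sec>
<sec id="s0015"><label>3</label><title>Summary</title></sec>
</article>"""

root = ET.fromstring(DOC)

# Map every child element to its parent in one pass over the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Look up the parent of each title whose text mentions "methods".
matching_secs = [
    parent_of[title]
    for title in root.iter("title")
    if title.text and "methods" in title.text.lower()
]

matching_ids = [sec.get("id") for sec in matching_secs]
print(matching_ids)
```

The map costs one extra traversal but then gives constant-time parent lookups for any element.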

Merging Lots of XML files

青春壹個敷衍的年華 posted on 2019-12-20 05:25:27
Question: I have lots of XML files that I need to merge. I tried the code from the question "merging xml files using python's ElementTree", edited to fit my needs:

    import os, os.path, sys
    import glob
    from xml.etree import ElementTree

    def run(files):
        xml_files = glob.glob(files +"/*.xml")
        xml_element_tree = None
        for xml_file in xml_files:
            print xml_file
            data = ElementTree.parse(xml_file).getroot()
            # print ElementTree.tostring(data)
            for result in data.iter('TALLYMESSAGE'):
                if xml_element_tree is None:
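Since the code in the excerpt is cut off, the following is only a rough Python 3 sketch of the general idea, not the asker's script: collect every element with a given tag from several parsed roots and append them under one new wrapper element. The function name merge_by_tag, the MERGED wrapper tag, the ENVELOPE root, and the in-memory sample documents are all assumptions.

```python
import xml.etree.ElementTree as ET

def merge_by_tag(roots, tag, wrapper_tag="MERGED"):
    """Gather all elements named `tag` from each root under one new element."""
    merged = ET.Element(wrapper_tag)
    for root in roots:
        # list() snapshots the matches before they are appended elsewhere.
        merged.extend(list(root.iter(tag)))
    return merged

# Hypothetical stand-ins for the contents of the XML files.
documents = [
    "<ENVELOPE><TALLYMESSAGE>a</TALLYMESSAGE></ENVELOPE>",
    "<ENVELOPE><TALLYMESSAGE>b</TALLYMESSAGE>"
    "<TALLYMESSAGE>c</TALLYMESSAGE></ENVELOPE>",
]

merged = merge_by_tag([ET.fromstring(d) for d in documents], "TALLYMESSAGE")
merged_texts = [element.text for element in merged]
print(ET.tostring(merged, encoding="unicode"))
```

For files on disk, each root would come from ET.parse(path).getroot(), with the paths gathered by glob.glob as in the question.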

lxml.etree insert elements into element.text

。_饼干妹妹 posted on 2019-12-20 03:38:04
Question: I have strings that have empty XML elements in them, like this:

    >>> s = """fizz buzz <pb n="44"/> bananas"""

These strings have been assigned to XML elements using the etree.SubElement method:

    >>> from lxml import etree as et
    >>> root = et.Element('root')
    >>> txt = et.SubElement(root, 'text')
    >>> txt.text = s
    >>> et.dump(root)
    <root>
      <text>fizz buzz &lt;pb n="44"/&gt; bananas</text>
    </root>

Fiddling about with re.split() and etree's text and tail I can insert a subelement <pb n="44"/> where I want
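A small sketch of the text/tail technique the question describes, limited to a single <pb n="..."/> marker: split the string around the marker, put the leading part in the parent's .text, create the <pb> child, and put the remainder in the child's .tail. It uses the standard-library xml.etree.ElementTree so it runs without lxml (an assumption); lxml.etree follows the same text/tail model. The regular expression is written only for this exact marker shape.

```python
import re
import xml.etree.ElementTree as ET

s = 'fizz buzz <pb n="44"/> bananas'

# One capture group and one match give [before, n_value, after].
before, n_value, after = re.split(r'<pb n="(\d+)"/>', s, maxsplit=1)

root = ET.Element("root")
txt = ET.SubElement(root, "text")

txt.text = before                          # text before the marker
pb = ET.SubElement(txt, "pb", {"n": n_value})
pb.tail = after                            # text after the marker

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Strings with several markers would need a loop that chains each new child's tail, which this sketch does not attempt.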

SyntaxError using gdata-python-client to access Google Book Search Data API

岁酱吖の posted on 2019-12-20 02:57:11
Question:

    >>> import gdata.books.service
    >>> service = gdata.books.service.BookService()
    >>> results = service.search_by_keyword(isbn='0434003484')
    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        results = service.search_by_keyword(isbn='0434003484')
    ... snip ...
      File "C:\Python26\lib\site-packages\atom\__init__.py", line 127, in CreateClassFromXMLString
        tree = ElementTree.fromstring(xml_string)
      File "<string>", line 85, in XML
    SyntaxError: syntax error: line 1, column 0

This
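The excerpt is cut off before any diagnosis, so this is only a guess at one way to investigate: an error reported at line 1, column 0 from ElementTree.fromstring() is what the parser raises when the string it receives is not XML at all, for example a plain-text error body. The Python 3 sketch below wraps the parse so the offending text can be inspected. The function name parse_response and both sample bodies are hypothetical, and gdata itself is not used.

```python
import xml.etree.ElementTree as ET

def parse_response(body):
    """Return the parsed root, or None when body is not well-formed XML."""
    try:
        return ET.fromstring(body)
    except ET.ParseError as exc:
        # ParseError subclasses SyntaxError, the type shown in the traceback.
        print("response is not XML:", exc, "| starts with:", repr(body[:40]))
        return None

good = parse_response("<feed><entry>ok</entry></feed>")
bad = parse_response("Invalid request URI")  # hypothetical non-XML body
```

Printing the start of the rejected body is usually enough to tell an XML feed from an error page or an empty response.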