elementtree | 易学教程

XML header getting removed after processing with elementtree

Question: I have an XML file, and I used ElementTree to add a new tag to it. Before processing, the file looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <PackageInfo xmlns="http://someurlpackage">
        <data ID="http://someurldata1">data1</data>
        <data ID="http://someurldata2">data2</data>
        <data ID="http://someurldata3">data3</data>
    </PackageInfo>

I used the following Python code to add a new data tag and write it to my XML file:

    tree = ET.ElementTree(xmlFile)
    root = tree.getroot()
    elem= ET
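The excerpt stops before the write step, so the following is only a minimal sketch of one way to keep the XML header with the standard-library `xml.etree.ElementTree`: pass `xml_declaration=True` when writing. The `data4` element and its `ID` value are hypothetical, and the input is shortened to a single `data` element.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the file from the question (bytes, so the encoding
# declaration is handled by the parser).
SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<PackageInfo xmlns="http://someurlpackage">
<data ID="http://someurldata1">data1</data>
</PackageInfo>"""

NS = "http://someurlpackage"

# Keep the default namespace unprefixed on output (otherwise ns0: is used).
# Note that register_namespace changes module-wide state.
ET.register_namespace("", NS)

root = ET.fromstring(SOURCE)

# Hypothetical new element, created in the same namespace as its siblings.
new_elem = ET.SubElement(root, "{%s}data" % NS, {"ID": "http://someurldata4"})
new_elem.text = "data4"

# xml_declaration=True asks the writer to emit the <?xml ...?> header.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue()
print(output.decode("utf-8"))
```

To write to disk instead of a buffer, pass a file name as the first argument to `write`.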

Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

Question: I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

    import sys
    from cStringIO import StringIO
    from xml.etree import ElementTree as ET
    from htmlentitydefs import entitydefs

    source = StringIO("""<html>
    <body>
    <p>Less than &lt;</p>
    <p>Non-breaking space &nbsp;</p>
    </body>
    </html>""")

    parser = ET.XMLParser()
    parser.parser.UseForeignDTD(True)
    parser.entity.update(entitydefs)
    etree = ET.ElementTree()
    tree = etree.parse(source, parser=parser)
    for p in tree.findall('.//p'):
        print ET
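The excerpt does not include an answer, so this is only a sketch of the behaviour the title describes, written for Python 3 rather than the Python 2 of the question. With UTF-8 output the non-breaking space is written as raw bytes; with an encoding that cannot represent it, such as `us-ascii`, ElementTree falls back to a numeric character reference.

```python
import xml.etree.ElementTree as ET

# Build the paragraph directly instead of parsing it, to keep the sketch short.
p = ET.Element("p")
p.text = "Non-breaking space \u00a0"

# UTF-8 can encode U+00A0, so it is emitted as the two bytes C2 A0.
utf8_bytes = ET.tostring(p, encoding="utf-8")

# ASCII cannot encode U+00A0, so the serializer writes &#160; instead.
ascii_bytes = ET.tostring(p, encoding="us-ascii")

print(utf8_bytes)
print(ascii_bytes)
```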

Python: Ignore xmlns in elementtree.ElementTree

Question: Is there a way to ignore the XML namespace in tag names in elementtree.ElementTree? I try to print all technicalContact tags:

    for item in root.getiterator(tag='{http://www.example.com}technicalContact'):
        print item.tag, item.text

And I get something like:

    {http://www.example.com}technicalContact blah@example.com

But what I really want is:

    technicalContact blah@example.com

Is there a way to display only the suffix (sans xmlns), or better - iterate over the elements without explicitly stating
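One possible approach, sketched here for Python 3 with the standard library, is to strip the `{namespace}` prefix from each tag and compare on the local name only. The helper `local_name`, the sample document, and the `adminContact` element are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document in the namespace used by the question.
XML = """<root xmlns="http://www.example.com">
  <technicalContact>blah@example.com</technicalContact>
  <adminContact>admin@example.com</adminContact>
</root>"""


def local_name(tag):
    """Return the tag without its leading '{namespace}' part, if any."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)

# Walk every element and match on the local name, so the namespace URI
# never has to be spelled out.
contacts = [
    (local_name(elem.tag), elem.text)
    for elem in root.iter()
    if local_name(elem.tag) == "technicalContact"
]

for name, text in contacts:
    print(name, text)
```

This visits every element in the tree, which is fine for small documents but does more work than a namespace-qualified search.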

Getting non-contiguous text with lxml / ElementTree

Question: Suppose I have this sort of HTML, from which I need to select "text2" using lxml / ElementTree:

    <div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>

If I already have the div element as mydiv, then mydiv.text returns just "text1". Using itertext() seems problematic or cumbersome at best, since it walks the entire tree under the div. Is there any simple/elegant way to extract a non-first text chunk from an element?

Answer 1: Well, lxml.etree provides full XPath support, which
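The answer is cut off, so the sketch below does not reproduce its XPath approach. It instead uses the `.tail` attribute, which both lxml and the standard-library ElementTree provide: text that follows a child element is stored as that child's tail, not on the parent.

```python
import xml.etree.ElementTree as ET

mydiv = ET.fromstring(
    "<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>"
)

# Text before the first child lives on the parent itself.
leading = mydiv.text

# Text after a child lives on that child as .tail.
text2 = mydiv[0].tail

# All non-first chunks, in document order, without descending into children.
tails = [child.tail for child in mydiv]

print(leading, text2, tails)
```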

how to remove attribute of a etree Element?

Question: I have an etree Element with some attributes. How can I delete an attribute of a particular etree Element?

Answer 1: The .attrib member of the element object contains the dict of attributes. You can use .pop("key") or del, as you would on any other dict, to remove a key-value pair.

Answer 2: Example:

    >>> from lxml import etree
    >>> from lxml.builder import E
    >>> otree = E.div()
    >>> otree.set("id","123")
    >>> otree.set("data","321")
    >>> etree.tostring(otree)
    '<div id="123" data="321"/>'
    >>> del otree
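Since the second answer's example is truncated, here is a minimal sketch of both removal styles from the first answer. It uses the standard-library ElementTree rather than lxml, on the assumption that `.attrib` behaves like a dict in the same way there.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("div", {"id": "123", "data": "321"})

# pop() removes the attribute and returns its value.
removed = elem.attrib.pop("id")

# del removes it without returning anything; it raises KeyError if absent.
del elem.attrib["data"]

# pop() with a default is a safe way to remove an attribute that may not exist.
missing = elem.attrib.pop("no-such-attribute", None)

print(removed, missing, dict(elem.attrib))
```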

Find parent element of a 'title' xml tag containing specific text using Python ElementTree

Question: I wish to parse an XML file and extract the parent <sec> that contains a <title> matching specific text, using Python 3.7 and ElementTree:

    ...
    <sec id="s0010">
        <label>2</label>
        <title>Materials and methods</title>
    </sec>
    <sec id="s0015">
        <label>3</label>
        <title>Summary</title>
    </sec>
    ...

I was able to locate the title using ET:

    for title in parent.iter('title'):
        text = title.text
        if(text):
            if("methods" in text.lower()):
                print("**title: "+text+"****")

But how do I get the parent object ( <sec>
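Standard-library ElementTree elements do not keep a reference to their parent, so one option is to iterate over the `<sec>` elements and inspect each one's `<title>` child. The sketch below does that and also shows a child-to-parent map as an alternative. The `<body>` wrapper and the function name `find_sections` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The two sections from the question, wrapped in a hypothetical <body> root.
XML = """<body>
<sec id="s0010"><label>2</label><title>Materials and methods</title></sec>
<sec id="s0015"><label>3</label><title>Summary</title></sec>
</body>"""

root = ET.fromstring(XML)


def find_sections(root, needle):
    """Return every <sec> whose direct <title> child contains needle."""
    matches = []
    for sec in root.iter("sec"):
        title = sec.find("title")
        if title is not None and title.text and needle in title.text.lower():
            matches.append(sec)
    return matches


matches = find_sections(root, "methods")
print([sec.get("id") for sec in matches])

# Alternative: build a child -> parent map once, then look parents up.
parent_of = {child: parent for parent in root.iter() for child in parent}
first_title = root.find(".//title")
title_parent = parent_of[first_title]
```

The first approach is simpler when the parent tag is known in advance; the map is more general but costs one full pass over the tree.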

Merging Lots of XML files

Question: I have lots of XML files that I need to merge. I tried the code from "merging xml files using python's ElementTree" (edited to fit my needs):

    import os, os.path, sys
    import glob
    from xml.etree import ElementTree

    def run(files):
        xml_files = glob.glob(files +"/*.xml")
        xml_element_tree = None
        for xml_file in xml_files:
            print xml_file
            data = ElementTree.parse(xml_file).getroot()
            # print ElementTree.tostring(data)
            for result in data.iter('TALLYMESSAGE'):
                if xml_element_tree is None:
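The question's code is cut off before the merge logic, so the following is not the asker's method. It is a sketch, in Python 3, of one simple strategy: collect every `TALLYMESSAGE` element from each file under a single new root. The function name `merge_xml_files` and the `MERGED` root tag are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET


def merge_xml_files(directory, tag="TALLYMESSAGE", root_tag="MERGED"):
    """Collect all <tag> elements from *.xml files in directory under one root."""
    merged_root = ET.Element(root_tag)  # hypothetical wrapper element
    # sorted() makes the output order independent of the file system.
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        data = ET.parse(path).getroot()
        for element in data.iter(tag):
            merged_root.append(element)
    return ET.ElementTree(merged_root)
```

A call such as `merge_xml_files("some_dir").write("merged.xml", encoding="utf-8", xml_declaration=True)` would then write the result; both names in that call are placeholders. If `TALLYMESSAGE` elements can be nested inside one another, `iter` would yield the inner ones as well, so they would appear twice in the output.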

lxml.etree insert elements into element.text

Question: I have strings that have empty XML elements in them, like this:

    >>> s = """fizz buzz <pb n="44"/> bananas"""

These strings have been assigned to XML elements using the etree.SubElement method:

    >>> from lxml import etree as et
    >>> root = et.Element('root')
    >>> txt = et.SubElement(root, 'text')
    >>> txt.text = s
    >>> et.dump(root)
    <root>
      <text>fizz buzz &lt;pb n="44"/&gt; bananas</text>
    </root>

Fiddling about with re.split() and etree's text and tail I can insert a subelement <pb n="44"/> where I want
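As an alternative to the `re.split()` approach the question mentions, the string can be wrapped in the target tag and parsed, which lets the parser assign `text` and `tail` itself. This sketch uses the standard-library ElementTree rather than lxml and assumes the string is a well-formed XML fragment; it would raise `ParseError` otherwise.

```python
import xml.etree.ElementTree as ET

s = """fizz buzz <pb n="44"/> bananas"""

root = ET.Element("root")

# Parse the fragment inside a <text> wrapper so <pb/> becomes a real child
# element instead of escaped character data.
txt = ET.fromstring("<text>" + s + "</text>")
root.append(txt)

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

After parsing, `txt.text` holds the text before `<pb/>` and the `pb` element's `tail` holds the text after it.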

SyntaxError using gdata-python-client to access Google Book Search Data API

Question:

    >>> import gdata.books.service
    >>> service = gdata.books.service.BookService()
    >>> results = service.search_by_keyword(isbn='0434003484')
    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        results = service.search_by_keyword(isbn='0434003484')
      ... snip ...
      File "C:\Python26\lib\site-packages\atom\__init__.py", line 127, in CreateClassFromXMLString
        tree = ElementTree.fromstring(xml_string)
      File "<string>", line 85, in XML
    SyntaxError: syntax error: line 1, column 0

This