lxml

Python Lxml (objectify): Checking whether a tag exists

人盡茶涼 submitted on 2019-12-03 07:54:19
Question: I need to check whether a certain tag exists in an XML file. For example, I want to see if the tag exists in this snippet: <main> <elem1/> <elem2>Hi</elem2> <elem3/> ... </main> Currently, I am using an ugly hack with error checking, like this: try: if root.elem1.tag: foo = elem1 except AttributeError: foo = "error finding elem1" I also want to customize the string if it is unable to find the node (i.e. "unable to find -tagname-"). I have to check a long list of variables, and I don't want to
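
A minimal sketch of one way to avoid the try/except, assuming the document has been parsed with lxml.objectify. The helper name find_or_message and the tag list are hypothetical; find() is the standard element method, which returns None for a missing child instead of raising AttributeError.

```python
from lxml import objectify

XML = b"<main><elem1/><elem2>Hi</elem2><elem3/></main>"


def find_or_message(root, tag):
    """Return the child element named `tag`, or a message string if absent."""
    child = root.find(tag)  # None when no such child exists
    if child is None:
        return "unable to find %s" % tag
    return child


root = objectify.fromstring(XML)

# Check a whole list of tag names in one pass (the names are illustrative).
results = {tag: find_or_message(root, tag) for tag in ("elem1", "elem2", "missing")}
```

Because the lookup takes the tag name as a string, a long list of names can be checked in a loop, and the message can include the name that was not found.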

Problem using py2app with the lxml package

丶灬走出姿态 submitted on 2019-12-03 07:51:48
I am trying to use 'py2app' to generate a standalone application from some Python scripts. The Python code uses the 'lxml' package, and I've found that I have to specify this explicitly in the setup.py file that 'py2app' uses. However, the resulting application still won't run on machines that haven't had 'lxml' installed. My setup.py looks like this: from setuptools import setup OPTIONS = {'argv_emulation': True, 'packages' : ['lxml']} setup(app=['MyApp.py'], data_files=[], options={'py2app' : OPTIONS}, setup_requires=['py2app']) Running the application produces the following output: MyApp

lxml.etree, element.text doesn't return the entire text from an element

放肆的年华 submitted on 2019-12-03 05:25:08
I scraped some HTML via XPath and then converted it into an etree. Something similar to this: <td> text1 <a> link </a> text2 </td> but when I call element.text, I only get text1 (It must be there, when I check my query in FireBug, the text of the elements is highlighted, both the text before and after the embedded anchor elements... Use element.xpath("string()") or lxml.etree.tostring(element, method="text") - see the documentation. As a public service to people out there who may be as lazy as I am, here's some code from above that you can run. from lxml import etree def get_text1(node):
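
A short sketch comparing the options mentioned above on the sample markup. element.text only holds the text before the first child element; the text after the <a> lives in that child's tail. The variable names are illustrative.

```python
from lxml import etree

td = etree.fromstring("<td> text1 <a> link </a> text2 </td>")

# .text stops at the first child element.
only_first = td.text

# Three ways to collect all text in document order.
via_xpath = td.xpath("string()")
via_tostring = etree.tostring(td, method="text", encoding="unicode")
via_itertext = "".join(td.itertext())
```

All three full-text variants keep the original whitespace, so a final " ".join(value.split()) may be useful if normalized spacing is wanted.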

lxml not adding newlines when inserting a new element into existing xml

你说的曾经没有我的故事 submitted on 2019-12-03 05:24:49
I have a large set of existing XML files, and I am trying to add one element to all of them (they are pom.xml files for a number of Maven projects, and I am trying to add a parent element to each). The following is my exact code. The problem is that the final XML output in pom2.xml has the complete parent element on a single line, although when I print the element by itself, it is written out on 4 lines as usual. How do I print out the complete XML with proper formatting for the parent element? from lxml import etree parentPom = etree.Element('parent') groupId = etree.Element('groupId') groupId
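
A commonly used approach, sketched here under simplifying assumptions: parse with remove_blank_text=True so that the existing indentation whitespace is dropped, then serialize with pretty_print=True so the whole tree, including the new element, is re-indented. The sample document and the coordinate values are hypothetical, and a real pom.xml also carries a default namespace that this sketch omits.

```python
from lxml import etree

POM = b"""<project>
  <modelVersion>4.0.0</modelVersion>
  <artifactId>demo</artifactId>
</project>"""

# Dropping ignorable whitespace lets pretty_print re-indent everything.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(POM, parser)

# Build the new element (placeholder values, not from the original post).
parent = etree.Element("parent")
etree.SubElement(parent, "groupId").text = "com.example"
etree.SubElement(parent, "artifactId").text = "parent-pom"
etree.SubElement(parent, "version").text = "1.0"
root.insert(0, parent)

pretty = etree.tostring(root, pretty_print=True, encoding="unicode")
```

Without remove_blank_text, the whitespace text nodes already present in the file tend to stop the serializer from adding indentation around the inserted element, which matches the single-line output described above.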

Installing easy_install… to get to installing lxml

拈花ヽ惹草 submitted on 2019-12-03 04:54:17
Question: I've come to grips with the fact that ElementTree isn't going to do what I want it to do. I've checked out the documentation for lxml, and it appears that it will serve my purposes. To get lxml, I need to get easy_install. So I downloaded it from here, and put it in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/ . Then I went to that folder, and ran sh setuptools-0.6c11-py2.6.egg . That installed successfully. Then I got excited because I thought the whole

Flask example with POST

只愿长相守 submitted on 2019-12-03 04:04:57
Question: Consider the following route, which accesses an XML file to replace the text of a specific tag selected by a given XPath (?key=): @app.route('/resource', methods = ['POST']) def update_text(): # CODE Then, I would use cURL like this: curl -X POST http://ip:5000/resource?key=listOfUsers/user1 -d "John" The XPath expression listOfUsers/user1 should access the tag <user1> to change its current text to "John". I have no idea how to achieve this because I'm just starting to learn Flask and REST and I
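
A rough sketch of how such a route could be written, not a definitive implementation. It assumes Flask and lxml are installed, uses a hypothetical in-memory document (TREE) in place of the XML file on disk, and treats the raw POST body as the new text. The key is evaluated as an absolute XPath, which is only reasonable when callers are trusted.

```python
from flask import Flask, request
from lxml import etree

app = Flask(__name__)

# Hypothetical in-memory document standing in for the XML file.
TREE = etree.ElementTree(
    etree.fromstring("<listOfUsers><user1>Ann</user1><user2>Bob</user2></listOfUsers>")
)


@app.route("/resource", methods=["POST"])
def update_text():
    key = request.args.get("key", "")          # e.g. listOfUsers/user1
    new_text = request.get_data(as_text=True)  # raw body, e.g. John
    try:
        matches = TREE.xpath("/" + key)
    except etree.XPathError:
        return "invalid xpath\n", 400
    target = matches[0] if isinstance(matches, list) and matches else None
    if target is None or not etree.iselement(target):
        return "no element matches %s\n" % key, 404
    target.text = new_text
    return etree.tostring(TREE, encoding="unicode"), 200
```

To persist the change, the handler would also need to write the tree back to the file, for example with TREE.write(path); that step is left out because the file location is not given in the question.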

Is there a switch to ignore undefined namespace prefixes in LXML?

与世无争的帅哥 submitted on 2019-12-03 03:54:03
I'm parsing a non-compliant XML file (Sphinx's xmlpipe2 format) and would like the lxml parser to ignore the fact that there are unresolved namespace prefixes. An example of the Sphinx XML: <sphinx:schema> <sphinx:field name="subject"/> <sphinx:field name="content"/> <sphinx:attr name="published" type="timestamp"/> <sphinx:attr name="author_id" type="int" bits="16" default="1"/> </sphinx:schema> I'm aware of passing a parser keyword option to try and recover broken XML, e.g. parser = etree.XMLParser(recover=True) tree = etree.parse('sphinxTest.xml', parser) but the above does not ignore the
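
One possible workaround, sketched here, does not rely on any parser switch: wrap the fragment in an element that declares the missing prefix, then parse as usual. The namespace URI and the helper name are hypothetical, and the sketch assumes the input has no XML declaration of its own (wrapping would break it).

```python
from lxml import etree

SPHINX_XML = """<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>"""

# Placeholder URI chosen for illustration only.
SPHINX_NS = "urn:example:sphinx"


def parse_with_declared_prefix(text, prefix, uri):
    """Parse `text` inside a wrapper element that declares `prefix`."""
    wrapped = '<wrapper xmlns:%s="%s">%s</wrapper>' % (prefix, uri, text)
    return etree.fromstring(wrapped)[0]  # first child is the original root


schema = parse_with_declared_prefix(SPHINX_XML, "sphinx", SPHINX_NS)
field_names = [f.get("name") for f in schema.findall("{%s}field" % SPHINX_NS)]
```

After this step the elements are ordinary namespaced elements, so lookups use the {uri}name form or a prefix map rather than the bare sphinx: prefix.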

python, lxml and xpath - html table parsing

左心房为你撑大大i submitted on 2019-12-03 03:40:13
I am new to lxml, quite new to Python, and could not find a solution to the following: I need to import a few tables with 3 columns and an undefined number of rows starting at row 3. When the second column of any row is empty, this row is discarded and the processing of the table is aborted. The following code prints the table's data fine (but I'm unable to reuse the data afterwards): from lxml.html import parse def process_row(row): for cell in row.xpath('./td'): print cell.text_content() yield cell.text_content() def process_table(table): return [process_row(row) for row in table.xpath('./tr
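
A likely reason the data cannot be reused is that process_row is a generator, so process_table returns a list of generators that are exhausted after one pass. The sketch below returns plain lists instead and applies the two rules described above (start at row 3, stop at the first row whose second column is empty). The sample page and the first_row parameter are assumptions for illustration.

```python
from lxml import html

PAGE = """<html><body><table>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr><td>unit</td><td>unit</td><td>unit</td></tr>
<tr><td>1</td><td>x</td><td>10</td></tr>
<tr><td>2</td><td>y</td><td>20</td></tr>
<tr><td>3</td><td></td><td>30</td></tr>
<tr><td>4</td><td>z</td><td>40</td></tr>
</table></body></html>"""


def process_row(row):
    """Return the stripped text of every <td> in the row as a list."""
    return [cell.text_content().strip() for cell in row.xpath("./td")]


def process_table(table, first_row=3):
    """Collect rows from `first_row` on; stop at an empty second column."""
    rows = []
    for row in table.xpath(".//tr")[first_row - 1:]:
        cells = process_row(row)
        if len(cells) < 2 or not cells[1]:
            break  # discard this row and abort the table
        rows.append(cells)
    return rows


doc = html.fromstring(PAGE)
tables = [process_table(t) for t in doc.xpath("//table")]
```

Note that .//tr also matches rows of nested tables; ./tr or ./tbody/tr is stricter if the markup is known.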

Parsing CDATA in xml with python

浪尽此生 submitted on 2019-12-03 03:38:13
I need to parse an XML file with a number of blocks of CDATA that I need to retain for later plotting: <process id="process1"> <log name="name1" device="device1"><![CDATA[timestamp value]]]></log> <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]]></log> </process> I will need to do this repeatedly and quickly, and I am looking for the best way to do this. I've read that ElementTree is the faster of the methods, but I am open to other suggestions. Here are two examples of how to do it: from lxml import etree import xml.etree.ElementTree as ElementTree
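
A small sketch of reading CDATA content with lxml. The parser exposes CDATA content through element.text like any other text; strip_cdata=False only matters if the CDATA sections should survive re-serialization. The sample values, and the assumption that each entry is a space-separated "timestamp value" pair of numbers, are hypothetical.

```python
from lxml import etree

XML = b"""<process id="process1">
  <log name="name1" device="device1"><![CDATA[1 0.5]]></log>
  <log name="name2" device="device2"><![CDATA[1 0.5, 2 0.7, 3 0.9]]></log>
</process>"""

# Keep CDATA sections as CDATA when the tree is written back out.
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(XML, parser)

logs = {}
for log in root.iter("log"):
    # Split "t v, t v, ..." into (timestamp, value) float pairs.
    pairs = [tuple(float(x) for x in item.split()) for item in log.text.split(",")]
    logs[log.get("name")] = pairs
```

The resulting dictionary maps each log name to a list of pairs that can be handed to a plotting library later.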

selecting attribute values from lxml

六月ゝ 毕业季﹏ submitted on 2019-12-03 03:29:28
Question: I want to use an XPath expression to get the value of an attribute. I expected the following to work: from lxml import etree for customer in etree.parse('file.xml').getroot().findall('BOB'): print customer.find('./@NAME') but this gives an error: Traceback (most recent call last): File "bob.py", line 22, in <module> print customer.find('./@ID') File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972) File "/usr/local/lib/python2.7/dist-packages/lxml/
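
The find() and findall() methods use the ElementPath subset, which selects elements only, so a path ending in @NAME is rejected. Two alternatives are sketched below on a hypothetical sample document: element.get() for a single attribute, and xpath(), which supports full XPath 1.0 and can return attribute values directly.

```python
from lxml import etree

XML = b"""<customers>
  <BOB NAME="alice" ID="1"/>
  <BOB NAME="carol" ID="2"/>
  <BOB ID="3"/>
</customers>"""

root = etree.fromstring(XML)

# get() returns None when the attribute is missing.
names_via_get = [customer.get("NAME") for customer in root.findall("BOB")]

# xpath() returns only the attribute values that exist.
names_via_xpath = root.xpath("BOB/@NAME")
```

The two results differ for elements without the attribute: get() yields None for them, while the XPath query simply omits them.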