lxml

Strip out all namespace declarations, tags and attributes from SVG file with Python/lxml

不羁岁月 posted on 2019-12-10 22:59:08
Question: I have this script for cleaning up SVG files with Python and lxml. It removes invisible elements and tries to solve a few selected namespace issues:

    from lxml import etree

    path = '/image.svg'
    svg_xml = open(path, 'r').read()

    # resolve problematic namespace issues
    # remove specific and undefined Illustrator tags
    if '<i:pgf></i:pgf>' in svg_xml:
        svg_xml = svg_xml.replace('<i:pgf></i:pgf>', '')

    # make sure the xmlns:xlink URL is correct
    if 'xmlns:xlink' in svg_xml:
        parts = svg_xml.split('xmlns

Why does this xpath expression return an empty list?

孤街醉人 posted on 2019-12-10 22:22:21
Question: I'm trying to parse this XML. It's a YouTube feed. I'm working from the code in the tutorial. I want to get all the entry nodes that are nested under feed.

    from lxml import etree

    root = etree.fromstring(text)
    entries = root.xpath("/feed/entry")
    print entries

For some reason entries is an empty list. Why?

Answer 1: feed and all its children are actually in the http://www.w3.org/2005/Atom namespace. You need to tell your XPath expression that:

    entries = root.xpath("/atom:feed/atom:entry", namespaces={
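The namespace point in the answer can be illustrated with a small self-contained sketch. The feed below is a hypothetical stand-in for the YouTube feed (its contents are an assumption), and the prefix name "atom" is an arbitrary choice; only the namespace URI has to match the document.

```python
from lxml import etree

# Hypothetical, minimal Atom feed standing in for the YouTube feed in the question.
text = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = etree.fromstring(text)

# Unprefixed names in XPath match elements in no namespace, so nothing is found.
unqualified = root.xpath("/feed/entry")

# Bind a prefix of our own choosing to the namespace URI used by the feed.
ns = {"atom": "http://www.w3.org/2005/Atom"}
entries = root.xpath("/atom:feed/atom:entry", namespaces=ns)
titles = [entry.findtext("atom:title", namespaces=ns) for entry in entries]

print(unqualified)
print(titles)
```

The same prefix mapping has to be passed to every call that uses prefixed names, including findtext() on the child elements.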

python lxml with py2exe

旧城冷巷雨未停 posted on 2019-12-10 22:06:08
Question: I have generated an XML document with dom and I want to use lxml to pretty-print it. This is my code for pretty-printing the XML:

    def prettify_xml(xml_str):
        import lxml.etree as etree
        root = etree.fromstring(xml_str)
        xml_str = etree.tostring(root, pretty_print=True)
        return xml_str

My output should be an XML-formatted string. I got this code from a post on Stack Overflow. It works flawlessly when I run it with Python itself. But when I convert my project to a binary created with py2exe (my

Just returning the text of elements in xpath (python / lxml)

百般思念 posted on 2019-12-10 21:31:48
Question: I have an XML structure like this:

    mytree = """
    <path>
      <to>
        <nodes>
          <info>1</info>
          <info>2</info>
          <info>3</info>
        </nodes>
      </to>
    </path>
    """

I'm currently using XPath in Python lxml to grab the nodes:

    >>> from lxml import etree
    >>> info = etree.XML(mytree)
    >>> print info.xpath("/path/to/nodes/info")
    [<Element info at 0x15af620>, <Element info at 0x15af940>, <Element info at 0x15af850>]
    >>> for x in info.xpath("/path/to/nodes/info"):
    ...     print x.text
    ...
    1
    2
    3

This is great, but is there a cleaner way
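One common way to get the strings directly is to end the XPath expression with text(), so that lxml returns text nodes instead of elements. The sketch below reuses the document from the question and is written for Python 3. It is one possible approach, not necessarily the answer given in the original thread.

```python
from lxml import etree

mytree = """
<path>
  <to>
    <nodes>
      <info>1</info>
      <info>2</info>
      <info>3</info>
    </nodes>
  </to>
</path>
"""

info = etree.XML(mytree)

# text() selects the text nodes of each <info> element; lxml returns them
# as string-like objects, so no loop over elements is needed.
texts = info.xpath("/path/to/nodes/info/text()")

# Equivalent list comprehension over the elements themselves.
texts_via_elements = [el.text for el in info.xpath("/path/to/nodes/info")]

print(list(texts))
```

The two variants differ for elements with no text: text() skips them, while el.text yields None for them.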

changing element namespace in lxml

浪子不回头ぞ posted on 2019-12-10 19:48:59
Question: With lxml, I am not sure how to properly remove the namespace of an existing element and set a new one. For instance, I'm parsing this minimal XML file:

    <myroot xmlns="http://myxml.com/somevalue">
      <child1>blabla</child1>
      <child2>blablabla</child2>
    </myroot>

... and I'd like it to become:

    <myroot xmlns="http://myxml.com/newvalue">
      <child1>blabla</child1>
      <child2>blablabla</child2>
    </myroot>

With lxml:

    from lxml import etree as ET

    tree = ET.parse('myfile.xml')
    root = tree.getroot()

If I inspect
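A minimal sketch of one way to do this, assuming it is acceptable to build a new tree instead of editing the old one in place: copy every element into the new namespace and declare that namespace as the default on the new root. The helper name copy_with_namespace is hypothetical, and the sketch parses the XML from a string rather than from myfile.xml so that it is self-contained.

```python
from lxml import etree

OLD_NS = "http://myxml.com/somevalue"
NEW_NS = "http://myxml.com/newvalue"

xml = b"""<myroot xmlns="http://myxml.com/somevalue">
  <child1>blabla</child1>
  <child2>blablabla</child2>
</myroot>"""


def copy_with_namespace(elem, old_ns, new_ns, parent=None):
    """Hypothetical helper: rebuild elem and its subtree, moving tags from old_ns to new_ns."""
    qname = etree.QName(elem)
    ns = new_ns if qname.namespace == old_ns else qname.namespace
    tag = "{%s}%s" % (ns, qname.localname) if ns else qname.localname
    if parent is None:
        # Declare the new namespace as the default one on the new root.
        new = etree.Element(tag, attrib=dict(elem.attrib), nsmap={None: new_ns})
    else:
        new = etree.SubElement(parent, tag, attrib=dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    for child in elem:
        if not isinstance(child.tag, str):
            continue  # simplification: comments and processing instructions are dropped
        copy_with_namespace(child, old_ns, new_ns, new)
    return new


root = etree.fromstring(xml)
new_root = copy_with_namespace(root, OLD_NS, NEW_NS)
serialized = etree.tostring(new_root)
print(serialized.decode())
```

Rebuilding avoids leftover declarations of the old namespace, because an element's nsmap cannot be modified after the element has been created.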

Get inner text from lxml

ぐ巨炮叔叔 posted on 2019-12-10 18:27:53
Question: lxml.html.fromstring insists on wrapping everything in a tag (p by default). From this tag tree,

    <p>this is <b>the</b> good stuff<p>

I want to extract the string:

    this is <b>the</b> good stuff

How do I do this?

Answer 1: That's often referred to as "inner XML" rather than "inner text". This is one possible way to get the inner XML of an element:

    import lxml.etree as etree
    import lxml.html

    html = "<p>this is <b>the</b> good stuff<p>"
    tree = lxml.html.fromstring(html)
    node = tree.xpath("//p")[0]

missing some text when iterating xml elements in python

こ雲淡風輕ζ posted on 2019-12-10 18:27:40
Question: I am running the following code in Python 2.7.3 on Mac OS X 10.6.8.

    import StringIO
    from lxml import etree

    f = open('./foo', 'r')
    doc = ""
    while 1:
        line = f.readline()
        doc += line
        if line == "":
            break

    tree = etree.parse(StringIO.StringIO(doc), etree.HTMLParser())
    r = tree.xpath('//foo')
    for i in r:
        for j in i.iter():
            print j.tag, j.text

And the file foo contains

    <foo> AAA <bar> BBB </bar> XXX </foo>

The output is

    foo AAA
    bar BBB

Why am I not getting the text XXX? How do I access it? Thanks
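In the ElementTree data model that lxml follows, text that comes after an element's closing tag is stored on that element's tail attribute, not on the parent's text. A small sketch, written for Python 3 and using the plain XML parser instead of HTMLParser to keep it short (both choices are assumptions that differ from the question's setup):

```python
from lxml import etree

doc = "<foo> AAA <bar> BBB </bar> XXX </foo>"
root = etree.fromstring(doc)

# .text holds the text before the first child; .tail holds the text that
# follows an element's closing tag, up to the next sibling or the parent's end.
for el in root.iter():
    print(el.tag, repr(el.text), repr(el.tail))

bar = root.find("bar")
tail_of_bar = bar.tail

# itertext() walks text and tail values in document order.
all_text = "".join(root.itertext())
print(all_text.split())
```

Here XXX is the tail of bar, which is why printing only tag and text never shows it.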

undefined symbol: PyFPE_jbuf error while using 'lxml' on ubuntu

狂风中的少年 posted on 2019-12-10 18:24:43
Question: I am trying to import the lxml library into my Python program as follows:

    from lxml import etree

However, I am getting the error "undefined symbol: PyFPE_jbuf". Here is the entire stack trace:

    File "xmlExtract.py", line 4, in <module>
      from lxml import etree
    ImportError: /usr/local/lib/python3.4/dist-packages/lxml/etree.cpython-34m.so: undefined symbol: PyFPE_jbuf

I have carefully installed the lxml library, including all of its dependencies (libxml2-dev, libxslt-dev, python-dev). I also have

Decoding problems in Django and lxml

偶尔善良 posted on 2019-12-10 18:02:10
Question: I have a strange problem with lxml when using the deployed version of my Django application. I use lxml to parse another HTML page which I fetch from my server. This works perfectly well on the development server on my own computer, but for some reason it raises a UnicodeDecodeError on the server:

    ('utf8', "\x85why hello there!", 0, 1, 'unexpected code byte')

I have made sure that Apache (with mod_python) runs with LANG='en_US.UTF-8'. I've tried googling for this problem and tried different

Python3, lxml and “Symbol not found: _lzma_auto_decoder” on Mac OS X 10.9

自闭症网瘾萝莉.ら posted on 2019-12-10 17:34:28
Question: I have installed Python 3 using Homebrew and afterwards installed pip3 and lxml. The line from lxml import etree leads to the following error:

    $ python3
    Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 01:12:57)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: dlopen(/Library/Frameworks/Python