lxml | 易学教程

Strip out all namespace declarations, tags and attributes from SVG file with Python/lxml

Question: I have this script for cleaning up SVG files with Python and lxml. It removes invisible elements and tries to solve a few selected namespace issues:

    from lxml import etree

    path = '/image.svg'
    svg_xml = open(path, 'r').read()

    # resolve problematic namespace issues
    # remove specific and undefined Illustrator tags
    if '<i:pgf></i:pgf>' in svg_xml:
        svg_xml = svg_xml.replace('<i:pgf></i:pgf>', '')

    # make sure the xmlns:xlink URL is correct
    if 'xmlns:xlink' in svg_xml:
        parts = svg_xml.split('xmlns
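
For the goal named in the title, one possible alternative to string replacement is to work on the parsed tree. The following is only a sketch under stated assumptions: the two-element SVG is a hypothetical stand-in for the asker's file, and the loop simply drops the namespace part of every tag and attribute name before calling `etree.cleanup_namespaces` to remove the now-unused declarations.

```python
from lxml import etree

# Hypothetical two-element SVG standing in for the asker's file.
svg = b"""<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <use xlink:href="#a"/>
  <rect id="a" width="1" height="1"/>
</svg>"""

root = etree.fromstring(svg)

for el in root.iter():
    if not isinstance(el.tag, str):
        continue  # skip comments and processing instructions
    # Replace "{uri}local" with plain "local".
    el.tag = etree.QName(el).localname
    # Do the same for namespaced attributes such as xlink:href.
    for name in list(el.attrib):
        qname = etree.QName(name)
        if qname.namespace:
            el.attrib[qname.localname] = el.attrib.pop(name)

# Remove the xmlns declarations that nothing refers to any more.
etree.cleanup_namespaces(root)

stripped = etree.tostring(root, encoding="unicode")
print(stripped)
```

Note that a file stripped this way is no longer in the SVG namespace, so it suits further processing better than direct rendering.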

Why does this xpath expression return an empty list?

Question: I'm trying to parse this XML. It's a YouTube feed. I'm working from the code in the tutorial. I want to get all the entry nodes that are nested under the feed.

    from lxml import etree

    root = etree.fromstring(text)
    entries = root.xpath("/feed/entry")
    print entries

For some reason entries is an empty list. Why?

Answer 1: feed and all its children are actually in the http://www.w3.org/2005/Atom namespace. You need to tell your xpath that:

    entries = root.xpath("/atom:feed/atom:entry", namespaces={
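
The effect of the namespace can be reproduced with a minimal sketch. The feed below is a hypothetical, trimmed-down Atom document (not the actual YouTube feed), and the prefix `atom` is an arbitrary choice; only the URI has to match the one declared in the document.

```python
from lxml import etree

# Hypothetical, trimmed-down Atom feed standing in for the YouTube feed.
text = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

root = etree.fromstring(text)

# The prefix is arbitrary; XPath matches on the namespace URI.
ns = {"atom": "http://www.w3.org/2005/Atom"}

# Without a prefix, XPath looks for elements in no namespace and finds none.
unqualified = root.xpath("/feed/entry")

# With the prefix mapped to the Atom URI, both entries are found.
qualified = root.xpath("/atom:feed/atom:entry", namespaces=ns)
titles = [e.findtext("atom:title", namespaces=ns) for e in qualified]

print(len(unqualified), len(qualified), titles)
```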

python lxml with py2exe

Question: I have generated an XML document with dom and I want to use lxml to pretty-print it. This is my code for pretty-printing the XML:

    def prettify_xml(xml_str):
        import lxml.etree as etree
        root = etree.fromstring(xml_str)
        xml_str = etree.tostring(root, pretty_print=True)
        return xml_str

My output should be an XML-formatted string. I got this code from a post on Stack Overflow. It works flawlessly when I run it with Python itself. But when I convert my project to a binary created with py2exe (my

Just returning the text of elements in xpath (python / lxml)

Question: I have an XML structure like this:

    mytree = """
    <path>
      <to>
        <nodes>
          <info>1</info>
          <info>2</info>
          <info>3</info>
        </nodes>
      </to>
    </path>
    """

I'm currently using xpath in python lxml to grab the nodes:

    >>> from lxml import etree
    >>> info = etree.XML(mytree)
    >>> print info.xpath("/path/to/nodes/info")
    [<Element info at 0x15af620>, <Element info at 0x15af940>, <Element info at 0x15af850>]
    >>> for x in info.xpath("/path/to/nodes/info"):
    ...     print x.text
    ...
    1
    2
    3

This is great, but is there a cleaner way
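
One commonly used option, shown here as a sketch rather than as the answer given in the original thread, is to append `text()` to the XPath expression so that lxml returns the text nodes directly instead of the elements. The code is written for Python 3, unlike the Python 2 session quoted in the question.

```python
from lxml import etree

mytree = """
<path>
  <to>
    <nodes>
      <info>1</info>
      <info>2</info>
      <info>3</info>
    </nodes>
  </to>
</path>
"""

info = etree.XML(mytree)

# The trailing text() step selects the text nodes, so the result is a
# list of strings and no loop over elements is needed.
texts = info.xpath("/path/to/nodes/info/text()")

print(texts)
```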

changing element namespace in lxml

Question: With lxml, I am not sure how to properly remove the namespace of an existing element and set a new one. For instance, I'm parsing this minimal XML file:

    <myroot xmlns="http://myxml.com/somevalue">
      <child1>blabla</child1>
      <child2>blablabla</child2>
    </myroot>

... and I'd like it to become:

    <myroot xmlns="http://myxml.com/newvalue">
      <child1>blabla</child1>
      <child2>blablabla</child2>
    </myroot>

With lxml:

    from lxml import etree as ET

    tree = ET.parse('myfile.xml')
    root = tree.getroot()

If I inspect
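
One way to get the desired output, sketched here under assumptions and not taken from the original thread, is to rebuild the tree with the new URI declared as the default namespace on the new root. The helper name `swap_namespace` is hypothetical, the XML is parsed from a string instead of `myfile.xml`, and comments and processing instructions are dropped for brevity.

```python
from lxml import etree

OLD_NS = "http://myxml.com/somevalue"
NEW_NS = "http://myxml.com/newvalue"

xml = b"""<myroot xmlns="http://myxml.com/somevalue">
  <child1>blabla</child1>
  <child2>blablabla</child2>
</myroot>"""


def swap_namespace(elem, old_ns, new_ns, parent=None):
    """Return a rebuilt copy of elem whose tags use new_ns instead of old_ns.

    Hypothetical helper for illustration; comments and processing
    instructions are not copied.
    """
    qname = etree.QName(elem)
    ns = new_ns if qname.namespace == old_ns else qname.namespace
    tag = "{%s}%s" % (ns, qname.localname) if ns else qname.localname
    if parent is None:
        # Declare the new URI as the default namespace on the new root.
        copy = etree.Element(tag, nsmap={None: new_ns})
    else:
        copy = etree.SubElement(parent, tag)
    copy.text, copy.tail = elem.text, elem.tail
    copy.attrib.update(elem.attrib)
    for child in elem:
        if isinstance(child.tag, str):  # elements only
            swap_namespace(child, old_ns, new_ns, copy)
    return copy


root = etree.fromstring(xml)
new_root = swap_namespace(root, OLD_NS, NEW_NS)

print(etree.tostring(new_root, encoding="unicode"))
```

Rebuilding avoids leftover declarations of the old URI, at the cost of copying the whole tree.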

Get inner text from lxml

Question: lxml.html.fromstring insists on wrapping up everything in a tag (p by default). From this tag tree: this is the good stuff, I want to extract the string: this is the good stuff. How do I do this?

Answer 1: That's often referred to as "inner xml" rather than "inner text". This is one possible way to get the inner xml of an element:

    import lxml.etree as etree
    import lxml.html

    html = "this is the good stuff"
    tree = lxml.html.fromstring(html)
    node = tree.xpath("//p")[0]

missing some text when iterating xml elements in python

Question: I am running the following code in Python 2.7.3 on Mac OS X 10.6.8.

    import StringIO
    from lxml import etree

    f = open('./foo', 'r')
    doc = ""
    while 1:
        line = f.readline()
        doc += line
        if line == "":
            break

    tree = etree.parse(StringIO.StringIO(doc), etree.HTMLParser())
    r = tree.xpath('//foo')
    for i in r:
        for j in i.iter():
            print j.tag, j.text

And the file foo contains

    <foo> AAA <bar> BBB </bar> XXX </foo>

The output is

    foo AAA
    bar BBB

Why am I not getting the text XXX? How do I access it? Thanks
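
A likely explanation is lxml's tree model: text that follows an element's closing tag is stored on that element as `.tail`, not as `.text` of the parent. The sketch below illustrates this in Python 3. As a simplifying assumption it parses the snippet with the XML parser instead of `etree.HTMLParser()`, which keeps the example small; the `.tail` attribute works the same way in both cases.

```python
from lxml import etree

doc = "<foo> AAA <bar> BBB </bar> XXX </foo>"
root = etree.fromstring(doc)

# Each element carries the text before its first child (.text) and the
# text after its own closing tag (.tail).
rows = [
    (el.tag, (el.text or "").strip(), (el.tail or "").strip())
    for el in root.iter()
]

# itertext() walks all text and tail strings in document order.
all_text = [t.strip() for t in root.itertext()]

print(rows)
print(all_text)
```

Here XXX shows up as the tail of bar, which is why printing only `j.text` misses it.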

undefined symbol: PyFPE_jbuf error while using 'lxml' on ubuntu

Question: I am trying to import the 'lxml' library into my Python program as follows.

    from lxml import etree

However, I am getting the error 'undefined symbol: PyFPE_jbuf'. Here is the entire stack trace:

    File "xmlExtract.py", line 4, in <module>
      from lxml import etree
    ImportError: /usr/local/lib/python3.4/dist-packages/lxml/etree.cpython-34m.so: undefined symbol: PyFPE_jbuf

I have carefully installed the 'lxml' library including all of its dependencies (libxml2-dev, libxslt-dev, python-dev). I also have

Decoding problems in Django and lxml

Question: I have a strange problem with lxml when using the deployed version of my Django application. I use lxml to parse another HTML page which I fetch from my server. This works perfectly well on my development server on my own computer, but for some reason it gives me a UnicodeDecodeError on the server.

    ('utf8', "\x85why hello there!", 0, 1, 'unexpected code byte')

I have made sure that Apache (with mod_python) runs with LANG='en_US.UTF-8'. I've tried googling for this problem and tried different

Python3, lxml and “Symbol not found: _lzma_auto_decoder” on Mac OS X 10.9

Question: I have installed Python 3 using Homebrew and afterwards installed pip3 and lxml. The following line

    from lxml import etree

leads to the following error:

    $ python3
    Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 01:12:57)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: dlopen(/Library/Frameworks/Python