lxml

How do I validate against multiple xsd schemas using lxml?

本秂侑毒 submitted on 2019-12-10 11:22:28
Question: I'm writing a unit test that validates the sitemap XML I generate by fetching its XSD schema and validating it with Python's lxml library. Here's some metadata on my root element:

    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap

How can lxml validate some XML against an XSD file while also loading an inline schema?

南楼画角 submitted on 2019-12-10 11:11:00
Question: I'm having problems getting lxml to successfully validate some XML. The XSD schema and the XML file both come from the Amazon documentation, so they should be compatible. But the XML itself refers to another schema that is not being loaded. Here is my code, which is based on the lxml validation tutorial:

    xsd_doc = etree.parse('ProductImage.xsd')
    xsd = etree.XMLSchema(xsd_doc)
    xml = etree.parse('ProductImage_sample.xml')
    xsd.validate(xml)
    print xsd.error_log

    "ProductImage_sample.xml:2:0:ERROR:SCHEMASV:SCHEMAV

PyCharm: how to set a custom string function (i.e. Type Renderer) for external object types?

爱⌒轻易说出口 submitted on 2019-12-10 10:25:39
Question: Is it possible to configure PyCharm to use a custom function to display the __str__ representation of a type in a debug session? I am referring to built-in types or types imported from third-party libraries, which I would rather not modify. For example, instead of a debugger string like {lxml.html.HtmlElement} <Element tr at 0x10e2c1418>, I would like to see the output of etree.tostring(element). IntelliJ IDEA has Java Type Renderers where you can set a custom toString() method for any

Python XPathEvaluator without namespace

雨燕双飞 submitted on 2019-12-10 09:44:01
Question: I need to write a dynamic function that finds elements in a subtree of an Atom XML document. To do so, I've written something like this:

    tree = etree.parse(xmlFileUrl)
    e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'})
    entries = e('//def:entry')
    for entry in entries:
        mypath = tree.getpath(entry) + "/category"
        category = e(mypath)

The code above fails to find category. The reason is that getpath returns an XPath without namespaces, whereas the XPathEvaluator e()
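
One possible way around the mismatch described in this question is to skip the absolute path entirely and query relative to each entry, reusing the same prefix mapping. The sketch below is an illustration only: it uses the standard library's xml.etree.ElementTree as a stand-in (lxml elements offer a findall that also accepts a namespaces mapping), and the sample feed and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Prefix mapping taken from the question; "def" stands for the Atom namespace.
ATOM_NS = {"def": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed, invented for this sketch.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title><category term="python"/></entry>
  <entry><title>second</title><category term="xml"/><category term="lxml"/></entry>
</feed>"""

root = ET.fromstring(FEED)

terms_per_entry = []
for entry in root.findall(".//def:entry", ATOM_NS):
    # Query relative to the entry instead of building an absolute path,
    # so every step carries the namespace prefix.
    categories = entry.findall("def:category", ATOM_NS)
    terms_per_entry.append([c.get("term") for c in categories])

print(terms_per_entry)
```

Because the lookup starts at the entry element, no path string has to be generated and re-parsed, which avoids the prefix problem rather than solving it.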

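Installing lxml for Python

眉间皱痕 submitted on 2019-12-10 09:15:25
When installing lxml for Python, the installation keeps failing. In that case you can install from a different package index; I use the Douban mirror:

    pip install -i https://pypi.douban.com/simple lxml

This is extremely fast. The same index also works when installing other packages, for example locustio:

    pip install -i https://pypi.douban.com/simple locustio

Source: https://www.cnblogs.com/mafy/p/12014695.html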

Iterate over both text and elements in lxml etree

随声附和 submitted on 2019-12-10 03:09:02
Question: Suppose I have the following XML document:

    <species>
        Mammals: <dog/> <cat/>
        Reptiles: <snake/> <turtle/>
        Birds: <seagull/> <owl/>
    </species>

Then I get the species element like this:

    import lxml.etree
    doc = lxml.etree.fromstring(xml)
    species = doc.xpath('/species')[0]

Now I would like to print a list of animals grouped by species. How could I do it using the ElementTree API?

Answer 1: If you enumerate all of the nodes, you'll see a text node with the class followed by element nodes with the species: >
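
In the ElementTree data model, the text before the first child is stored in the parent's text attribute, and the text following each child is stored in that child's tail attribute. A minimal sketch of grouping along those lines is shown here; it uses the standard library's xml.etree.ElementTree (lxml exposes the same text and tail attributes), and the helper name group_by_heading is hypothetical. It assumes every heading ends with a colon, as in the sample document.

```python
import xml.etree.ElementTree as ET

XML = (
    "<species>Mammals: <dog/> <cat/> "
    "Reptiles: <snake/> <turtle/> "
    "Birds: <seagull/> <owl/></species>"
)


def group_by_heading(parent):
    """Group child tags under the most recent non-blank text seen before them."""
    groups = {}
    # Text before the first child lives on the parent itself.
    current = (parent.text or "").strip().rstrip(":")
    for child in parent:
        groups.setdefault(current, []).append(child.tag)
        # Text after a child lives in that child's tail.
        tail = (child.tail or "").strip()
        if tail:
            current = tail.rstrip(":")
    return groups


species = ET.fromstring(XML)
groups = group_by_heading(species)
for heading, animals in groups.items():
    print(heading, animals)
```

Whitespace-only tails (the gaps between sibling elements) are ignored, so only real headings start a new group.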

python lxml etree applet information from yahoo

这一生的挚爱 submitted on 2019-12-10 03:05:36
Question: Yahoo Finance updated their website. I had an lxml/etree script that used to extract the analyst recommendations. Now, however, the analyst recommendations are there, but only as a graphic. You can see an example on this page. The graph called Recommendation Trends in the right-hand column shows the number of analyst reports showing strong buy, buy, hold, underperform, and sell. My guess is that Yahoo will make a few adjustments to the page over the coming little while, but it got me

Encoding error while parsing RSS with lxml

偶尔善良 submitted on 2019-12-10 02:38:12
Question: I want to parse a downloaded RSS feed with lxml, but I don't know how to handle the UnicodeDecodeError.

    request = urllib2.Request('http://wiadomosci.onet.pl/kraj/rss.xml')
    response = urllib2.urlopen(request)
    response = response.read()
    encd = chardet.detect(response)['encoding']
    parser = etree.XMLParser(ns_clean=True,recover=True,encoding=encd)
    tree = etree.parse(response, parser)

But I get an error:

    tree = etree.parse(response, parser) File "lxml.etree.pyx", line 2692, in lxml.etree.parse (src/lxml
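
A likely contributor to the error in this question is that parse() expects a file name or a file-like object, while the code passes the downloaded document itself. The sketch below shows two common ways to parse bytes that are already in memory. It is an illustration under assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml (lxml.etree offers fromstring and parse with the same calling pattern), and the small feed is invented in place of the real download.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the bytes returned by response.read(); the content is made up.
raw = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<rss><channel><title>Wiadomości</title></channel></rss>"
).encode("utf-8")

# Option 1: fromstring() takes the document itself. Passing bytes lets the
# parser honour the encoding named in the XML declaration.
root = ET.fromstring(raw)

# Option 2: parse() takes a file name or file-like object, so wrap the bytes.
tree = ET.parse(io.BytesIO(raw))

title = root.findtext("channel/title")
print(title)
```

Handing the parser raw bytes, rather than a pre-decoded string, usually makes a separate encoding-detection step unnecessary when the feed declares its own encoding.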

Python XML Parsing Algorithm Speed

点点圈 submitted on 2019-12-09 23:10:45
Question: I'm currently parsing a large XML file of the following form in a Python Flask web app on Heroku:

    <book name="bookname">
      <volume n="1" name="volume1name">
        <chapter n="1">
          <li n="1">li 1 content</li>
          <li n="2">li 2 content</li>
        </chapter>
        <chapter n="2">
          <li n="1">li 1 content</li>
          <li n="2">li 2 content</li>
        </chapter>
      </volume>
      <volume n="2" name="volume2name">
        <chapter n="1">
          <li n="1">li 1 content</li>
          <li n="2">li 2 content</li>
        </chapter>
        <chapter n="2">
          <li n="1">li 1 content</li>
          <li

PYTHON 2.6 XML.ETREE to output single quote for attributes instead of double quote

夙愿已清 submitted on 2019-12-09 21:49:08
Question: I have the following code:

    #!/usr/bin/python2.6
    from lxml import etree
    n = etree.Element('test')
    n.set('id','1234')
    print etree.tostring(n)

The generated output is <test id="1234"/> but I want <test id='1234'/>. Can someone help?

Answer 1: I checked the documentation and found no reference to a single/double-quote option. I think your only recourse is

    print etree.tostring(n).replace('"', "'")

Update: Given:

    from lxml import etree
    n = etree.Element('test')
    n.set('id', "Zach's not-so-good answer") my
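
The plain replace() suggested in this answer runs into trouble as soon as an attribute value contains an apostrophe. The sketch below illustrates that pitfall and one possible workaround: escape apostrophes before swapping the quote characters. It is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (its serializer leaves a space before "/>"), and it assumes the document has no literal double quotes in text content, since those would be rewritten too.

```python
import xml.etree.ElementTree as ET

n = ET.Element("test")
n.set("id", "Zach's not-so-good answer")

serialized = ET.tostring(n, encoding="unicode")

# Naive swap: the apostrophe inside the value now ends the attribute early.
naive = serialized.replace('"', "'")
try:
    ET.fromstring(naive)
    naive_is_well_formed = True
except ET.ParseError:
    naive_is_well_formed = False

# Escape apostrophes first, then switch the delimiters to single quotes.
safe = serialized.replace("'", "&apos;").replace('"', "'")
round_trip = ET.fromstring(safe).get("id")

print(naive_is_well_formed)
print(safe)
```

Parsing the result back, as done with round_trip, is a cheap check that the rewritten string is still well-formed and carries the original value.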