lxml

Decode base64 string in python 3 (with lxml or not)

断了今生、忘了曾经 submitted on 2019-12-01 15:40:52
I know this looks embarrassingly easy, and I guess the problem is that I just don't have a clear understanding of all this bytes/str/unicode (and, frankly, encoding/decoding) stuff yet. I've been trying to get my working code to run on Python 3. The part I'm stuck on is parsing an XML file with lxml and decoding a base64 string contained in that XML. The code currently works like this: I retrieve the binary data with an XPath query '.../binary/text()'. This produces a one-element list containing an lxml.etree._ElementUnicodeResult object. Then, with Python 2, I was able to do:
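In Python 3, `str` has no `.decode()` method, so the `base64` module does the work instead. An lxml `_ElementUnicodeResult` is a `str` subclass, and `base64.b64decode()` accepts an ASCII-only `str` and returns `bytes`, so calling it directly on the XPath result should work. A minimal sketch, assuming the payload sits in a `<binary>` element; to keep it stdlib-only it parses with `xml.etree.ElementTree` rather than lxml, and the helper name `decode_binary` and the sample document are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def decode_binary(xml_bytes: bytes) -> bytes:
    """Return the decoded payload of the first <binary> element (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    # findtext() returns a str in Python 3, or None if no <binary> element exists
    encoded = root.findtext(".//binary")
    if encoded is None:
        raise ValueError("no <binary> element found")
    # b64decode accepts an ASCII-only str and returns bytes;
    # by default it discards non-alphabet characters such as line breaks
    return base64.b64decode(encoded)


payload = decode_binary(b"<doc><binary>aGVsbG8gd29ybGQ=</binary></doc>")
print(payload)                  # b'hello world'
print(payload.decode("utf-8"))  # only meaningful if the payload is text
```

With lxml the equivalent step would be `base64.b64decode(doc.xpath('.../binary/text()')[0])`, using your own XPath.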

Changing the default indentation of etree.tostring in lxml

与世无争的帅哥 submitted on 2019-12-01 15:27:56
I have an XML document which I'm pretty-printing using lxml.etree.tostring: print etree.tostring(doc, pretty_print=True). The default level of indentation is 2 spaces, and I'd like to change this to 4 spaces. There isn't any argument for this in the tostring function; is there an easy way to do this with lxml? As said in this thread, there is no real way to change the indentation of the lxml.etree.tostring pretty-print. But you can: add an XSLT transform to change the indentation; or add whitespace to the tree yourself, with something like this function from the cElementTree library code: def indent(elem, level=0): i = "\n" + level
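Newer releases have a built-in for this: lxml 4.5 added `etree.indent(tree, space=...)`, and Python 3.9 added `xml.etree.ElementTree.indent()` with the same `space` parameter. A minimal sketch using the stdlib version (the helper name `pretty_xml` is hypothetical); with lxml you would call `etree.indent(doc, space="    ")` and then `etree.tostring(doc)` instead.

```python
import xml.etree.ElementTree as ET


def pretty_xml(xml_text: str, spaces: int = 4) -> str:
    """Re-indent an XML string with `spaces` spaces per level (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    # indent() modifies the tree in place by rewriting text/tail whitespace
    ET.indent(root, space=" " * spaces)
    return ET.tostring(root, encoding="unicode")


print(pretty_xml("<doc><item><name>x</name></item></doc>"))
```

Because `indent()` rewrites whitespace that already exists, it also normalizes documents that were previously pretty-printed with a different width.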

Using python lxml.etree for huge XML files

岁酱吖の submitted on 2019-12-01 15:14:51
I would like to parse a huge XML file (>200 MB) using lxml.etree in Python. I tried to use etree.parse to load the XML file, but this does not work due to the file size: etree.parse('file.xml') gives: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958) File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797) File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080) File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml
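The usual approach for files this size is to stream them with `iterparse` instead of building the whole tree, discarding each record once it has been processed. lxml's `etree.iterparse` supports this (plus a `tag=` filter); the sketch below uses the stdlib `xml.etree.ElementTree.iterparse` so it runs without lxml. It assumes the records are direct children of the root element; the record tag and the helper name `iter_records` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each <tag> child of the root as a {child_tag: text} dict (hypothetical helper)."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            # Detach everything parsed so far so memory use stays bounded
            root.clear()


data = b"<records><r><a>1</a></r><r><a>2</a></r></records>"
for record in iter_records(io.BytesIO(data), "r"):
    print(record)
```

`source` may also be a filename such as `'file.xml'`. Clearing the root matters: clearing only the current element would still leave an ever-growing list of empty elements attached to the root.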

Changing the default indentation of etree.tostring in lxml

核能气质少年 submitted on 2019-12-01 14:18:44
Question: I have an XML document which I'm pretty-printing using lxml.etree.tostring: print etree.tostring(doc, pretty_print=True). The default level of indentation is 2 spaces, and I'd like to change this to 4 spaces. There isn't any argument for this in the tostring function; is there an easy way to do this with lxml? Answer 1: As said in this thread, there is no real way to change the indentation of the lxml.etree.tostring pretty-print. But you can: add an XSLT transform to change the indentation; add whitespace to

Using python lxml.etree for huge XML files

你。 submitted on 2019-12-01 14:07:10
Question: I would like to parse a huge XML file (>200 MB) using lxml.etree in Python. I tried to use etree.parse to load the XML file, but this does not work due to the file size: etree.parse('file.xml') gives: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958) File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797) File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL

Getting more granular diffs from difflib (or a way to post-process a diff to achieve the same thing)

。_饼干妹妹 submitted on 2019-12-01 13:41:26
I downloaded this page and made a minor edit to it, changing the first 65 in this paragraph to 68. I then parse both sources with BeautifulSoup and diff them with difflib. url = 'https://secure.ssa.gov/apps10/reference.nsf/links/02092016062645AM' response = urllib2.urlopen(url) content = response.read() # get response as list of lines url2 = 'file:///Users/Pyderman/projects/temp/02092016062645AM-modified.html' response2 = urllib2.urlopen(url2) content2 = response2.read() # get response as list of lines import difflib d = difflib.Differ() diffed = d.compare(content, content2) soup = bs4
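One way to get sub-line granularity is to skip `Differ` and run `difflib.SequenceMatcher` on word tokens, keeping only the non-`equal` opcodes. Note that `response.read()` returns a single string, not a list of lines, so `Differ.compare()` on it compares character by character; call `.splitlines()` first if you stay with `Differ`. A minimal sketch; splitting on whitespace and the helper name `word_changes` are assumptions, and for HTML you would first extract the text (for example with BeautifulSoup's `get_text()`).

```python
import difflib


def word_changes(old: str, new: str):
    """Return (op, old_words, new_words) for every changed run of words."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is 'replace', 'delete' or 'insert'
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


print(word_changes("retire at age 65 with benefits", "retire at age 68 with benefits"))
# [('replace', '65', '68')]
```

The same opcodes can drive a character-level pass inside each `replace` run if word granularity is still too coarse.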

How to make XPath return 'None' in Python if no data found?

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 12:24:46
Question: XPath returns nothing if a child element has no text value. In this case, rating has no data, so I want the result to say so, with None or an empty value for this child, instead of just skipping it. Your input is much appreciated. XML: <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> <rating></rating> </book> <book> <title lang="hindi">Learning XML</title> <price>39.95</price> <rating></rating> </book> </bookstore> Python: >>> import lxml
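A query like `//rating/text()` only returns text nodes that exist, and an empty `<rating></rating>` has none, so it silently drops out of the result list. Iterating over each `<book>` and looking up its rating relative to that element keeps every book paired with its (possibly missing) rating. A stdlib sketch; the helper name `book_ratings` is hypothetical, and the rating of 4 on the second book is added only for contrast. With lxml the same loop works over `tree.xpath('//book')`, taking `book.findtext('rating') or None` for each element.

```python
import xml.etree.ElementTree as ET


def book_ratings(xml_bytes: bytes):
    """Return (title, rating) pairs, using None when <rating> is empty or missing."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for book in root.iter("book"):
        # findtext() returns "" for an empty element and None for a missing one;
        # `or None` maps both cases to None
        rating = book.findtext("rating") or None
        pairs.append((book.findtext("title"), rating))
    return pairs


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price><rating></rating></book>
  <book><title lang="hindi">Learning XML</title><price>39.95</price><rating>4</rating></book>
</bookstore>"""
print(book_ratings(sample))
# [('Harry Potter', None), ('Learning XML', '4')]
```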

DATEXII XML file to DataFrame in Python

不羁岁月 submitted on 2019-12-01 12:11:45
Question: For the last couple of days I have been trying to open and read a certain XML file (in DATEXII format), but have not succeeded so far. It contains traffic data from the NDW Open Data website (Dutch Databank for Road and Traffic Data), which hosts the source XML files. The head of the tree continues as in the snippet below; together these form only a very small part of the data. <?xml version="1.0"?> <soapenv:Envelope xmlns:_0="http://datex2
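A frequent stumbling block with SOAP-wrapped files like this is namespaces: ElementTree and lxml store every tag as `{namespace-uri}localname`, so a plain `findall('site')` matches nothing. Since Python 3.8, ElementTree paths accept a `{*}` wildcard that matches any namespace. The sketch below collects one dict per record, which could then be passed to `pandas.DataFrame(rows)`. The element names `site`, `id`, and `speed`, the `urn:example:datex` URI, and the helper name `namespaced_rows` are hypothetical placeholders, not the real DATEX II schema.

```python
import xml.etree.ElementTree as ET


def namespaced_rows(xml_bytes: bytes, record: str, fields: list) -> list:
    """Collect one dict per <record> element, ignoring namespaces (Python 3.8+)."""
    root = ET.fromstring(xml_bytes)
    rows = []
    # '{*}name' matches the local name in any (or no) namespace
    for elem in root.iterfind(f".//{{*}}{record}"):
        rows.append({name: elem.findtext(f"{{*}}{name}") for name in fields})
    return rows


sample = b"""<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <d:payload xmlns:d="urn:example:datex">
      <d:site><d:id>A1</d:id><d:speed>80</d:speed></d:site>
      <d:site><d:id>B2</d:id><d:speed>95</d:speed></d:site>
    </d:payload>
  </soapenv:Body>
</soapenv:Envelope>"""
rows = namespaced_rows(sample, "site", ["id", "speed"])
print(rows)  # [{'id': 'A1', 'speed': '80'}, {'id': 'B2', 'speed': '95'}]
```

Fields missing from a record come back as `None`, which pandas turns into a missing value.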

Getting Started Guide

*爱你&永不变心* submitted on 2019-12-01 10:01:28
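1. Apply for a WeChat Official Account (公众号); a tutorial is available.
2. Development setup (Python): install web.py; install libxml2, libxslt, and lxml; develop your own local server in Python, then deploy it to the public internet (you can use 小米球 to map the local server to a public address).
Source: https://www.cnblogs.com/ahMay/p/11677741.html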

Parsing Large XML file with Python lxml and Iterparse

倖福魔咒の submitted on 2019-12-01 09:54:44
I'm attempting to write a parser using lxml and the iterparse method to step through a very large xml file containing many items. My file is of the format: <item> <title>Item 1</title> <desc>Description 1</desc> <url> <item>http://www.url1.com</item> </url> </item> <item> <title>Item 2</title> <desc>Description 2</desc> <url> <item>http://www.url2.com</item> </url> </item> and so far my solution is: from lxml import etree context = etree.iterparse( MYFILE, tag='item' ) for event, elem in context : print elem.xpath( 'description/text( )' ) elem.clear( ) while elem.getprevious( ) is not None :
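Two details in the snippet are worth checking: the element is named `<desc>`, not `description`, and `tag='item'` also matches the `<item>` elements nested inside `<url>`. The sketch below tracks nesting depth so that only top-level items are handled. It uses the stdlib `iterparse` so it runs without lxml, and assumes the items are wrapped in a single root element (here a hypothetical `<items>`), since a file with several top-level elements is not well-formed XML. The helper name `iter_items` is also hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source):
    """Yield title/desc/urls for each top-level <item>, skipping nested <url><item>."""
    depth = 0  # number of currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After the decrement, depth == 1 means elem was a direct child of the root
        if elem.tag == "item" and depth == 1:
            yield {
                "title": elem.findtext("title"),
                "desc": elem.findtext("desc"),
                "urls": [u.text for u in elem.findall("url/item")],
            }
            elem.clear()  # free the processed item's children


data = (b"<items>"
        b"<item><title>Item 1</title><desc>Description 1</desc>"
        b"<url><item>http://www.url1.com</item></url></item>"
        b"<item><title>Item 2</title><desc>Description 2</desc>"
        b"<url><item>http://www.url2.com</item></url></item>"
        b"</items>")
for item in iter_items(io.BytesIO(data)):
    print(item)
```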