lxml

XML Declaration standalone="yes" lxml

孤者浪人 submitted on 2019-11-27 07:38:50
Question: I have an XML file that I am parsing, making some changes to, and saving out to a new file. It has the declaration <?xml version="1.0" encoding="utf-8" standalone="yes"?> which I would like to keep. When I save out my new file I am losing the standalone="yes" bit. How can I keep it in? Here is my code: templateXml = """<?xml version="1.0" encoding="utf-8" standalone="yes"?> <package> <provider>Some Data</provider> <studio_display_name>Some Other Data</studio_display_name> </package>""" from lxml
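
A minimal sketch of one way to keep the flag, assuming a reasonably recent lxml: `etree.tostring()` accepts a `standalone` argument alongside `xml_declaration` and `encoding`. The sample document follows the question; the change to `provider` is only illustrative.

```python
from lxml import etree

template_xml = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<package>
  <provider>Some Data</provider>
  <studio_display_name>Some Other Data</studio_display_name>
</package>"""

# Parse from bytes: lxml rejects str input that carries an encoding declaration.
root = etree.fromstring(template_xml)
root.find("provider").text = "Changed Data"  # illustrative change

# standalone=True asks the serializer to write the standalone flag as "yes".
output = etree.tostring(
    root,
    xml_declaration=True,
    encoding="utf-8",
    standalone=True,
)
print(output.decode("utf-8"))
```

When saving straight to a file, `ElementTree.write()` takes the same `xml_declaration`, `encoding`, and `standalone` arguments.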

Parse large XML with lxml

帅比萌擦擦* submitted on 2019-11-27 07:37:43
Question: I am trying to get my script working. So far it hasn't managed to output anything. This is my test.xml: <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="it"> <page> <title>MediaWiki:Category</title> <ns>0</ns> <id>2</id> <revision> <id>11248</id> <timestamp>2003-12-31T13:47:54Z</timestamp>
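
The question's script is cut off here, so its actual bug is unknown. One common cause of empty output with this kind of file is the default namespace on `<mediawiki>`: elements have to be matched by their fully qualified names. The sketch below assumes that case. It streams `page` elements with `iterparse` and clears each one after use. The function name `iter_titles` and the second sample page are hypothetical.

```python
import io
from lxml import etree

NS = "http://www.mediawiki.org/xml/export-0.8/"

def iter_titles(source):
    """Yield page titles one at a time without keeping the whole tree."""
    for _event, page in etree.iterparse(source, events=("end",), tag="{%s}page" % NS):
        yield page.findtext("{%s}title" % NS)
        page.clear()  # free the children of the processed element
        # Drop already-processed siblings that the root still references.
        while page.getprevious() is not None:
            del page.getparent()[0]

# Hypothetical, shortened sample in the same namespace as the question's file.
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" version="0.8">'
    b"<page><title>MediaWiki:Category</title><ns>0</ns><id>2</id></page>"
    b"<page><title>Second</title><ns>0</ns><id>3</id></page>"
    b"</mediawiki>"
)
titles = list(iter_titles(sample))
print(titles)
```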

Parsing broken XML with lxml.etree.iterparse

时间秒杀一切 submitted on 2019-11-27 07:28:45
I'm trying to parse a huge XML file with lxml in a memory-efficient manner (i.e., streaming lazily from disk instead of loading the whole file in memory). Unfortunately, the file contains some bad ASCII characters that break the default parser. The parser works if I set recover=True, but the iterparse method doesn't take the recover parameter or a custom parser object. Does anyone know how to use iterparse to parse broken XML? #this works, but loads the whole file into memory parser = lxml.etree.XMLParser(recover=True) #recovers from bad characters. tree = lxml.etree.parse(filename, parser) #how
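
Newer lxml releases accept `recover=True` directly on `iterparse`, which was not the case when this question was asked. The sketch below assumes such a release is installed. The helper name `iter_items` is hypothetical, and the bare `&` is only an illustrative stand-in for the bad characters described in the question.

```python
import io
from lxml import etree

def iter_items(source, tag):
    """Stream matching elements while letting libxml2 skip recoverable errors."""
    # Assumption: the installed lxml supports the recover keyword on iterparse.
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag, recover=True):
        yield elem.text
        elem.clear()  # release the element's content once it has been used

# Illustrative broken input: a bare "&" is not well-formed XML.
broken = io.BytesIO(
    b"<root><item>ok</item><item>bad & text</item><item>last</item></root>"
)
texts = list(iter_items(broken, "item"))
print(texts)
```

How much of a damaged region survives depends on libxml2's recovery, so it is worth checking the recovered text against a sample of the real data.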

Get all text from an XML document?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-27 07:13:01
Question: How can I get all the text content of an XML document, as a single string - like this Ruby/hpricot example but using Python. I'd like to replace XML tags with a single whitespace. Answer 1: EDIT: This is an answer posted when I thought one-space indentation is normal, and as the comments mention it's not a good answer. Check out the others for some better solutions. This is left here solely for archival reasons, do not follow it! You asked for lxml: reslist = list(root.iter()) result = ' '.join(
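
This is not the truncated answer's code, which its own author advises against. As a minimal alternative sketch, `itertext()` walks every text node in document order, and joining the non-empty pieces with a space gives the effect of replacing tags with a single whitespace. The helper name `all_text` and the sample document are hypothetical.

```python
from lxml import etree

def all_text(root):
    """Join every text node in document order, separated by single spaces."""
    pieces = (text.strip() for text in root.itertext())
    return " ".join(piece for piece in pieces if piece)

root = etree.fromstring(
    "<doc><title>Hello</title><body>big <b>wide</b> world</body></doc>"
)
print(all_text(root))  # Hello big wide world
```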

Parse several XML declarations in a single file by means of lxml.etree.iterparse

亡梦爱人 submitted on 2019-11-27 07:05:41
Question: I need to parse a file that contains various XML files, i.e., <xml></xml> <xml></xml> .. and so forth. While using etree.iterparse, I get the following (correct) error: lxml.etree.XMLSyntaxError: XML declaration allowed only at the start of the document Now, I can preprocess the input file and produce for each contained XML file a separate file. This might be the easiest solution. But I wonder if a proper solution for this 'problem' exists. Thanks! Answer 1: The sample data you've provided
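
The answer is cut off, so the following is not its method. It is a rough sketch of the preprocessing idea from the question, done in memory: split the data at each XML declaration and parse every piece on its own. It assumes each embedded document starts with `<?xml`, that this sequence does not occur inside a document (for example in CDATA), and that the file fits in memory. The helper name `parse_concatenated` is hypothetical.

```python
import re
from lxml import etree

def parse_concatenated(data):
    """Split a bytes blob at each XML declaration and parse the pieces separately."""
    # Zero-width split (Python 3.7+): every non-empty chunk starts with b"<?xml".
    chunks = re.split(rb"(?=<\?xml\s)", data)
    return [etree.fromstring(chunk) for chunk in chunks if chunk.strip()]

data = (
    b'<?xml version="1.0" encoding="utf-8"?><a>1</a>\n'
    b'<?xml version="1.0" encoding="utf-8"?><b>2</b>\n'
)
roots = parse_concatenated(data)
print([root.tag for root in roots])  # ['a', 'b']
```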

Saving XML using ETree in Python. It's not retaining namespaces, and adding ns0, ns1 and removing xmlns tags

拟墨画扇 submitted on 2019-11-27 06:52:00
Question: I see there are similar questions here, but nothing that has totally helped me. I've also looked at the official documentation on namespaces but can't find anything that is really helping me; perhaps I'm just too new to XML formatting. I understand that perhaps I need to create my own namespace dictionary? Either way, here is my situation: I am getting a result from an API call, and it gives me XML that is stored as a string in my Python application. What I'm trying to accomplish is just grab
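
The `ns0`/`ns1` prefixes are what the standard library's `xml.etree.ElementTree` generates on output for namespaces it has no registered prefix for. A minimal sketch of the usual remedy, `register_namespace`, follows. The namespace URI and the sample response are hypothetical, since the question's XML is not shown.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical namespace URI

# Map the URI to the empty prefix so it is written back as the default namespace.
ET.register_namespace("", NS)

api_response = '<root xmlns="http://example.com/ns"><item>value</item></root>'
root = ET.fromstring(api_response)
root.find("{%s}item" % NS).text = "changed"

output = ET.tostring(root, encoding="unicode")
print(output)
```

`register_namespace` is global to the process, so each namespace in the document needs one call before serializing.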

How do you install lxml on OS X Leopard without using MacPorts or Fink?

浪尽此生 submitted on 2019-11-27 06:19:28
I've tried this and run into problems a bunch of times in the past. Does anyone have a recipe for installing lxml on OS X without MacPorts or Fink that definitely works? Preferably with complete 1-2-3 steps for downloading and building each of the dependencies. Simon Willison Thanks to @jessenoller on Twitter I have an answer that fits my needs - you can compile lxml with static dependencies, hence avoiding messing with the libxml2 that ships with OS X. Here's what worked for me: cd /tmp curl -O http://lxml.de/files/lxml-3.6.0.tgz tar -xzvf lxml-3.6.0.tgz cd lxml-3.6.0 python setup.py build -

Use lxml to parse text file with bad header in Python

浪子不回头ぞ submitted on 2019-11-27 06:14:35
Question: I would like to parse text files (stored locally) with lxml's etree. But all of my files (thousands) have headers, such as: -----BEGIN PRIVACY-ENHANCED MESSAGE----- Proc-Type: 2001,MIC-CLEAR Originator-Name: webmaster@www.sec.gov Originator-Key-Asymmetric: MFgwCgYEVQgBAQICAf8DSgAwRwJAW2sNKK9AVtBzYZmr6aGjlWyK3XmZv3dTINen TWSM7vrzLADbmYQaionwg5sDW3P6oaM5D3tdezXMm7z1T+B+twIDAQAB MIC-Info: RSA-MD5,RSA, AHxm/u6lqdt8X6gebNqy9afC2kLXg+GVIOlG/Vrrw/dTCPGwM15+hT6AZMfDSvFZ YVPEaPjyiqB4rV/GS2lj6A== <SEC
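
One simple approach, sketched under assumptions: cut everything before the first occurrence of a known start marker and hand the rest to the parser. The question's sample is truncated at `<SEC`, so the marker `<doc>` and the payload below are hypothetical, as is the helper name `parse_after_header`. If the payload itself is not well-formed XML, stripping the header alone will not be enough.

```python
from lxml import etree

def parse_after_header(raw, start_marker):
    """Drop everything before the first occurrence of start_marker, then parse."""
    start = raw.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found")
    return etree.fromstring(raw[start:])

raw = (
    b"-----BEGIN PRIVACY-ENHANCED MESSAGE-----\n"
    b"Proc-Type: 2001,MIC-CLEAR\n"
    b"\n"
    b"<doc><type>10-K</type></doc>\n"  # hypothetical payload
)
root = parse_after_header(raw, b"<doc>")
print(root.tag, root.findtext("type"))  # doc 10-K
```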

lxml: DLL load failed: The specified module could not be found

安稳与你 submitted on 2019-11-27 06:08:24
Question: I have Windows Server 2008 R2 x64. It is running Python 2.7 x86 + Django 1.3 + Apache 2.2 x86 under WSGI. It runs OK without lxml. We're using soaplib, which requires lxml. I tried installing lxml in several ways: using easy_install, and downloading the win32 installation binary. The problem is that running the site under Apache raises the following error: ImportError at / DLL load failed: The specified module could not be found. It raises this error on from lxml import etree I have googled for a solution but

Using BeautifulSoup + urllib2 in Python: installing, fetching and parsing web pages, and parsing shtml

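坚强是说给别人听的谎言 submitted on 2019-11-27 06:08:08
Installing Beautiful Soup: If you are using a recent version of Debian or Ubuntu, you can install it with the system package manager: $ apt-get install python-bs4 Beautiful Soup 4 is published through PyPI, so if you cannot install it with the system package manager, you can also install it with easy_install or pip. The package name is beautifulsoup4, and it works with both Python 2 and Python 3. $ easy_install beautifulsoup4 $ pip install beautifulsoup4 (PyPI also has a package named BeautifulSoup, but that is probably not what you want. It is the Beautiful Soup 3 release; because many projects still use BS3, the BeautifulSoup package remains available. If you are writing a new project, you should install beautifulsoup4.) If you have neither easy_install nor pip installed, you can also download the BS4 source and install it with setup.py. $ python setup.py install If none of the installation methods above work, Beautiful Soup's license allows you to bundle the BS4 code with your project, so it can be used without installing it. Under Python 2.7 and Python 3.2, the author developed Beautiful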
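
After installing, a quick way to confirm the package works is to parse a small document. This is a minimal sketch, assuming `beautifulsoup4` is installed under Python 3; the HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

html = "<html><body><p class='greeting'>Hello, <b>world</b></p></body></html>"

# "html.parser" is the standard-library backend, so no extra parser is required.
soup = BeautifulSoup(html, "html.parser")
text = soup.find("p", class_="greeting").get_text()
print(text)  # Hello, world
```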