lxml | 易学教程

How can I set up lxml and pypy on Yosemite?

Question: I wanted to do some learning with lxml and PyPy, so I decided to get them set up on my Yosemite Mac. But after three days of trying, I still haven't been able to try lxml, because I can't get my setup right. Here's what I've done: did a clean Homebrew install and ran xcode-select --install.
proix:~ user$ brew --version
0.9.5
proix:~ user$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang
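
Once lxml imports at all, a quick way to see which interpreter and which libxml2/libxslt builds it actually picked up is to print the version constants lxml.etree exposes. This is a minimal diagnostic sketch, not an install recipe; it assumes the check is run with the same interpreter (pypy or python) that is meant to use lxml. A compiled/runtime libxml2 mismatch is worth investigating, though it is not necessarily an error:

```python
import platform
from lxml import etree

# Collect what this interpreter actually loaded; nothing is installed here.
info = {
    "interpreter": platform.python_implementation(),  # e.g. "PyPy" or "CPython"
    "lxml": etree.LXML_VERSION,
    "libxml2 compiled against": etree.LIBXML_COMPILED_VERSION,
    "libxml2 loaded at runtime": etree.LIBXML_VERSION,
    "libxslt loaded at runtime": etree.LIBXSLT_VERSION,
}
for name, value in info.items():
    print(f"{name}: {value}")

# A mismatch suggests lxml was built against one libxml2 (e.g. Homebrew's)
# but is loading a different copy (e.g. the system one) at runtime.
if etree.LIBXML_COMPILED_VERSION != etree.LIBXML_VERSION:
    print("warning: libxml2 compiled and runtime versions differ")
```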

XML pretty print fails in Python lxml

Question: I am trying to read, modify, and write an XML file with lxml 4.1.1 in Python 2.7.6. My code:
import lxml.etree as et
fn_xml_in = 'in.xml'
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse(fn_xml_in, parser)
xml_doc.getroot().find('b').append(et.Element('c'))
xml_doc.write('out.xml', method='html', pretty_print=True)
The input file in.xml looks like this: <a> <b/> </a> And the produced output file out.xml: <a> <b><c></c></b> </a> Or when I set remove_blank_text=True: <a><b><c>
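
One way to get indented output, assuming XML serialization is acceptable for this file: keep remove_blank_text=True on the parser as the question already does, but serialize with lxml's default XML method instead of method='html'. This is a Python 3 sketch with the input inlined as a string rather than read from in.xml:

```python
from lxml import etree as et

# remove_blank_text=True discards the whitespace-only text between tags in the
# input; otherwise the old whitespace is kept and the serializer won't re-indent.
parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring(b"<a>\n  <b/>\n</a>", parser)

# Same change as in the question: add <c> under <b>.
root.find("b").append(et.Element("c"))

# Serialize with the default XML method rather than method="html".
out = et.tostring(root, pretty_print=True).decode()
print(out)
```

With the file-based version from the question, the last step would be xml_doc.write('out.xml', pretty_print=True), leaving method at its XML default.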

lxml.etree insert elements into element.text

Question: I have strings that have empty XML elements in them, like this:
>>> s = """fizz buzz <pb n="44"/> bananas"""
These strings have been assigned to XML elements using the etree.SubElement method:
>>> from lxml import etree as et
>>> root = et.Element('root')
>>> txt = et.SubElement(root, 'text')
>>> txt.text = s
>>> et.dump(root)
<root> <text>fizz buzz <pb n="44"/> bananas</text> </root>
Fiddling about with re.split() and etree's text and tail, I can insert a subelement <pb n="44"/> where I want
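
A sketch of an alternative to splitting by hand, assuming each string is well-formed XML content (no bare & or stray <): instead of assigning the string to .text, where lxml escapes the markup, wrap it in the element's tag and parse it. The parser then turns <pb n="44"/> into a real child element, with the surrounding words in .text and in the child's .tail:

```python
from lxml import etree as et

s = 'fizz buzz <pb n="44"/> bananas'

root = et.Element("root")
# Wrap the string in the element's own tag and parse it, so <pb n="44"/> becomes
# a real child element; assigning s to .text would escape it as &lt;pb .../&gt;.
txt = et.fromstring("<text>" + s + "</text>")
root.append(txt)

print(et.tostring(root).decode())
# <root><text>fizz buzz <pb n="44"/> bananas</text></root>
```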

lxml unicode entity parse problems

Question: I'm using lxml as follows to parse an exported XML file from another system:
xmldoc = open(filename)
etree.parse(xmldoc)
But I'm getting:
lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46
Obviously it's having problems with Unicode entity names, but how would I get around this? Via open() or parse()? Edit: I had forgotten to include my DTD in the same folder. It's there now and has the following declaration: <!ENTITY eacute "é"> and is referred to (and always was)
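
Having the DTD next to the file is not enough on its own: lxml's parser does not read an external DTD unless asked to. A minimal sketch, assuming the export declares its DTD with a SYSTEM identifier relative to the XML file; the file names export.xml and export.dtd are hypothetical stand-ins written to a temporary folder. load_dtd=True makes libxml2 read the DTD, and resolve_entities=True is passed explicitly so the entity is substituted into the text. Only do this for trusted files, since DTD processing can pull in external resources:

```python
import os
import tempfile
from lxml import etree

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the exported file and its DTD, side by side.
    dtd_path = os.path.join(tmp, "export.dtd")
    xml_path = os.path.join(tmp, "export.xml")
    with open(dtd_path, "w", encoding="utf-8") as f:
        f.write('<!ENTITY eacute "&#233;">\n')  # same entity as in the question
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<!DOCTYPE export SYSTEM "export.dtd">\n'
                '<export>caf&eacute;</export>\n')

    # load_dtd=True reads the external DTD, so &eacute; is declared.
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    # Parsing by path gives libxml2 a base location, so "export.dtd"
    # is looked up relative to the XML file's directory.
    tree = etree.parse(xml_path, parser)
    text = tree.getroot().text

print(text)  # café
```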

lxml error “IOError: Error reading file” when parsing facebook mobile in a python scraper script

Question: I use a modified script from the "Logging into facebook with python" post:
#!/usr/bin/python2 -u
# -*- coding: utf8 -*-
facebook_email = "YOUR_MAIL@DOMAIN.TLD"
facebook_passwd = "YOUR_PASSWORD"
import cookielib, urllib2, urllib, time, sys
from lxml import etree
jar = cookielib.CookieJar()
cookie = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(cookie)
headers = { "User-Agent" : "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko)
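
The excerpt stops before the failing call, so the cause here is an assumption: lxml reports "Error reading file" when etree.parse() is handed something it treats as a file name or URL (for example the page text itself, or an https:// address it cannot fetch). A common workaround is to download the page with the opener and parse the bytes in memory with etree.HTML(). The markup below is a made-up stand-in for the downloaded page, and the field name token is hypothetical:

```python
from lxml import etree

# Made-up stand-in for the bytes returned by opener.open(url).read();
# the network request itself is left out of this sketch.
body = b"""<html><head><title>Log in</title></head>
<body>
<form action="/login.php" method="post">
<input type="hidden" name="token" value="abc123">
<input type="text" name="email">
</form>
</body></html>"""

# etree.HTML() parses markup already held in memory, using the lenient HTML parser.
# etree.parse() would instead treat a str argument as a file name or URL to open.
doc = etree.HTML(body)

action = doc.xpath("//form/@action")[0]
hidden = {i.get("name"): i.get("value") for i in doc.xpath("//input[@type='hidden']")}
print(action, hidden)
```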

python lxml findall with multiple namespaces

Question: I'm trying to parse an XML document with multiple namespaces with lxml, and I'm stuck on getting the findall() method to return anything. My XML: <MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd"> <HistoryRecords> <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId> <List> <HistoryRecord> <Value>60</Value>
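
In this document every unprefixed element belongs to the default namespace http://www.company.com/common/rsp/2012/07, so a plain path such as .//HistoryRecord matches nothing. A minimal sketch: map a prefix of your choosing (rsp below is an arbitrary, made-up name) to that URI and pass the mapping to findall(). The XML is the question's excerpt with the closing tags it cuts off filled in:

```python
from lxml import etree

# The question's excerpt, with the closing tags it cuts off filled in.
xml = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(xml)

# "rsp" is an arbitrary local prefix; only the URI has to match the document.
ns = {"rsp": "http://www.company.com/common/rsp/2012/07"}

records = root.findall(".//rsp:HistoryRecord", namespaces=ns)
values = [r.findtext("rsp:Value", namespaces=ns) for r in records]
print(values)  # ['60']

# Without the mapping nothing matches, because the default namespace
# applies to every unprefixed element in the document.
plain = root.findall(".//HistoryRecord")
print(plain)  # []
```

The same lookup also works without a mapping in Clark notation, e.g. root.findall(".//{http://www.company.com/common/rsp/2012/07}HistoryRecord").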

lxml.etree._Element.append() from a loop not working as expected

Question: I would like to know why, in this code, append() seems to work from inside the loop, but the resulting XML shows the modification from only the last iteration, while remove() works as expected. This is an overly simplified example; I'm working with big chunks of data and need to append the same subtree to many different parents.
from lxml import etree
xml = etree.fromstring('<tree><fruit id="1"></fruit><fruit id="2"></fruit></tree>')
sub = etree.fromstring('<apple/>')
for i, item in
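
An lxml element can have only one parent, so calling append() with an element that already sits in a tree moves it instead of copying it; that is why only the last iteration appears to stick. A minimal sketch that gives each parent its own copy with copy.deepcopy, with the loop simplified from the truncated excerpt (the demo tree at the end only illustrates the moving behavior):

```python
import copy
from lxml import etree

xml = etree.fromstring('<tree><fruit id="1"></fruit><fruit id="2"></fruit></tree>')
sub = etree.fromstring('<apple/>')

# findall() returns a list, so modifying the fruits does not disturb the loop.
for item in xml.findall("fruit"):
    # deepcopy gives every parent its own <apple/>; appending `sub` itself
    # would move the single element from one parent to the next.
    item.append(copy.deepcopy(sub))

print(etree.tostring(xml).decode())
# <tree><fruit id="1"><apple/></fruit><fruit id="2"><apple/></fruit></tree>

# For comparison: appending the same element every time leaves it
# under the last parent only.
demo = etree.fromstring('<tree><fruit id="1"/><fruit id="2"/></tree>')
shared = etree.Element("apple")
for item in demo.findall("fruit"):
    item.append(shared)
print([len(f) for f in demo.findall("fruit")])  # [0, 1]
```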

Scrapy: Unable to create a project

Question: I had issues installing Scrapy with respect to lxml, but then I found some information on Stack Overflow. Based on that information I did a sudo easy_install lxml, and despite some errors I think Scrapy got installed. The reason I came to that judgement is that I recall I could do the following:
Python 2.7.5 (default, Jul 28 2013, 07:27:04)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from scrapy import *
>>>
But when I try

from scrapy.selector import selector error

Question: I am unable to do the following:
from scrapy.selector import Selector
The error is:
File "/Desktop/KSL/KSL/spiders/spider.py", line 1, in
from scrapy.selector import Selector
ImportError: cannot import name Selector
It is as if lxml is not installed on my machine, but it is. Also, I thought this was a default module built into Scrapy. Maybe not? Thoughts?
Answer 1: Try importing HtmlXPathSelector instead.
from scrapy.selector import HtmlXPathSelector
And then use the .select() method to parse out