lxml | 易学教程 (Easy Learning Tutorials)

Building lxml for Python 2.7 on Windows

Question: I am trying to build lxml for Python 2.7 on a 64-bit Windows machine. I couldn't find an lxml egg for Python 2.7, so I am compiling it from source, following the instructions in the static linking section of http://lxml.de/build.html. I get this error:

    C:\Documents and Settings\Administrator\Desktop\lxmlpackage\lxml-2.2.6\lxml-2.2.6>python setup.py bdist_wininst --static
    Building lxml version 2.2.6.
    NOTE: Trying to build without Cython, pre-generated 'src/lxml/lxml

How to get path of an element in lxml?

Question: I'm searching an HTML document using XPath from lxml in Python. How can I get the path to a certain element? Here's the example from Ruby Nokogiri:

    page.xpath('//text()').each do |textnode|
      path = textnode.path
      puts path
    end

This prints, for example, '/html/body/div/div[1]/div[1]/p/text()[1]', and this is the string I want to get in Python.

Answer 1: Use the getpath() method of ElementTree objects.

    from lxml import etree
    root = etree.fromstring('<foo><bar>Data</bar><bar><baz>data</baz>'
                            '<baz>data</baz><
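A minimal sketch of the getpath() approach from Answer 1, assuming lxml is installed. The sample document is made up for illustration. getpath() is called on an ElementTree, not on an element, and it adds a positional index only where siblings share the same tag.

```python
from lxml import etree

# Small sample document (made up for illustration).
root = etree.fromstring(
    "<foo><bar>Data</bar><bar><baz>data</baz><baz>data</baz></bar></foo>"
)

# getpath() is a method of ElementTree objects, so wrap the root element first.
tree = etree.ElementTree(root)

# Print the absolute XPath of every element, similar to Nokogiri's node.path.
for element in root.iter():
    print(tree.getpath(element))
```

Each returned path can be fed back into tree.xpath() to find the same element again. For text nodes like the ones in the Nokogiri example, the strings returned by xpath('//text()') have a getparent() method. It returns the element the text belongs to (for tail text, the element it follows), and you can pass that element to getpath().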

libxml install error using pip

This is my error:

    (mysite)zjm1126@zjm1126-G41MT-S2:~/zjm_test/mysite$ pip install lxml
    Downloading/unpacking lxml
    Running setup.py egg_info for package lxml
    Building lxml version 2.3.
    Building without Cython.
    ERROR: /bin/sh: xslt-config: not found
    ** make sure the development packages of libxml2 and libxslt are installed **
    Using build configuration of libxslt
    Installing collected packages: lxml
    Running setup.py install for lxml
    Building lxml version 2.3.
    Building without Cython.
    ERROR: /bin/sh: xslt-config: not found
    ** make sure the development packages of libxml2 and libxslt are installed *
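The log shows the build shelling out to xslt-config, a helper script installed by the libxslt development package. lxml's build also appears to use xml2-config from libxml2 to locate headers and libraries. As a rough diagnostic, assuming the failure is simply that these scripts are missing from PATH, the sketch below reports which ones cannot be found. The helper name missing_config_tools is hypothetical.

```python
import shutil


def missing_config_tools(tools=("xml2-config", "xslt-config")):
    """Return the config scripts that cannot be found on PATH.

    Hypothetical helper: these scripts ship with the libxml2/libxslt
    development packages, so a missing one usually means the matching
    -dev package is not installed.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_config_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        print("Install the libxml2 and libxslt development packages.")
    else:
        print("xml2-config and xslt-config are available.")
```

On Debian or Ubuntu, the packages the error message refers to are typically libxml2-dev and libxslt1-dev.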

How do you install lxml on OS X Leopard without using MacPorts or Fink?

Question: I've tried this and run into problems many times in the past. Does anyone have a recipe for installing lxml on OS X without MacPorts or Fink that definitely works? Ideally it would have complete step-by-step instructions for downloading and building each of the dependencies.

Answer 1: Thanks to @jessenoller on Twitter, I have an answer that fits my needs: you can compile lxml with static dependencies, which avoids messing with the libxml2 that ships with OS X. Here's what worked for me:

    cd /tmp
    curl -O http
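After a static build, you may want to confirm that lxml is really using the libxml2 and libxslt it was built with, rather than the older libxml2 that ships with OS X. One way is to compare the library versions lxml was compiled against with the versions it loads at runtime. A minimal sketch, assuming lxml imports successfully. The version constants used are attributes of lxml.etree.

```python
from lxml import etree

# Each constant is a tuple of ints describing a library version.
versions = {
    "lxml": etree.LXML_VERSION,
    "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
    "libxml2 (runtime)": etree.LIBXML_VERSION,
    "libxslt (compiled against)": etree.LIBXSLT_COMPILED_VERSION,
    "libxslt (runtime)": etree.LIBXSLT_VERSION,
}

for name, version in versions.items():
    print(f"{name:28} {'.'.join(map(str, version))}")

# A mismatch suggests lxml is loading a different shared libxml2
# than the one it was built with.
if etree.LIBXML_VERSION != etree.LIBXML_COMPILED_VERSION:
    print("Warning: libxml2 runtime version differs from build version.")
```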

Cannot install lxml on Mac OS X 10.9

I want to install lxml so that I can then install Scrapy. After I updated my Mac today, it wouldn't let me reinstall lxml. I get the following error:

    In file included from src/lxml/lxml.etree.c:314:
    /private/tmp/pip_build_root/lxml/src/lxml/includes/etree_defs.h:9:10: fatal error: 'libxml/xmlversion.h' file not found
    #include "libxml/xmlversion.h"
             ^
    1 error generated.
    error: command 'cc' failed with exit status 1

I have tried using brew to install libxml2 and libxslt. Both installed fine, but I still cannot install lxml. Last time I was installing, I needed to enable the developer tools in Xcode, but
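The compiler fails because it cannot find libxml/xmlversion.h in any include directory it was given. A sketch of a diagnostic follows, assuming xml2-config (from libxml2) is on PATH and is what supplies those include flags. It asks xml2-config for its compiler flags and checks whether the header actually exists in the reported directories. Both function names are hypothetical.

```python
import os
import shlex
import subprocess


def include_dirs_from_cflags(cflags):
    """Extract directories passed as -I<dir> (attached form) in a flags string."""
    return [flag[2:] for flag in shlex.split(cflags)
            if flag.startswith("-I") and len(flag) > 2]


def find_xmlversion_header():
    """Return the path of libxml/xmlversion.h reported by xml2-config, or None.

    Hypothetical helper: assumes xml2-config is on PATH and raises
    FileNotFoundError if it is not.
    """
    cflags = subprocess.run(
        ["xml2-config", "--cflags"], capture_output=True, text=True, check=True
    ).stdout
    for directory in include_dirs_from_cflags(cflags):
        candidate = os.path.join(directory, "libxml", "xmlversion.h")
        if os.path.isfile(candidate):
            return candidate
    return None
```

If xml2-config points at a directory such as /usr/include/libxml2 but the header is not there, the build fails exactly as in the log above. On OS X 10.9, a commonly reported cause is that the Xcode command-line tools, which provide /usr/include, are not installed.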

builtins.TypeError: must be str, not bytes

Question: I've converted my scripts from Python 2.7 to 3.2, and I have a bug.

    # -*- coding: utf-8 -*-
    import time
    from datetime import date
    from lxml import etree
    from collections import OrderedDict

    # Create the root element
    page = etree.Element('results')

    # Make a new document tree
    doc = etree.ElementTree(page)

    # Add the subelements
    pageElement = etree.SubElement(page, 'Country', Tim='Now', name='Germany', AnotherParameter='Bye', Code='DE', Storage='Basic')
    pageElement = etree
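The excerpt is cut off before the failing line, so the exact cause isn't visible. A common source of this error after moving to Python 3 is that etree.tostring() returns bytes by default, and writing bytes to a file opened in text mode ("w") raises a TypeError. The sketch below assumes that is the situation here and shows three ways around it. The output file names are placeholders.

```python
import os
import tempfile

from lxml import etree

# Build a small tree like the one in the question (attributes shortened).
page = etree.Element("results")
doc = etree.ElementTree(page)
etree.SubElement(page, "Country", name="Germany", Code="DE")

# tostring() returns bytes by default. Writing that to a text-mode file raises
# a TypeError ("must be str, not bytes" on older Python 3 versions).
raw = etree.tostring(page, pretty_print=True)

out_dir = tempfile.mkdtemp()

# Option 1: ask lxml for a str and write it in text mode.
text = etree.tostring(page, pretty_print=True, encoding="unicode")
with open(os.path.join(out_dir, "as_text.xml"), "w", encoding="utf-8") as fh:
    fh.write(text)

# Option 2: keep the bytes and open the file in binary mode ("wb").
with open(os.path.join(out_dir, "as_bytes.xml"), "wb") as fh:
    fh.write(raw)

# Option 3: let ElementTree.write() handle the file and the encoding.
doc.write(os.path.join(out_dir, "via_write.xml"), pretty_print=True,
          xml_declaration=True, encoding="utf-8")
```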

parsing xml containing default namespace to get an element value using lxml

Question: I have an XML string like this:

    str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>
          http://www.example.org/sitemap_1.xml.gz
        </loc>
        <lastmod>2015-07-01</lastmod>
      </sitemap>
    </sitemapindex>
    """

I want to extract all the URLs inside <loc> nodes, i.e. http://www.example.org/sitemap_1.xml.gz. I tried this code, but it didn't work:

    from lxml import etree
    root = etree.fromstring(str1)
    urls = root.xpath("//loc/text()")
    print urls
    []

I tried to check
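The empty list is expected behavior. The xmlns attribute on <sitemapindex> puts every element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace. In XPath 1.0, an unprefixed name like loc only matches elements that are in no namespace. A minimal sketch, assuming Python 3 and bytes input: bind a prefix of your choosing to the namespace URI and use that prefix in the expression. The prefix sm is arbitrary.

```python
from lxml import etree

# Same document as in the question, passed as bytes.
xml = b"""<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>
      http://www.example.org/sitemap_1.xml.gz
    </loc>
    <lastmod>2015-07-01</lastmod>
  </sitemap>
</sitemapindex>
"""

root = etree.fromstring(xml)

# XPath has no default namespace, so map the URI to a prefix ("sm" is arbitrary).
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [u.strip() for u in root.xpath("//sm:loc/text()", namespaces=ns)]
print(urls)  # ['http://www.example.org/sitemap_1.xml.gz']

# Alternative: match on the local name and ignore the namespace entirely.
same_urls = [u.strip() for u in root.xpath("//*[local-name()='loc']/text()")]
```

If you don't want to hard-code the URI, root.nsmap[None] returns the default namespace declared on the root element.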

SyntaxError of Non-ASCII character [duplicate]

Question: This question already has answers here: Correct way to define Python source code encoding (6 answers); SyntaxError: Non-ASCII character '\xa3' in file when function returns '£' (5 answers). Closed 3 years ago.

I am trying to parse XML that contains some non-ASCII characters. The code looks like this:

    from lxml import etree
    from lxml import objectify
    content = u'<?xml version="1.0" encoding="utf-8"?><div>Order date : 05/08/2013 12:24:28</div>'
    mail.replace('\xa0',' ')
    xml =
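The snippet shows three separate problems:

- The SyntaxError comes from Python 2 reading non-ASCII bytes in a source file that has no coding declaration (such as # -*- coding: utf-8 -*- on the first or second line). The linked duplicates cover this case.
- lxml rejects a unicode string that carries an XML encoding declaration.
- str.replace() returns a new string instead of modifying mail in place.

The sketch below targets Python 3, where source files are UTF-8 by default. The sample text containing a non-breaking space (\xa0) is made up, and the replacement is applied to the parsed text because mail is not defined in the excerpt.

```python
from lxml import etree

# Made-up sample: "\xa0" is a non-breaking space, a common non-ASCII character.
content = '<?xml version="1.0" encoding="utf-8"?><div>Order date\xa0: 05/08/2013 12:24:28</div>'

# lxml refuses str input that contains an encoding declaration ...
rejected = False
try:
    etree.fromstring(content)
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)

# ... so encode it to bytes first and let the declaration drive decoding.
root = etree.fromstring(content.encode("utf-8"))

# replace() returns a new string, so the result has to be assigned.
text = root.text.replace("\xa0", " ")
print(text)
```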

using lxml and iterparse() to parse a big (about 1 GB) XML file

Question: I have to parse a 1 GB XML file with a structure like the one below and extract the text within the "Author" and "Content" tags:

    <Database>
        <BlogPost>
            <Date>MM/DD/YY</Date>
            <Author>Last Name, Name</Author>
            <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
        </BlogPost>
        <BlogPost>
            <Date>MM/DD/YY</Date>
            <Author>Last Name, Name</Author>
            <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.<
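A common pattern for a file this size is sketched below. It assumes each record is a BlogPost element with Author and Content children, as shown. iterparse() is given tag="BlogPost" so it only reports the end of those elements. After reading the children, the code clears each element and deletes already-processed siblings so memory use stays roughly constant. The function name iter_posts and the in-memory sample are made up for illustration.

```python
import io

from lxml import etree


def iter_posts(source):
    """Yield (author, content) pairs from <BlogPost> elements, one at a time."""
    for _event, post in etree.iterparse(source, events=("end",), tag="BlogPost"):
        yield post.findtext("Author"), post.findtext("Content")
        # Free memory: empty this element, then drop earlier, finished siblings.
        post.clear()
        while post.getprevious() is not None:
            del post.getparent()[0]


# Made-up sample data with the same structure as the question.
sample = b"""<Database>
  <BlogPost><Date>MM/DD/YY</Date><Author>Doe, Jane</Author><Content>First</Content></BlogPost>
  <BlogPost><Date>MM/DD/YY</Date><Author>Roe, Rick</Author><Content>Second</Content></BlogPost>
</Database>"""

for author, content in iter_posts(io.BytesIO(sample)):
    print(author, "|", content)
```

For the real file, pass a path instead of the BytesIO object, for example iter_posts("database.xml"), where the file name is a placeholder.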

Remove namespace and prefix from xml in python using lxml

Question: I have an XML file that I need to open and make some changes to. One of those changes is to remove the namespace and prefix and then save the result to another file. Here is the XML:

    <?xml version='1.0' encoding='UTF-8'?>
    <package xmlns="http://apple.com/itunes/importer">
      <provider>some data</provider>
      <language>en-GB</language>
    </package>

I can make the other changes I need, but can't find out how to remove the namespace and prefix. This is the resulting XML I need:

    <?xml version='1.0' encoding='UTF
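One way to do this is sketched below, assuming every namespaced name should simply lose its {namespace} part. The code rewrites each tag (and attribute name) to its local name with etree.QName. It then calls etree.cleanup_namespaces() to drop the xmlns declaration, which is no longer used. The helper name strip_namespaces is hypothetical.

```python
from lxml import etree

xml = b"""<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
</package>"""


def strip_namespaces(root):
    """Remove namespaces from element tags and attribute names, in place."""
    for el in root.iter():
        # Comments and processing instructions have a non-str .tag; skip them.
        if not isinstance(el.tag, str):
            continue
        el.tag = etree.QName(el).localname
        for name in list(el.attrib):
            local = etree.QName(name).localname
            if local != name:
                value = el.attrib[name]
                del el.attrib[name]
                el.set(local, value)
    # Drop namespace declarations that no element or attribute uses any more.
    etree.cleanup_namespaces(root)
    return root


root = strip_namespaces(etree.fromstring(xml))
print(etree.tostring(root, xml_declaration=True, encoding="UTF-8").decode())
```

To work on files, parse with tree = etree.parse("input.xml"), call strip_namespaces(tree.getroot()), and save with tree.write("output.xml", xml_declaration=True, encoding="UTF-8"). Both file names are placeholders.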