lxml

Python: Using xpath locally / on a specific element

情到浓时终转凉″ submitted on 2019-11-26 18:05:47

Question: I'm trying to get the links from a page with XPath. The problem is that I only want the links inside a table, but if I apply the XPath expression to the whole page I capture links I don't want. For example:

```python
tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")
```

The problem is that this applies the expression to the whole document. I located the element I want, for example:

```python
tree = lxml.html.parse(some_response)
root = tree.getroot…
```
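The usual fix is to select the container element first and then run a *relative* XPath from it; note the leading dot, since `//a` on an element still searches the whole document. A minimal sketch on a made-up document (the `links` table id and the markup are illustrative):

```python
from lxml import html

doc = html.fromstring("""
<html><body>
  <a href="http://www.example.com/filter/a">outside</a>
  <table id="links">
    <tr><td><a href="http://www.example.com/filter/b">inside</a></td></tr>
  </table>
</body></html>
""")

# Locate the table, then search only its subtree with ".//a";
# "//a" would match the link outside the table as well.
table = doc.get_element_by_id("links")
links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
print([a.text for a in links])  # ['inside']
```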

python - lxml: enforcing a specific order for attributes

纵然是瞬间 submitted on 2019-11-26 17:49:01

Question: I have an XML-writing script that outputs XML for a specific third-party tool. I've used the original XML as a template to make sure that I'm building all the correct elements, but the final XML does not look like the original. I write the attributes in the same order, but lxml writes them in its own order. I'm not sure, but I suspect that the third-party tool expects attributes to appear in a specific order, and I'd like to resolve this issue so I can see if it's the attribute order that is making…
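For what it's worth, lxml serializes attributes in the order they were *set* on the element, so the usual culprit is building them from a plain Python dict, which did not guarantee ordering before Python 3.7. A small sketch (element and attribute names are made up):

```python
from lxml import etree

root = etree.Element("config")
# Setting attributes one at a time pins their serialization order;
# passing an unordered dict to etree.Element on old Pythons does not.
root.set("version", "2")
root.set("name", "demo")
print(etree.tostring(root))  # b'<config version="2" name="demo"/>'
```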

Get all text inside a tag in lxml

感情迁移 submitted on 2019-11-26 17:22:25

I'd like to write a code snippet that would grab all of the text inside the `<content>` tag, in lxml, in all three instances below, including the code tags. I've tried `tostring(getchildren())` but that would miss the text in between the tags. I didn't have much luck searching the API for a relevant function. Could you help me out?

```
<!--1-->
<content>
  <div>Text inside tag</div>
</content>
#should return "<div>Text inside tag</div>"

<!--2-->
<content>
  Text with no tag
</content>
#should return "Text with no tag"

<!--3-->
<content>
  Text outside tag
  <div>Text inside tag</div>
</content>
#should …
```

Getting errors when importing lxml.etree in Python

喜你入骨 submitted on 2019-11-26 17:06:47

Question: I have installed lxml on my Mac. When I type, in Python:

```
localhost:lxml-3.0.1 apple$ python
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml-3.0.1-py2.7-macosx-10.6…
```
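A dlopen failure like this typically means the compiled `lxml.etree` extension does not match the interpreter loading it (wrong architecture, or a libxml2 it cannot find at runtime). A few quick diagnostic checks, offered as a sketch, narrow down which Python and which libraries are actually in play:

```python
import ctypes.util
import platform
import sys

print("interpreter:", sys.executable)            # the Python actually running
print("word size:", platform.architecture()[0])  # must match the built extension
# If libxml2 cannot be located at all, the compiled extension
# has nothing to link against at import time:
print("libxml2:", ctypes.util.find_library("xml2"))
```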

parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml)

六月ゝ 毕业季﹏ submitted on 2019-11-26 16:56:12

Question: I send a GET request to the CareerBuilder API:

```python
import requests

url = "http://api.careerbuilder.com/v1/jobsearch"
payload = {'DeveloperKey': 'MY_DEVLOPER_KEY', 'JobTitle': 'Biologist'}
r = requests.get(url, params=payload)
xml = r.text
```

And get back an XML that looks like this. However, I have trouble parsing it. Using either lxml:

```
>>> from lxml import etree
>>> print etree.fromstring(xml)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print etree.fromstring(xml)…
```
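The lxml half of this error usually comes from handing `etree.fromstring` a decoded `str` that still carries an XML encoding declaration. Feeding it bytes instead (`r.content` rather than `r.text` with requests) sidesteps it; a minimal sketch with a stand-in payload instead of the real API response:

```python
from lxml import etree

xml_text = ('<?xml version="1.0" encoding="UTF-8"?>'
            '<Results><Job>Biologist</Job></Results>')

# fromstring() rejects a *str* that still has an encoding declaration
# ("ValueError: Unicode strings with encoding declaration are not
# supported"); encode to bytes first, or pass r.content directly.
root = etree.fromstring(xml_text.encode("utf-8"))
print(root.find("Job").text)  # Biologist
```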

Why does this xpath fail using lxml in python?

倾然丶 夕夏残阳落幕 submitted on 2019-11-26 16:46:59

Question: Here is an example web page I am trying to get data from: http://www.makospearguns.com/product-p/mcffgb.htm. The XPath was taken from Chrome developer tools, and FirePath in Firefox is also able to find it, but using lxml it just returns an empty list for 'text'.

```python
from lxml import html
import requests

site_url = 'http://www.makospearguns.com/product-p/mcffgb.htm'
xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[1]/div…
```
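A likely cause, sketched below on a toy table (the div content is made up), is that Chrome's dev tools show the browser's DOM, where every table gets an inserted `<tbody>`; the raw HTML that lxml parses often has none, so the copied XPath matches nothing:

```python
from lxml import html

page = "<table><tr><td><div>$49.95</div></td></tr></table>"
doc = html.fromstring(page)

# The tbody step exists only in the browser DOM, not the source HTML:
print(doc.xpath("//table/tbody/tr/td/div/text()"))  # []
# Dropping tbody (or skipping levels with //) matches the real tree:
print(doc.xpath("//table//tr/td/div/text()"))       # ['$49.95']
```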

Installing lxml for Python 3.4 on Windows x86 (32-bit) with Visual Studio C++ 2010 Express

一个人想着一个人 submitted on 2019-11-26 16:36:36

Question: Related questions:
- error: Unable to find vcvarsall.bat
- LXML 3.3 with Python 3.3 on Windows 7 32-bit

Related answer: https://stackoverflow.com/a/18045219/1175496

Related comment (on "Building lxml for Python 2.7 on Windows"): "@ziyuang This would mean you use Python 3.3 which uses Microsoft Visual Studio 2010. If that's the case then the answer is yes, you should install this version."

Facts:
- Windows x86 (32-bit)
- Installed both Visual Studio C++ 2008 Express (from here) and Visual Studio C++…

Why doesn't xpath work when processing an XHTML document with lxml (in python)?

情到浓时终转凉″ submitted on 2019-11-26 16:33:32

Question: I am testing against the following test document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>hi there</title>
  </head>
  <body>
    <img class="foo" src="bar.png"/>
  </body>
</html>
```

If I parse the document using lxml.html, I can get the img with an XPath just fine:

```
>>> root = lxml.html.fromstring(doc)
>>> root.xpath("//img")
[<Element img…
```
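With lxml.etree (as opposed to lxml.html), the XHTML default namespace applies to every element, so the bare `//img` matches nothing; binding a prefix to the namespace URI makes the same query work. A sketch on a trimmed-down version of the document (DOCTYPE omitted):

```python
from lxml import etree

doc = (b'<html xmlns="http://www.w3.org/1999/xhtml">'
       b'<head><title>hi there</title></head>'
       b'<body><img class="foo" src="bar.png"/></body></html>')

root = etree.fromstring(doc)
print(root.xpath("//img"))  # [] - every element is in the XHTML namespace

# Bind a prefix to the namespace and qualify the step with it:
ns = {"x": "http://www.w3.org/1999/xhtml"}
print(root.xpath("//x:img", namespaces=ns))  # one match
```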

Notes on Installing the Scraping Tool Scrapy on Windows 7 (32-bit)

痴心易碎 submitted on 2019-11-26 16:13:05

Installation environment

My home laptop setup is very simple:
- Windows 7 Ultimate, 32-bit (without SP1)
- Python 3.4.4 (Python 3.5+ cannot be installed on Windows 7 without SP1)

Installing Scrapy

Scrapy depends heavily on two frameworks, lxml and twisted, and that is exactly where the trouble lies. Installing open-source tools often requires working out in advance which third-party libraries the tool depends on, along with their own dependencies; if even one dependency cannot be satisfied, the installation fails, so some investigation before installing is necessary.

A failed attempt to install Scrapy

At first I ran the following at the DOS prompt, but it failed:

```
pip install scrapy
```

When this command runs, pip downloads and installs the latest scrapy from https://files.pythonhosted.org, automatically taking the installed Python version into account. However, it does not strictly check the other third-party libraries that scrapy depends on, so the whole installation can fail with only a few percent left to go.

Installing lxml

Most sites recommend installing lxml by downloading a pre-compiled .whl file from the third-party package index at http://www.lfd.uci.edu/~gohlke/pythonlibs/ (screenshot omitted). Unfortunately, when I went to that site looking for a .whl matching Python 3.4, it no longer existed. However…
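When picking .whl files by hand from a wheel index, the filename tags have to match the running interpreter exactly (for the machine described above that would be cp34 and win32). A small illustrative check prints the values to match against:

```python
import platform
import sys

# A cp34 wheel needs Python 3.4; a win32 wheel needs a 32-bit build.
print("python tag: cp%d%d" % sys.version_info[:2])
print("platform:", platform.architecture()[0], sys.platform)
```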

Installing lxml with pip in virtualenv Ubuntu 12.10 error: command 'gcc' failed with exit status 4

て烟熏妆下的殇ゞ submitted on 2019-11-26 15:52:57

Question: I'm getting the following error when trying to run "pip install lxml" in a virtualenv on Ubuntu 12.10 x64. I have Python 2.7. I have seen other related questions here about the same problem and have tried installing python-dev, libxml2-dev and libxslt1-dev. Please take a look at the traceback, from the moment I type the command to the moment the error occurs.

```
Downloading/unpacking lxml
  Running setup.py egg_info for package lxml
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown…
```