lxml

ImportError: No module named lxml.etree

自闭症网瘾萝莉.ら posted on 2019-12-19 20:15:11
Question: I'm trying to import premailer in my project, but it keeps failing at the etree import. I installed the 2.7 binary for lxml. The lxml module imports fine, and logging the lxml module shows the correct path to the library folder, but I can't import etree from it. There's an etree.pyd in the lxml folder, but Python can't seem to see or read it. I'm on 64-bit Windows 7. Does anyone know what's going wrong here?
Answer 1: Try adding the library to the GAE .yaml file. Under libraries, add - name: lxml

Getting more granular diffs from difflib (or a way to post-process a diff to achieve the same thing)

一曲冷凌霜 posted on 2019-12-19 11:26:55
Question: I downloaded this page and made a minor edit to it, changing the first 65 in this paragraph to 68. I then parse both sources with BeautifulSoup and diff them with difflib.
    import urllib2
    url = 'https://secure.ssa.gov/apps10/reference.nsf/links/02092016062645AM'
    response = urllib2.urlopen(url)
    content = response.read()  # get response as list of lines
    url2 = 'file:///Users/Pyderman/projects/temp/02092016062645AM-modified.html'
    response2 = urllib2.urlopen(url2)
    content2 = response2.read()  # get response as
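
One way to get a finer-grained diff is to run difflib.SequenceMatcher a second time on each pair of changed lines, comparing words instead of whole lines. The sketch below is only an illustration under assumptions: the helper name word_level_changes and the two sample sentences are hypothetical (the paragraph from the question is not reproduced here), and it is written for Python 3 rather than the Python 2 used in the question.

```python
import difflib


def word_level_changes(old_line, new_line):
    """Return (tag, old_words, new_words) for every non-equal word span.

    Hypothetical helper: splits both lines on whitespace and lets
    SequenceMatcher align the two word lists.
    """
    old_words = old_line.split()
    new_words = new_line.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Made-up sample lines standing in for one changed line pair.
old = "must be at least age 65 to qualify"
new = "must be at least age 68 to qualify"
print(word_level_changes(old, new))
```

Feeding only the line pairs that a line-level diff reports as changed keeps the second pass cheap, since unchanged lines never reach the word-level comparison.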

lxml classic: Get text content except for that of nested tags?

心已入冬 posted on 2019-12-19 09:47:56
Question: This must be an absolute classic, but I can't find the answer here. I'm parsing the following tag with lxml cssselect:
    <li><a href="/stations/1"><span class="num">3</span> Detroit</a></li>
I want to get the content of the <li> tag without the content of the <span> tag. Currently I have:
    stop_list = doc.cssselect('ol#stations li a')
    start = stop_list[0].text_content().strip()
But that gives me 3 Detroit. How can I get just Detroit?
Answer 1: The itertext method of an element returns an iterator of
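
As a rough illustration (not necessarily what the truncated answer goes on to suggest), the text that belongs to the <a> element itself can be selected with the XPath text() node test, or read from the .tail of the nested <span>. The surrounding <ol id="stations"> wrapper is an assumption taken from the selector in the question, and XPath is used instead of cssselect only to avoid the extra dependency.

```python
from lxml import html

# Markup from the question, wrapped in the <ol id="stations"> that the
# question's selector implies (the wrapper itself is an assumption).
page = html.fromstring(
    '<html><body><ol id="stations">'
    '<li><a href="/stations/1"><span class="num">3</span> Detroit</a></li>'
    '</ol></body></html>'
)
link = page.xpath('//ol[@id="stations"]/li/a')[0]

# text() matches only text nodes that are direct children of <a>,
# so the text inside the nested <span> is skipped.
own_text = "".join(link.xpath("text()")).strip()

# In the ElementTree model the same string is the .tail of the <span>.
tail_text = link.find("span").tail.strip()

print(own_text, tail_text)
```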

How to solve problem with parsing html file with cyrillic symbol?

这一生的挚爱 posted on 2019-12-19 09:37:12
Question: I have an HTML file with span elements:
    <html>
    <body>
    <span class="one">Text</span>some text</br>
    <span class="two">Привет</span>Текст на русском</br>
    </body>
    </html>
To get "some text":
    # -*- coding:cp1251 -*-
    import lxml
    from lxml import html
    filename = "t.html"
    fread = open(filename, 'r')
    source = fread.read()
    tree = html.fromstring(source)
    fread.close()
    tags = tree.xpath('//span[@class="one" and text()="Text"]')  # This OK
    print "name: ",tags[0].text
    print "value: ",tags[0].tail
    tags =
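
The excerpt is cut off before the failing part, so the following is only a guess at the usual cause: byte strings and text being mixed. A minimal Python 3 sketch, assuming the file really is encoded as cp1251 (the question does not confirm this), decodes the bytes first so that both the parsed tree and the XPath expression work with text. The helper name parse_encoded_html is hypothetical, and the markup uses <br/> in place of the question's </br>.

```python
from lxml import html


def parse_encoded_html(raw_bytes, encoding):
    """Decode raw bytes with a known encoding, then parse the text.

    Hypothetical helper; the encoding must be supplied by the caller.
    """
    return html.fromstring(raw_bytes.decode(encoding))


# In-memory stand-in for the bytes read from t.html (assumed cp1251).
sample = (
    '<html><body>'
    '<span class="one">Text</span>some text<br/>'
    '<span class="two">Привет</span>Текст на русском<br/>'
    '</body></html>'
).encode("cp1251")

tree = parse_encoded_html(sample, "cp1251")
tags = tree.xpath('//span[@class="two" and text()="Привет"]')
name = tags[0].text   # text inside the matched <span>
value = tags[0].tail  # text that follows the closing </span>
```

For a real file, the bytes would come from open(filename, 'rb').read() instead of the in-memory sample.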

How to properly escape single and double quotes

北城余情 posted on 2019-12-19 09:16:22
Question: I have an lxml etree HTMLParser object that I'm using to build XPaths, so that I can assert on the XPath, the attributes of the matched tag, and the text of that tag. I ran into a problem when the text of the tag contains either single quotes (') or double quotes ("), and I've exhausted all my options. Here's a sample object I created:
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO('''<html><body><p align="center">Here is my 'test' "string"</p></body></html>'''), parser)
Here is the snippet of code and then different
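
XPath 1.0 has no escape character inside string literals, so a value that contains both quote types is commonly written as a concat() of separately quoted pieces. The helper below is a sketch of that technique; the name xpath_literal is hypothetical and is not part of lxml.

```python
def xpath_literal(text):
    """Return an XPath 1.0 expression that evaluates to `text`.

    Hypothetical helper. XPath 1.0 string literals cannot escape quotes,
    so a value with both quote types is assembled with concat().
    """
    if "'" not in text:
        return "'%s'" % text
    if '"' not in text:
        return '"%s"' % text
    # Split on single quotes: each piece is then safe inside '...',
    # and the single quotes themselves are emitted as "'".
    pieces = ["'%s'" % piece for piece in text.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


expression = "//p[text()=%s]" % xpath_literal("Here is my 'test' \"string\"")
print(expression)
```

With lxml, the resulting expression can be passed straight to tree.xpath(expression). lxml's xpath() also accepts XPath variables as keyword arguments, for example tree.xpath('//p[text()=$t]', t=text), which avoids the quoting problem entirely.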

Why am I getting this ImportError?

这一生的挚爱 posted on 2019-12-19 09:11:17
Question: I have a tkinter app that I am compiling to an .exe via py2exe. In the setup file, I have set it to include lxml, urllib, lxml.html, ast, and math. When I run python setup.py py2exe in a CMD console, it compiles fine. I then go to the dist folder it has created and run the .exe file. When I run the .exe, I get this popup window. (source: gyazo.com) I then proceed to open the Trader.exe.log file, and the contents say the following:
    Traceback (most recent call last): File "Trader.py",

Automatic XSD validation

心不动则不痛 posted on 2019-12-19 06:01:42
Question: According to the lxml documentation, "The DTD is retrieved automatically based on the DOCTYPE of the parsed document. All you have to do is use a parser that has DTD validation enabled." (http://lxml.de/validation.html#validation-at-parse-time) However, if you want to validate against an XML schema, you need to explicitly reference one. I am wondering why this is, and would like to know whether there is a library or function that can do this, or even an explanation of how to make this happen myself.
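
One way to approximate "automatic" XSD validation yourself is to read the schema hint from the document's root element and load that schema explicitly. The sketch below makes several assumptions: the document uses xsi:noNamespaceSchemaLocation (namespaced documents use xsi:schemaLocation instead, which this does not handle), the hint is a file path relative to the XML file, and the function name validate_with_hinted_schema is hypothetical.

```python
import os

from lxml import etree

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def validate_with_hinted_schema(xml_path):
    """Validate an XML file against the XSD named in its own root element.

    Hypothetical helper: only xsi:noNamespaceSchemaLocation is read, and
    the hint is assumed to be a path relative to the XML file.
    """
    doc = etree.parse(xml_path)
    hint = doc.getroot().get("{%s}noNamespaceSchemaLocation" % XSI_NS)
    if hint is None:
        raise ValueError("document carries no xsi:noNamespaceSchemaLocation hint")
    base_dir = os.path.dirname(os.path.abspath(xml_path))
    schema = etree.XMLSchema(etree.parse(os.path.join(base_dir, hint)))
    # validate() returns True or False; details are in schema.error_log.
    return schema.validate(doc)
```

Calling validate_with_hinted_schema("note.xml") on a file whose root element names note.xsd would then load that schema from the same directory. Loading a schema from a location chosen by the document is a trust decision, so this is best limited to input you control.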

Parsing Source Code (Python) Approach: Beautiful Soup, lxml, html5lib difference?

蓝咒 posted on 2019-12-18 18:26:30
Question: I have a large HTML source file (~200,000 lines) that I would like to parse, and I'm fairly certain there is some poor formatting throughout. I've been researching parsers, and it seems Beautiful Soup, lxml, and html5lib are the most popular. From reading this website, it seems lxml is the most commonly used and the fastest, while Beautiful Soup is slower but accounts for more errors and variation. I'm a little confused by the Beautiful Soup documentation, http://www.crummy.com/software/BeautifulSoup

How to install lxml in Python 3.4 on Windows machine

我的未来我决定 posted on 2019-12-18 18:04:10
Question: I've been spending hours on this. I'm new to Python and can't see what the solution may be. I have Python 3.4 and want to work with .docx, which requires lxml. The workflow I've followed so far is: I go to the Python lxml package installer page, but it's quite confusing to know which version I need. I tried several of them whose names contained "34", both .exe and .tar. I also tried pip install lxml3.4.4 and pip install lxml 3.4.4. Neither of them worked. This is what the command

Need help installing lxml on os x 10.7

…衆ロ難τιáo~ posted on 2019-12-18 17:29:30
Question: I have been struggling to get from lxml import etree to work (import lxml works fine, by the way). The error is:
    ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml/etree.so, 2): Symbol not found: _htmlParseChunk
    Referenced from: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml/etree.so
    Expected in: flat namespace
    in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml/etree.so