lxml | Yixue Jiaocheng (易学教程)

ImportError: No module named lxml.etree

Question: I'm trying to import premailer in my project, but it keeps failing at the etree import. I installed the Python 2.7 binary for lxml. The lxml module imports fine, and logging the lxml module shows the correct path to the library folder, but I can't import etree from it. There's an etree.pyd in the lxml folder, but Python can't seem to see/read it. I'm on Windows 7 64-bit. Does anyone know what's going wrong here?

Answer 1: Try adding the library to the GAE (Google App Engine) .yaml file. Under libraries, add "- name: lxml".
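
The answer above only applies on Google App Engine. Outside GAE, one thing worth checking (an assumption, not the cause confirmed in the thread) is whether the compiled etree.pyd matches the running interpreter's version and 32/64-bit build, since a compiled extension built for a different interpreter will not load. A minimal stdlib-only sketch that gathers those details; the function name diagnose_import is hypothetical:

```python
import importlib
import struct
import sys


def diagnose_import(module_name="lxml.etree"):
    """Try to import `module_name` and collect details useful for debugging."""
    report = {
        "python": sys.version.split()[0],
        # Pointer size: 32 or 64. A binary extension must match this build.
        "bits": struct.calcsize("P") * 8,
        "module": module_name,
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the loader's message; it often names the missing DLL or symbol.
        report["error"] = str(exc)
    else:
        report["file"] = getattr(module, "__file__", None)
    return report


print(diagnose_import())
```

If "bits" says 32 while the installed lxml binary was built for 64-bit Python (or the reverse), reinstalling the matching build is the usual next step.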

Getting more granular diffs from difflib (or a way to post-process a diff to achieve the same thing)

Question: I downloaded this page and made a minor edit to it, changing the first 65 in this paragraph to 68: I then parse both sources with BeautifulSoup and diff them with difflib.
url = 'https://secure.ssa.gov/apps10/reference.nsf/links/02092016062645AM'
response = urllib2.urlopen(url)
content = response.read()  # get response as list of lines
url2 = 'file:///Users/Pyderman/projects/temp/02092016062645AM-modified.html'
response2 = urllib2.urlopen(url2)
content2 = response2.read()  # get response as
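
difflib.SequenceMatcher compares any sequences of hashable items, not just lists of lines, so one way to post-process a line diff is to re-diff each replaced line as a list of word tokens. A minimal sketch, assuming the two pages have already been reduced to lists of text lines (for example from BeautifulSoup's get_text().splitlines()); the function names and sample lines are hypothetical:

```python
import difflib
import re

# Words, runs of whitespace, and single punctuation marks become separate tokens.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def word_changes(old, new):
    """Return (tag, old_text, new_text) for each changed run of tokens."""
    a, b = TOKEN.findall(old), TOKEN.findall(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def granular_diff(old_lines, new_lines):
    """Diff line lists, then re-diff paired replaced lines token by token."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and i2 - i1 == j2 - j1:
            # Same number of lines on both sides: pair them and diff within each.
            for offset in range(i2 - i1):
                for change in word_changes(old_lines[i1 + offset], new_lines[j1 + offset]):
                    changes.append((i1 + offset,) + change)
        else:
            # Inserted/deleted lines (or uneven replacements) are reported whole.
            changes.append((i1, tag, "\n".join(old_lines[i1:i2]), "\n".join(new_lines[j1:j2])))
    return changes


old = ["Intro", "the first 65 in this paragraph", "End"]
new = ["Intro", "the first 68 in this paragraph", "End"]
print(granular_diff(old, new))  # [(1, 'replace', '65', '68')]
```

Pairing lines only when both sides of a replacement have the same length is a deliberate simplification; uneven blocks are reported as whole lines.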

lxml classic: Get text content except for that of nested tags?

Question: This must be an absolute classic, but I can't find the answer here. I'm parsing the following tag with lxml cssselect:
<li><a href="/stations/1">3 Detroit</a></li>
I want to get the content of the <li> tag without the content of the tag. Currently I have:
stop_list = doc.cssselect('ol#stations li a')
start = stop_list[0].text_content().strip()
But that gives me 3 Detroit. How can I get just Detroit?

Answer 1: The itertext method of an element returns an iterator of
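
In the ElementTree data model that lxml shares, an element's own text is its .text plus the .tail of each child (the text after the child's closing tag), while text_content() and itertext() also include text inside children. A minimal sketch using the standard library's xml.etree.ElementTree, assuming (hypothetically) that the number sits in a nested <span>; own_text is an illustrative helper name:

```python
import xml.etree.ElementTree as ET


def own_text(element):
    """Text belonging directly to `element`, skipping text inside child elements."""
    parts = [element.text or ""]
    # Text following each child's closing tag (.tail) still belongs to the parent.
    for child in element:
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical markup: the station number is wrapped in a nested <span>.
markup = '<li><a href="/stations/1"><span>3</span> Detroit</a></li>'
li = ET.fromstring(markup)
a = li.find("a")

print(own_text(a).strip())            # Detroit
print("".join(a.itertext()).strip())  # 3 Detroit
```

With lxml itself, "".join(a.xpath("text()")) selects the same direct text nodes.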

How to solve a problem with parsing an HTML file with Cyrillic symbols?

Question: I have an HTML file with span elements:
<html> <body> Textsome text ПриветТекст на русском </body> </html>
To get "some text":
# -*- coding:cp1251 -*-
import lxml
from lxml import html
filename = "t.html"
fread = open(filename, 'r')
source = fread.read()
tree = html.fromstring(source)
fread.close()
tags = tree.xpath('//span[@class="one" and text()="Text"]')  # This OK
print "name: ", tags[0].text
print "value: ", tags[0].tail
tags =
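
A common approach (assumed here, not taken from the thread) is to decode the file's bytes once, with the encoding it was actually saved in, so the parser and every text comparison work on Unicode strings. A minimal sketch using the standard library's ElementTree on hypothetical markup, since the original span attributes did not survive; tail_after is an illustrative helper:

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents: the page as it might be saved in Windows-1251.
raw = ('<html><body><span class="one">Text</span>some text'
       '<span class="two">Привет</span>Текст на русском</body></html>'
       ).encode("cp1251")

# Decode once, using the file's real encoding, so later comparisons are
# between Unicode strings rather than raw bytes.
source = raw.decode("cp1251")
tree = ET.fromstring(source)


def tail_after(root, css_class, text):
    """Return the text that follows the <span> with this class and text."""
    for span in root.iter("span"):
        if span.get("class") == css_class and span.text == text:
            return span.tail
    return None


english = tail_after(tree, "one", "Text")    # "some text"
russian = tail_after(tree, "two", "Привет")  # "Текст на русском"
```

The same idea carries over to lxml.html: pass html.fromstring already-decoded text (or create lxml.html.HTMLParser(encoding="cp1251")) and write Cyrillic XPath literals as Unicode strings.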

How to properly escape single and double quotes

Question: I have an lxml etree HTMLParser object that I'm using to build XPaths, so I can assert the XPaths, the attributes of each XPath, and the text of that tag. I ran into a problem when the text of the tag contains single quotes (') or double quotes ("), and I've exhausted all my options. Here's a sample object I created:
parser = etree.HTMLParser()
tree = etree.parse(StringIO(<html><body>Here is my 'test' "string"</body></html>), parser)
Here is the snippet of code and then different
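
XPath 1.0 string literals have no escape sequences: a literal is delimited by either ' or ", and cannot contain its own delimiter. A value containing both kinds of quote therefore has to be assembled with concat(). A minimal sketch; xpath_literal is a hypothetical helper name:

```python
def xpath_literal(value):
    """Quote `value` as an XPath 1.0 string literal.

    XPath 1.0 has no escape sequences, so a value containing both quote
    characters is assembled with concat().
    """
    if "'" not in value:
        return "'%s'" % value
    if '"' not in value:
        return '"%s"' % value
    # Split on single quotes; each piece is then safe inside '...', and the
    # single quotes are re-inserted as "'" arguments to concat().
    pieces = ["'%s'" % piece for piece in value.split("'")]
    return "concat(%s)" % ", \"'\", ".join(pieces)


text = 'Here is my \'test\' "string"'
print(xpath_literal(text))
# concat('Here is my ', "'", 'test', "'", ' "string"')
```

With lxml you could then use tree.xpath("//*[text()=%s]" % xpath_literal(text)). lxml also accepts XPath variables, e.g. tree.xpath("//*[text()=$t]", t=text), which sidesteps quoting entirely.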

Why am I getting this ImportError?

Question: I have a tkinter app that I am compiling to an .exe via py2exe. In the setup file, I have set it to include lxml, urllib, lxml.html, ast, and math. When I run python setup.py py2exe in a CMD console, it compiles fine. I then go to the dist folder it has created and run the .exe file. When I run the .exe, I get a popup window. I then proceed to open the Trader.exe.log file, and the contents say the following:
Traceback (most recent call last): File "Trader.py",

Automatic XSD validation

Question: According to the lxml documentation, "The DTD is retrieved automatically based on the DOCTYPE of the parsed document. All you have to do is use a parser that has DTD validation enabled." (http://lxml.de/validation.html#validation-at-parse-time) However, if you want to validate against an XML schema, you need to reference one explicitly. I am wondering why this is, and would like to know if there is a library or function that can do this, or even an explanation of how to make this happen myself.
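
Part of the answer to "why" is that the XML Schema specification treats xsi:schemaLocation and xsi:noNamespaceSchemaLocation as hints that a processor is free to ignore, whereas a DOCTYPE names its DTD directly. To do it yourself, you can read those hints off the document and load the schema explicitly. A minimal stdlib-only sketch that reads them from the root element only; the example namespace, file names, and the helper schema_locations are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi:* attributes defined by the XML Schema spec.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace or None: schema URL} as declared on the root element."""
    root = ET.fromstring(xml_text)
    locations = {}
    # xsi:schemaLocation holds whitespace-separated "namespace URL" pairs.
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    for namespace, url in zip(tokens[0::2], tokens[1::2]):
        locations[namespace] = url
    # xsi:noNamespaceSchemaLocation covers documents without a target namespace.
    no_namespace = root.get("{%s}noNamespaceSchemaLocation" % XSI)
    if no_namespace:
        locations[None] = no_namespace
    return locations


doc = ('<order xmlns="urn:example:orders" '
       'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
       'xsi:schemaLocation="urn:example:orders orders.xsd"/>')
print(schema_locations(doc))  # {'urn:example:orders': 'orders.xsd'}
```

With lxml, the located file could then be loaded with etree.XMLSchema(etree.parse(path)) and checked with schema.assertValid(tree); resolving relative URLs against the document's own location is left to the caller.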

Parsing Source Code (Python) Approach: Beautiful Soup, lxml, html5lib difference?

Question: I have a large HTML source (~200,000 lines) that I would like to parse, and I'm fairly certain there is some poor formatting throughout. I've been researching parsers, and it seems Beautiful Soup, lxml, and html5lib are the most popular. From reading this website, it seems lxml is the most commonly used and the fastest, while Beautiful Soup is slower but copes with more errors and variation. I'm a little confused by the Beautiful Soup documentation, http://www.crummy.com/software/BeautifulSoup

How to install lxml in Python 3.4 on a Windows machine

Question: I've been spending hours on this. I'm new to Python and can't see what the solution may be. I have Python 3.4 and want to work with .docx, which requires lxml. The workflow so far: I go to the Python lxml package installer page, but it's quite confusing to know which version I need. I tried several of the files that contained the number 34, both .exe and .tar. I also tried pip install lxml3.4.4 and pip install lxml 3.4.4. None of them worked either. This is what the command
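
Prebuilt Windows packages are specific to a Python version and to a 32- or 64-bit interpreter, and binary wheel file names encode both (for CPython 3.4, a "cp34" tag plus "win32" or "win_amd64"). A minimal sketch that prints which tags to look for, assuming CPython on x86/x64 Windows; windows_wheel_tags is a hypothetical name:

```python
import struct
import sys


def windows_wheel_tags():
    """Return the (Python tag, platform tag) a Windows binary wheel should carry."""
    # e.g. "cp34" for CPython 3.4
    python_tag = "cp%d%d" % sys.version_info[:2]
    # Pointer size reveals whether this interpreter is a 32- or 64-bit build.
    bits = struct.calcsize("P") * 8
    # Assumes x86/x64 Windows; other architectures use different platform tags.
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


print(windows_wheel_tags())
```

A downloaded wheel whose name contains both tags can be installed with pip install followed by the path to the .whl file. To pin a version from the package index, pip's syntax is pip install lxml==3.4.4.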

Need help installing lxml on OS X 10.7

Question: I have been struggling to get from lxml import etree to work (import lxml works fine, by the way). The error is:
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml/etree.so, 2): Symbol not found: _htmlParseChunk
  Referenced from: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml/etree.so
  Expected in: flat namespace
 in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml/etree.so
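
The message says etree.so expects the libxml2 function htmlParseChunk, but the library the loader resolved does not provide it, which suggests (an assumption, not confirmed in the thread) that lxml was compiled against a different libxml2 than the one found at run time. A rough diagnostic sketch with ctypes; libxml2_exports is a hypothetical helper, and ctypes.util.find_library may not locate the same copy the dynamic loader used when importing etree.so:

```python
import ctypes
import ctypes.util


def libxml2_exports(symbol="htmlParseChunk"):
    """Return (path, found) for the libxml2 on the default search path, or None."""
    path = ctypes.util.find_library("xml2")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute access on a CDLL performs a symbol lookup and raises
    # AttributeError when the symbol is missing, so hasattr() reports presence.
    return path, hasattr(lib, symbol)


print(libxml2_exports())
```

If the library found lacks the symbol or is older than expected, one route described in lxml's build documentation is to build lxml with its libxml2/libxslt dependencies statically linked (setting STATIC_DEPS=true when building from source).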