lxml | 易学教程

Can't install lxml on CentOS

Question: I'm trying to install lxml but am having some difficulties:

[root@ip-xx-xxx-xx-113 init.d]# pip install lxml
Downloading/unpacking lxml
Running setup.py egg_info for package lxml
/usr/lib64/python2.6/distutils/dist.py:266: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.3.0.beta2.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib64
warning: no files found

ImportError After Install of lxml on OS X 10.6.8

Question: I tried installing lxml on OS X 10.6.8 by:

1. Downloading the source from PyPI
2. sudo python setup.py build --static-deps
3. sudo python setup.py install

It installed fine, without any errors. But when I go into the Python REPL I get the following:

Nabs$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 14:13:39)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> import lxml.html
Traceback (most recent call
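The traceback in the question is cut off, so its cause isn't visible here. One reasonable first diagnostic, offered as a sketch rather than the thread's answer, is to compare the libxml2/libxslt versions lxml was compiled against with the ones it actually loads at runtime. A mismatch between the two can be one source of import errors after a source build. The version constants below are part of `lxml.etree`. The helper name `lxml_build_info` is hypothetical, and the code is written for Python 3 rather than the question's Python 2.7.

```python
from lxml import etree


def lxml_build_info():
    """Collect lxml's version constants (hypothetical helper name)."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
        "libxslt (runtime)": etree.LIBXSLT_VERSION,
    }


# Each value is a tuple of ints, e.g. (2, 9, 1); join them for readability.
for name, version in lxml_build_info().items():
    print(f"{name:20} {'.'.join(map(str, version))}")
```

If the compiled and runtime rows differ, lxml is loading a different libxml2/libxslt than the one it was built against.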

Python 2 v. 3 xpath

Question: This code returns one element under Python 2.7.9 and no elements under 3.4.3. Why? How do I fix it for Python 3?

import requests
from lxml import html

page = requests.get('http://www.bloomberg.com/markets/rates-bonds/government-bonds/us/').text
tree = html.fromstring(page)
line = tree.xpath('//table[@class="std_table_module dual_border_data_table clear"][2]')
print(line)

Source: https://stackoverflow.com/questions/28766281/python-2-v-3-xpath
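The thread's answer isn't included here, so this is one possibility worth ruling out, not a confirmed diagnosis. In XPath, a positional predicate such as `[2]` binds to the location step it follows. So `//table[@class="..."][2]` means "a matching table that is the second such child of its parent", not "the second matching table in the document". The sketch uses made-up markup with a made-up class name `data`. Wrapping the path in parentheses applies `[2]` to the whole result set in document order.

```python
from lxml import html

# Made-up markup: two matching tables, each inside its own <div>.
doc = html.fromstring("""
<html><body>
  <div><table class="data"><tr><td>first</td></tr></table></div>
  <div><table class="data"><tr><td>second</td></tr></table></div>
</body></html>
""")

# [2] binds to the table step: "a matching table that is the 2nd matching
# child of its parent". No parent has two, so this matches nothing.
per_parent = doc.xpath('//table[@class="data"][2]')

# Parentheses apply [2] to the whole node-set, in document order.
in_document = doc.xpath('(//table[@class="data"])[2]')

print(len(per_parent))                 # 0
print(in_document[0].text_content())   # second
```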

Manage quotation marks in XPath (lxml)

Question: I want to extract web elements from the 'MANUFACTURING AT A GLANCE' table on the given website. But the row name contains a ' (single quote), which interferes with my XPath syntax. How do I overcome this issue? This code works for other rows:

import requests
from lxml import html, etree

ism_pmi_url = 'https://www.instituteforsupplymanagement.org/ismreport/mfgrob.cfm?SSO=1'
page = requests.get(ism_pmi_url)
tree = html.fromstring(page.content)
PMI_CustomerInventories = tree.xpath('//strong[text()=
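One way to avoid juggling quote characters inside the expression is an XPath variable. lxml's `xpath()` binds keyword arguments to `$name` variables in the expression, so the apostrophe never needs escaping. The row label and table markup below are hypothetical stand-ins, because the real row name is cut off in the question. The second query shows the alternative of delimiting the literal with double quotes. That only works while the text contains no double quote itself.

```python
from lxml import html

# Hypothetical markup standing in for one row of the ISM table.
doc = html.fromstring(
    "<table><tr><td><strong>Customers' Inventories</strong></td>"
    "<td>45.0</td></tr></table>"
)

label = "Customers' Inventories"  # hypothetical row name containing a single quote

# Option 1: bind the label to the XPath variable $label; no quoting rules apply.
by_variable = doc.xpath("//strong[text()=$label]", label=label)

# Option 2: delimit the XPath string literal with double quotes instead.
by_literal = doc.xpath("//strong[text()=\"Customers' Inventories\"]")

# The value sits in the cell after the <td> that holds the <strong>.
value = by_variable[0].getparent().getnext().text
print(by_variable[0].text, by_literal[0].text, value)
```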

Cannot properly display unicode string after parsing a file with lxml, works fine with simple file read

Question: I'm attempting to use the lxml module to parse HTML files, but I'm struggling to get it to work with some UTF-8 encoded data. I'm using Python 2.7 on Windows. For example, consider a UTF-8 encoded file without a byte order mark that contains nothing but the text string Québec. If I just read the contents of the file using a regular file handle and decode the resulting string object, I get a unicode string of length 6 that looks good when written back to a file. But if I parse the file with lxml,
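The rest of the question is cut off, but one plausible cause fits the symptom. When lxml's HTML parser receives raw bytes with no `<meta charset>` declaration, it has to guess the encoding and may not guess UTF-8. The sketch below shows two ways to remove the guesswork, written for Python 3 rather than the question's Python 2.7. You can tell the parser the encoding explicitly, or you can decode the bytes yourself and pass lxml a text string. The in-memory bytes stand in for the file.

```python
from lxml import html

raw = "Québec".encode("utf-8")  # UTF-8 bytes, no BOM, no <meta charset>

# Option 1: tell the HTML parser which encoding the bytes use.
utf8_parser = html.HTMLParser(encoding="utf-8")
from_bytes = html.fromstring(raw, parser=utf8_parser).text_content()

# Option 2: decode first and pass a str, so lxml does no byte-level guessing.
from_text = html.fromstring(raw.decode("utf-8")).text_content()

print(from_bytes, len(from_bytes))
print(from_text, len(from_text))
```

For a file on disk, the same parser object can be passed to `html.parse(path, parser=utf8_parser)`.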

Using lxml for Python - Windows 7 64-bit

Question: When I try to install lxml, I get the following. I've tried downloading the C++ redistributables and a whole bunch of other things I've found, but nothing works. I've tried everything from the following link: How to install lxml on Windows. I've got Python version 3.5.1. I

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\daniel.bak>pip install setuptools
Requirement already satisfied (use --upgrade to upgrade): setuptools in c:\users\daniel.bak

lxml incorrectly parsing the Doctype while looking for links

Question: I've got a BeautifulSoup4 (4.2.1) parser that collects all href attributes from our template files, and until now it has been just perfect. But with lxml installed, one of our guys is now getting a TypeError: string indices must be integers. I managed to replicate this on my Linux Mint VM, and the only difference appears to be lxml, so I assume the issue occurs when bs4 uses that HTML parser. The problem function is:

def collecttemplateurls(templatedir, urlslist):
    """ Uses BeautifulSoup to
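The function is truncated, so this is a guess at the cause rather than the thread's answer. With the lxml parser, the parsed document's top-level children can include a `Doctype` object. `Doctype` is a string subclass, and indexing a string with `['href']` raises exactly `TypeError: string indices must be integers`. Calling `find_all(href=True)` returns only `Tag` objects that carry an `href`, whichever parser is active. The template markup and the `collect_urls` helper are made up for illustration. The sketch assumes `beautifulsoup4` and `lxml` are installed.

```python
from bs4 import BeautifulSoup

# Made-up template standing in for one of the real template files.
TEMPLATE = """<!DOCTYPE html>
<html>
<head><link rel="stylesheet" href="/static/site.css"></head>
<body>
<a href="/home">Home</a>
<a name="top">No href here</a>
</body>
</html>"""


def collect_urls(markup, parser="lxml"):
    """Return every href value; strings such as the Doctype are never visited."""
    soup = BeautifulSoup(markup, parser)
    # href=True matches only tags that actually have an href attribute.
    return [tag["href"] for tag in soup.find_all(href=True)]


print(collect_urls(TEMPLATE))
print(collect_urls(TEMPLATE, parser="html.parser"))
```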

Passing around an ElementTree

Question: In my program, I need to use an ElementTree object in various functions. More specifically, I am doing this:

tree = etree.parse('somefile.xml')

I am passing this tree around in my program. I was wondering whether this is a good approach, or whether I should instead:

1. Create a global tree (I come from a C++ background and I know global is bad)
2. Create the tree again wherever required.

Or is my approach OK?

Answer 1: In Python, (eliding complexities, making an analogy for your C++
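Passing the tree as an argument, as the question already does, is a reasonable approach. Python passes a reference to the same object, so no copy of the document is made, and changes one function makes are visible to the others. The sketch below is minimal. An in-memory document stands in for `somefile.xml`, and the helper names are hypothetical.

```python
from io import BytesIO

from lxml import etree


def load_tree(source):
    """Parse once; the caller keeps the returned tree and passes it along."""
    return etree.parse(source)


def add_item(tree, name):
    """Mutate the shared tree in place; nothing needs to be returned."""
    item = etree.SubElement(tree.getroot(), "item")
    item.text = name


def item_names(tree):
    """Read from the same tree object another function just modified."""
    return [el.text for el in tree.getroot().iter("item")]


# An in-memory document stands in for 'somefile.xml'.
tree = load_tree(BytesIO(b"<items><item>a</item></items>"))
add_item(tree, "b")
print(item_names(tree))  # ['a', 'b']
```

If many functions need the same tree, another option that avoids a global is to group them as methods on a small class that holds the tree.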

IOError: Error reading file: failed to load HTTP resource, LXML error in Pythonanywhere

Question: I am having a problem using lxml with Python 2.7. I tried installing lxml versions 3.4.0 and 3.4.2 but got the same error, and I have no idea why. Here is my Python code:

@app.route("/getInformation", methods=['GET'])
def domain():
    urlList = []
    urlList.append("http://gbgfotboll.se/serier/?scr=table&ftid=57109")
    urlList.append("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
    date = '2015-04-18'
    # use this in real mode: currentDate = (time.strftime("%Y-%m-%d"))
    homeScore = "0"
    awayScore = "0"
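The error message suggests the failure happens when lxml itself is asked to fetch the URL (for example with `html.parse(url)`), but the relevant lines of the question are cut off. A common workaround, sketched here under that assumption, is to download the page with a regular HTTP client and hand the bytes to lxml. That way the request goes through Python's networking stack, and `urllib.request` picks up `http_proxy`/`https_proxy` environment variables by default. Hosts such as PythonAnywhere may also restrict which sites some account types can reach, and no client code changes that. The function names are hypothetical, and the sketch targets Python 3.

```python
from urllib.request import urlopen

from lxml import html


def fetch_bytes(url, timeout=10):
    """Download a page with urllib, which honours proxy environment variables."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_page(content):
    """Parse already-downloaded bytes; lxml makes no network request here."""
    return html.fromstring(content)


# Usage (needs network access):
# tree = parse_page(fetch_bytes("http://gbgfotboll.se/serier/?scr=table&ftid=57109"))
# rows = tree.xpath("//table//tr")
```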

How can one replace an element in lxml?

Question: I get text (data entered by users of a CRM) from a web service, which returns it in a "terrifying format". I filter it with Python before using the data, but when it comes to removing the line breaks (br), the text is removed as well. The code is as follows:

description = ''' <div id="highlight" class="section"> <p> text............... </p> <br> <h1>TITLE</h1> <p>Multiple text <br> </p> <ul> <li>bad layer....</li> </ul> <p> <br>subTitle </p> <p> </p> <p style="text-align: center;"> <br>Text1 <br
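In lxml, text that follows an element is stored on that element as its `tail`. So `parent.remove(br)` also throws away the text after each `<br>`, which would match the symptom described. The sketch below uses a shortened, made-up fragment modelled on the question's markup and compares three approaches. `etree.strip_tags()` deletes the tags but keeps their text in place. `parent.replace()` also leaves the old element's tail on the detached element, so the tail must be re-attached to the new element. The helper names and the `<span>` spacer are hypothetical.

```python
from lxml import etree, html

# Shortened, made-up fragment modelled on the question's markup.
SNIPPET = "<div><p><br>subTitle</p><p>Text1<br>Text2</p></div>"


def remove_br_naively(root):
    # remove() takes the element *and* its tail text ("subTitle", "Text2") with it.
    for br in list(root.iter("br")):
        br.getparent().remove(br)


def strip_br(root):
    # strip_tags() deletes the <br> tags but merges their tail text into the tree.
    etree.strip_tags(root, "br")


def replace_keeping_tail(old, new):
    # replace() leaves old.tail on the detached element, so re-attach it to new.
    tail = old.tail
    old.getparent().replace(old, new)
    new.tail = tail


naive = html.fromstring(SNIPPET)
remove_br_naively(naive)

stripped = html.fromstring(SNIPPET)
strip_br(stripped)

replaced = html.fromstring(SNIPPET)
for br in list(replaced.iter("br")):
    spacer = etree.Element("span")  # hypothetical replacement element
    spacer.text = " "
    replace_keeping_tail(br, spacer)

print(naive.text_content())     # Text1  (the tail text is gone)
print(stripped.text_content())  # subTitleText1Text2
print(replaced.text_content())  # " subTitleText1 Text2" with a leading space
```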