lxml

Can't install lxml on CentOS

时光怂恿深爱的人放手 submitted on 2019-12-25 16:49:38
Question: I'm trying to install lxml but having some difficulties:

    [root@ip-xx-xxx-xx-113 init.d]# pip install lxml
    Downloading/unpacking lxml
      Running setup.py egg_info for package lxml
        /usr/lib64/python2.6/distutils/dist.py:266: UserWarning: Unknown distribution option: 'bugtrack_url'
          warnings.warn(msg)
        Building lxml version 3.3.0.beta2.
        Building without Cython.
        Using build configuration of libxslt 1.1.26
        Building against libxml2/libxslt in the following directory: /usr/lib64
        warning: no files found

ImportError After Install of lxml on OS X 10.6.8

↘锁芯ラ submitted on 2019-12-25 11:51:25
Question: I tried installing lxml on OS X 10.6.8 by:

1. Downloading the source from PyPI
2. sudo python setup.py build --static-deps
3. sudo python setup.py install

It installed fine, without any errors. But when I go into the Python REPL I get the following:

    Nabs$ python
    Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 14:13:39)
    [GCC 4.0.1 (Apple Inc. build 5493)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lxml
    >>> import lxml.html
    Traceback (most recent call

Python 2 v. 3 xpath

岁酱吖の submitted on 2019-12-25 08:48:19
Question: This code returns one element under Python 2.7.9 and no elements under 3.4.3. Why? How do I fix it for Python 3?

    import requests
    from lxml import html

    page = requests.get('http://www.bloomberg.com/markets/rates-bonds/government-bonds/us/').text
    tree = html.fromstring(page)
    line = tree.xpath('//table[@class="std_table_module dual_border_data_table clear"][2]')
    print(line)

Source: https://stackoverflow.com/questions/28766281/python-2-v-3-xpath
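One thing worth checking with an expression like the one above is how the positional predicate binds. This would not by itself explain a difference between Python versions (the served page may simply differ between runs), but it is a frequent reason for `[2]` returning nothing. A minimal sketch, using a made-up two-table document and a hypothetical class name `data`:

```python
from lxml import html

# Hypothetical document: two tables with the same class, each in its own <div>.
doc = html.fromstring("""
<html><body>
  <div><table class="data"><tr><td>first</td></tr></table></div>
  <div><table class="data"><tr><td>second</td></tr></table></div>
</body></html>
""")

# "[2]" applies per parent: tables that are the 2nd matching child of their parent.
per_parent = doc.xpath('//table[@class="data"][2]')

# Parentheses apply "[2]" to the whole result set: the 2nd match in the document.
document_wide = doc.xpath('(//table[@class="data"])[2]')

print(len(per_parent), len(document_wide))
```

If the page layout puts each table in a separate container, the parenthesised form is the one that selects "the second table on the page".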

Manage quotation marks in XPath (lxml)

和自甴很熟 submitted on 2019-12-25 08:34:42
Question: I want to extract web elements from the table 'MANUFACTURING AT A GLANCE' on the given website. But the name of the row contains a ' (single quote), which interferes with my syntax. How do I overcome this issue? This code works for other rows.

    import requests
    from lxml import html, etree

    ism_pmi_url = 'https://www.instituteforsupplymanagement.org/ismreport/mfgrob.cfm?SSO=1'
    page = requests.get(ism_pmi_url)
    tree = html.fromstring(page.content)
    PMI_CustomerInventories = tree.xpath('//strong[text()=
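One way to sidestep quoting altogether is to pass the text as an XPath variable, which lxml accepts as a keyword argument to `xpath()`. The sketch below is only an illustration: the HTML is a hypothetical stand-in for the real page, and the row label `Customers' Inventories` is an assumption, since the expression in the question is cut off.

```python
from lxml import html

# Hypothetical stand-in for the page; the real table markup is not shown in the question.
doc = html.fromstring(
    "<html><body><table><tr>"
    "<td><strong>Customers' Inventories</strong></td><td>45.5</td>"
    "</tr></table></body></html>"
)

# Assumed row label containing a single quote.
label = "Customers' Inventories"

# $label is an XPath variable; lxml fills it from the keyword argument,
# so the quote never has to be escaped inside the expression itself.
matches = doc.xpath("//strong[text()=$label]", label=label)

# Read the cell next to the label (second <td> of the same row).
value = matches[0].xpath("string(ancestor::tr/td[2])")
print(value)
```

When the label contains only single quotes, wrapping the XPath string literal in double quotes also works; the variable form has the advantage of handling labels with both kinds of quotes.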

Cannot properly display unicode string after parsing a file with lxml, works fine with simple file read

大城市里の小女人 submitted on 2019-12-25 08:20:04
Question: I'm attempting to use the lxml module to parse HTML files, but am struggling to get it to work with some UTF-8 encoded data. I'm using Python 2.7 on Windows. For example, consider a UTF-8 encoded file without a byte order mark that contains nothing but the text string Québec. If I just read the contents of the file using a regular file handler and decode the resulting string object, I get a length 6 unicode string that looks good when written back to a file. But if I parse the file with lxml,
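The question is cut off before the symptom, so the following assumes the usual one: the HTML parser has no charset declaration to go by and guesses the encoding wrongly. Under that assumption, a minimal sketch is to tell the parser the encoding explicitly. It is written for Python 3, and the `<p>` wrapper is added here only to make the input a valid fragment.

```python
from lxml import html

# Bytes as they would be read from a UTF-8 file without a BOM.
raw = "<p>Québec</p>".encode("utf-8")

# State the encoding up front instead of letting the parser guess.
parser = html.HTMLParser(encoding="utf-8")
element = html.fromstring(raw, parser=parser)

text = element.text_content()
print(len(text))
```

Decoding the bytes first (`raw.decode("utf-8")`) and passing the resulting text to `html.fromstring` is an equivalent alternative.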

Using lxml for Python - Windows 7 64-bit

£可爱£侵袭症+ submitted on 2019-12-25 07:40:24
Question: When I try to install lxml, I get the following. I've tried downloading C++ redists and a whole bunch of other things I've found, but nothing works. I've tried everything from the following link: How to install lxml on Windows. I've got Python version 3.5.1.

    Microsoft Windows [Version 6.1.7601]
    Copyright (c) 2009 Microsoft Corporation. All rights reserved.

    C:\Users\daniel.bak>pip install setuptools
    Requirement already satisfied (use --upgrade to upgrade): setuptools in c:\users\daniel.bak

lxml incorrectly parsing the Doctype while looking for links

孤者浪人 submitted on 2019-12-25 07:09:23
Question: I've got a BeautifulSoup4 (4.2.1) parser which collects all href attributes from our template files, and until now it has been just perfect. But with lxml installed, one of our guys is now getting a TypeError: string indices must be integers. I managed to replicate this on my Linux Mint VM, and the only difference appears to be lxml, so I assume the issue occurs when bs4 uses that HTML parser. The problem function is:

    def collecttemplateurls(templatedir, urlslist):
        """ Uses BeautifulSoup to

Passing around an ElementTree

你说的曾经没有我的故事 submitted on 2019-12-25 04:47:10
Question: I need to use an ElementTree object in various functions in my program. More specifically, I am doing this:

    tree = etree.parse('somefile.xml')

I am passing this tree around in my program. I was wondering whether this is a good approach, or whether I should instead:

- Create a global tree (I come from a C++ background and I know global is bad)
- Create the tree again wherever required.

Or is my approach OK?

Answer 1: In Python, (eliding complexities, making an analogy for your C++
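As a rough illustration of the first approach in the question (parse once, pass the tree to functions): Python passes a reference to the same object, so nothing is copied and changes made in one function are visible in the others. The XML content and the helper names `add_item` and `count_items` are hypothetical.

```python
import io
from lxml import etree


def add_item(tree, name):
    """Append an <item> to the root of the tree that was passed in."""
    etree.SubElement(tree.getroot(), "item").text = name


def count_items(tree):
    """Count <item> children of the root."""
    return len(tree.getroot().findall("item"))


# Parse once; an in-memory document stands in for 'somefile.xml'.
tree = etree.parse(io.BytesIO(b"<root><item>a</item></root>"))

add_item(tree, "b")        # mutates the one shared tree
print(count_items(tree))   # sees the item added above
```

Because no copy is made, passing the tree as an argument costs no more than using a global, while keeping each function's dependencies explicit.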

IOError: Error reading file: failed to load HTTP resource, LXML error in Pythonanywhere

Deadly submitted on 2019-12-25 03:19:11
Question: I am having a problem using lxml with Python 2.7. I tried installing lxml versions 3.4.0 and 3.4.2 but got the same error, and I have no idea why. Here is my Python code:

    @app.route("/getInformation", methods=['GET'])
    def domain():
        urlList = []
        urlList.append("http://gbgfotboll.se/serier/?scr=table&ftid=57109")
        urlList.append("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
        date = '2015-04-18'  # use this in real mode: currentDate = (time.strftime("%Y-%m-%d"))
        homeScore = "0"
        awayScore = "0"
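The part of the code that triggers the error is not shown, so this is only a guess at a workaround: if the URL is being handed directly to lxml, it may help to download the page with a regular HTTP client first and give lxml the bytes, which separates network failures from parsing failures. A sketch for Python 3, where `load_tree` and its `opener` parameter are hypothetical names introduced here:

```python
from urllib.request import urlopen

from lxml import html


def load_tree(url, opener=urlopen):
    """Fetch `url` with a normal HTTP client, then parse the bytes with lxml.

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(url) as response:
        data = response.read()
    return html.fromstring(data)
```

Usage would look like `tree = load_tree("http://gbgfotboll.se/serier/?scr=table&ftid=57109")`, after which `tree.xpath(...)` works as before. Whether the request succeeds still depends on what outbound access the hosting environment allows.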

How can one replace an element in lxml?

馋奶兔 submitted on 2019-12-25 02:55:22
Question: I receive text (data entered by CRM users) from a web service, which returns it in a "terrifying format". I am filtering it with Python before using the data, but when I remove the line breaks (br), the text is removed as well. The code is as follows:

    description = '''
    <div id="highlight" class="section">
    <p> text............... </p>
    <br>
    <h1>TITLE</h1>
    <p>Multiple text <br>  </p>
    <ul> <li>bad layer....</li> </ul>
    <p> <br>subTitle </p>
    <p> </p>
    <p style="text-align: center;"> <br>Text1 <br
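The removal code itself is cut off, so the following assumes the usual cause: in lxml, the text that follows a `<br>` is stored as that element's `tail`, and `parent.remove(br)` discards the tail along with the element. A minimal sketch using `drop_tree()` from `lxml.html`, which moves the tail text to the preceding element or parent before removing the element; the fragment is a shortened, hypothetical version of the one in the question.

```python
from lxml import html

# Shortened, hypothetical version of the markup from the question.
fragment = html.fromstring(
    "<div><p><br>subTitle</p><p>Multiple text<br></p></div>"
)

# Text after a <br> lives in br.tail; drop_tree() keeps it by joining it
# to the previous sibling's tail or the parent's text.
for br in fragment.xpath("//br"):
    br.drop_tree()

result = html.tostring(fragment, encoding="unicode")
print(result)
```

To swap an element for another one instead of deleting it, `parent.replace(old, new)` is the corresponding call; in that case the old element's tail has to be copied to the new element by hand.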