lxml

pip is not able to install packages correctly: Permission denied error [duplicate]

拈花ヽ惹草 submitted on 2019-11-26 06:28:30
Question: (Marked as a duplicate of "Cannot install Lxml on Mac os x 10.9" and "django installation: cannot use pip to install django on linux(ubuntu)".) I am trying to install lxml so that I can install Scrapy on my Mac (OS X 10.9.4):
ishaantaylor@Ishaans-MacBook-Pro.local ~ ➤ pip install lxml
Downloading/unpacking lxml
Downloading lxml-3.4.0.tar.gz (3.5MB): 3.5MB downloaded
Running setup.py (path:/private/var/folders/8l/t7tcq67d34v7qq_4hp3s1dm80000gn/T/pip

Python API automation: parsing with lxml

耗尽温柔 submitted on 2019-11-26 05:54:00
from lxml import etree
import urllib3
import requests
urllib3.disable_warnings()
url = "https://www.cnblogs.com/mvc/blog/news.aspx?blogApp=xiaoyujuan"
r = requests.get(url, verify=False)
# print(r.text)
dom = etree.HTML(r.content.decode("utf-8"))
block = dom.xpath("//*[@id='profile_block']")
t = etree.tostring(block[0], encoding='utf-8', pretty_print=True)
print(t.decode("utf-8"))
t1 = block[0].xpath("text()")  # get the text nodes of the current node
print(t1)
t2 = block[0].xpath('a')  # locate the a tags
for i, j in zip(t1, t2):
    print("%s%s" % (i, j.text))

from lxml import etree
htmldemo = '''

How to install lxml on Ubuntu

做~自己de王妃 submitted on 2019-11-26 05:36:09
I'm having difficulty installing lxml with easy_install on Ubuntu 11. When I type
$ easy_install lxml
I get:
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.3
Downloading http://lxml.de/files/lxml-2.3.tgz
Processing lxml-2.3.tgz
Running lxml-2.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7UdQOZ/lxml-2.3/egg-dist-tmp-GacQGy
Building lxml version 2.3.
Building without Cython.
ERROR: /bin/sh: xslt-config: not found
** make sure the development packages of libxml2 and libxslt are installed **
Using build configuration of
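The log above fails because the build cannot find `xslt-config`, which ships with the libxslt development package. As a minimal sketch, the check below looks for that tool and for `xml2-config` (its libxml2 counterpart) on the PATH before a build is attempted. The helper name `find_build_tools` is hypothetical, and checking `xml2-config` as well is an assumption that goes beyond what the log shows.

```python
import shutil


def find_build_tools(names=("xml2-config", "xslt-config")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


# Report which of the config tools are visible to the build.
for name, path in find_build_tools().items():
    print(f"{name}: {path or 'not found'}")
```

If either tool is reported as not found, installing the distribution's libxml2 and libxslt development packages (on Ubuntu these are usually named `libxml2-dev` and `libxslt1-dev`) is the likely fix.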

Get all text inside a tag in lxml

廉价感情. submitted on 2019-11-26 05:22:11
Question: I'd like to write a code snippet that would grab all of the text inside the <content> tag, in lxml, in all three instances below, including the code tags. I've tried tostring(getchildren()), but that misses the text in between the tags. I didn't have much luck searching the API for a relevant function. Could you help me out?
<!--1-->
<content>
<div>Text inside tag</div>
</content>
#should return "<div>Text inside tag</div>
<!--2-->
<content>
Text with no tag
</content>
#should

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

空扰寡人 submitted on 2019-11-26 05:17:12
Question:
...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
The above is what my Terminal outputs. I am on Mac OS 10.7.x. I have Python 2.7.1, and followed this tutorial to get Beautiful Soup and lxml, which both installed successfully and work with a separate test file located here.
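This error generally means that Beautiful Soup could not load lxml in the interpreter that is running the script, which can happen when lxml was installed for a different Python. The sketch below is one way to keep a script working while that is sorted out: it tries the lxml parser and falls back to the standard library's `html.parser`. The helper name `make_soup` and the sample HTML are hypothetical.

```python
import sys

from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when available, else fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not importable from this interpreter
        return BeautifulSoup(html, "html.parser")


# Shows which Python is running, useful when several are installed.
print(sys.executable)

soup = make_soup("<html><head><title>Hi</title></head><body><p>text</p></body></html>")
title = soup.title.string
print(title)
```

The two parsers can build slightly different trees for malformed HTML, so the fallback is a stopgap rather than a drop-in replacement.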

how to remove an element in lxml

孤人 submitted on 2019-11-26 04:23:51
Question: I need to completely remove elements, based on the contents of an attribute, using Python's lxml. Example:
import lxml.etree as et
xml = """
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""
tree = et.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    #remove this element from the tree
print et
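A minimal sketch of one way to fill in the "remove this element" step: in lxml an element is removed through its parent, which `getparent()` returns. The data is the grocery list from the question; the variable `remaining` is added here only to show the result.

```python
from lxml import etree

xml = """
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree = etree.fromstring(xml)

# xpath() returns a plain list, so removing elements while looping is safe.
for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

remaining = [fruit.text for fruit in tree.xpath("//fruit")]
print(remaining)
```

Note that `remove()` also discards the removed element's tail text. That is harmless here because the tails are only whitespace, but it matters in mixed-content documents.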

How to select following sibling/xml tag using xpath

我是研究僧i submitted on 2019-11-26 04:22:30
Question: I have an HTML file (from Newegg) whose HTML is organized as shown below. All of the data in the specifications table is in 'desc' cells, while the titles of each section are in 'name' cells. Below are two examples of data from Newegg pages.
<tr> <td class="name">Brand</td> <td class="desc">Intel</td> </tr>
<tr> <td class="name">Series</td> <td class="desc">Core i5</td> </tr>
<tr> <td class="name">Cores</td> <td class="desc">4</td> </tr>
<tr> <td class="name">Socket</td> <td class=
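A sketch of how the `following-sibling` axis can pair each 'name' cell with its 'desc' cell. It uses only the three complete rows from the excerpt; wrapping them in a `<table>` element and the variable names are assumptions made for illustration.

```python
from lxml import etree

html = """
<table>
<tr> <td class="name">Brand</td> <td class="desc">Intel</td> </tr>
<tr> <td class="name">Series</td> <td class="desc">Core i5</td> </tr>
<tr> <td class="name">Cores</td> <td class="desc">4</td> </tr>
</table>
"""

doc = etree.HTML(html)

# Look up one value: the 'desc' cell that follows the 'name' cell "Brand".
brand = doc.xpath(
    "//td[@class='name'][text()='Brand']"
    "/following-sibling::td[@class='desc'][1]/text()"
)

# Build a dictionary of every name/desc pair in the table.
specs = {}
for name_cell in doc.xpath("//td[@class='name']"):
    desc_cells = name_cell.xpath("following-sibling::td[@class='desc'][1]")
    if desc_cells:
        specs[name_cell.text] = desc_cells[0].text

print(brand)
print(specs)
```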

Cannot install Lxml on Mac os x 10.9

梦想的初衷 submitted on 2019-11-26 03:29:03
Question: I want to install Lxml so I can then install Scrapy. When I updated my Mac today it wouldn't let me reinstall lxml; I get the following error:
In file included from src/lxml/lxml.etree.c:314:
/private/tmp/pip_build_root/lxml/src/lxml/includes/etree_defs.h:9:10: fatal error: 'libxml/xmlversion.h' file not found
#include "libxml/xmlversion.h"
^
1 error generated.
error: command 'cc' failed with exit status 1
I have tried using brew to install libxml2 and libxslt, both installed fine but

Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

主宰稳场 submitted on 2019-11-26 03:07:03
Question: From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but for no particular reason other than finding its syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml, and I've heard that lxml is faster. So I'm wondering what the advantages of one over the other are. When would I want to use lxml, and when would I be better off using BeautifulSoup?

Using Python Iterparse For Large XML Files

那年仲夏 submitted on 2019-11-26 02:08:01
Question: I need to write a parser in Python that can process some extremely large files (> 2 GB) on a computer without much memory (only 2 GB). I wanted to use iterparse in lxml to do it. My file is of the format:
<item>
<title>Item 1</title>
<desc>Description 1</desc>
</item>
<item>
<title>Item 2</title>
<desc>Description 2</desc>
</item>
and so far my solution is:
from lxml import etree
context = etree.iterparse( MYFILE, tag='item' )
for event, elem in context :
    print elem.xpath( 'description
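A common pattern for keeping memory flat with `iterparse` is to clear each element once it has been handled and to delete the already-processed siblings that the root still references. The sketch below assumes the items sit inside a single root element (called `<items>` here, which is not in the question) and reads from an in-memory buffer in place of the large file. The generator name `iter_items` is hypothetical.

```python
import io

from lxml import etree

# Stand-in for the large file; the <items> root is an assumption.
data = (
    b"<items>"
    b"<item><title>Item 1</title><desc>Description 1</desc></item>"
    b"<item><title>Item 2</title><desc>Description 2</desc></item>"
    b"</items>"
)


def iter_items(source):
    """Yield (title, desc) per <item>, freeing each element after use."""
    for _event, elem in etree.iterparse(source, tag="item"):
        yield elem.findtext("title"), elem.findtext("desc")
        # Drop the element's own content...
        elem.clear()
        # ...and remove earlier siblings still attached to the parent.
        while elem.getprevious() is not None:
            del elem.getparent()[0]


rows = list(iter_items(io.BytesIO(data)))
print(rows)
```

With a real file, a path or an open binary file object can be passed to `iter_items` in place of the `BytesIO` buffer.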