lxml

Receiving 'ImportError: cannot import name etree' when using lxml in Python on Mac

霸气de小男生 submitted on 2019-12-01 03:16:43
I'm having difficulty properly installing lxml for Python on Mac. I have followed the instructions here, which indicate after installation that the install was successful (there are some warnings, though; the full log of the install and warnings can be found here). After running the install, I am trying to run Test.py in the lxml install directory to ensure that it's working correctly. I am immediately prompted with the error: ImportError: cannot import name etree. This error results from the line from lxml import etree. I can't seem to figure out why it's failing here after a…
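A common cause of this exact symptom is Python importing the wrong copy of lxml, e.g. the source checkout itself (which lacks the compiled etree extension) rather than the installed package in site-packages. A minimal diagnostic sketch, assuming a standard CPython setup:

    import sys

    # Which interpreter is running? On a Mac it is easy to build lxml
    # against one Python and then run another (system vs. Homebrew/MacPorts).
    print(sys.executable)

    # Where is lxml actually being imported from? If this points into the
    # lxml source tree instead of site-packages, the compiled etree
    # extension module will be missing and the import below fails.
    import lxml
    print(lxml.__file__)

    # The failing import; with a correct install this prints a version tuple.
    from lxml import etree
    print(etree.LXML_VERSION)

Running the script from outside the lxml source directory rules out the shadowing case, since Python puts the script's own directory first on the import path.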

Web page scraping gems/tools available in Ruby [closed]

北慕城南 submitted on 2019-12-01 03:16:13
I'm trying to scrape web pages in a Ruby script that I'm working on. The purpose of the project is to show which ETFs and stock mutual funds are most compatible with the value investing philosophy. Some examples of pages I'd like to scrape are: http://finance.yahoo.com/q/pr?s=SPY+Profile, http://finance.yahoo.com/q/hl?s=SPY+Holdings, and http://www.marketwatch.com/tools/mutual-fund/list/V. What web scraping tools do you recommend for Ruby, and why? Keep in mind that there are thousands of stock funds out there, so any tool I use has to be reasonably quick. I am new to Ruby, but I have experience…

Python Web Scraping from Zero (Series)

北慕城南 submitted on 2019-12-01 01:54:16
I. Preface. The previous post showed how to use the requests module to send an HTTP request to a website and get back the page's HTML. This post demonstrates how to use the BeautifulSoup module to extract the data we want from that HTML text. Update on 2016-12-28: I had forgotten to link BeautifulSoup's official site earlier; it is added today, along with a few more notes on using BeautifulSoup. Update on 2017-08-16: many readers have commented that the Unsplash site has been redesigned and much of its content is now loaded dynamically. For dynamically loaded content I suggest making requests with PhantomJS rather than the requests library; if you use PhantomJS, see my next post. If what changed is the class names and such used to locate elements in the HTML document, just re-target your selectors against the updated markup. What matters in scraping is the logic of extracting the data; once you've mastered the logic, it doesn't matter how the site changes.

II. Environment. My setup is as follows: OS: Windows 10. Python: Python 3.5; I recommend the Anaconda scientific-computing distribution, mainly because it bundles a package manager that avoids some package-installation errors. Go to the official site, choose the Python 3.5 version, then download and install. IDE: I use PyCharm, an IDE built specifically for Python development; it is a JetBrains product.

III. Module installation. BeautifulSoup has several versions; we use BeautifulSoup4. See the official documentation for detailed usage.
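As a minimal illustration of the requests + BeautifulSoup workflow the post describes (the URL and the elements extracted here are placeholders, not taken from the original tutorial):

    import requests
    from bs4 import BeautifulSoup

    # Fetch the page HTML, as shown in the previous post in the series.
    resp = requests.get("https://example.com")
    resp.raise_for_status()

    # Parse the HTML text with BeautifulSoup4.
    soup = BeautifulSoup(resp.text, "html.parser")

    # Extract data: the page title and the target of every link.
    print(soup.title.string)
    for link in soup.find_all("a"):
        print(link.get("href"))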

lxml tag name with a “:”

↘锁芯ラ submitted on 2019-12-01 01:29:38
Question: I am trying to create an XML tree from a JSON object using lxml.etree. Some of the tag names contain a colon, something like 'settings:current'. I tried using '{settings}current' as the tag name, but I get this: ns0:current xmlns:ns0="settings". Answer 1: Yes, first read and understand XML namespaces. Then use that to generate an XML tree with namespaces: >>> MY_NAMESPACES={'settings': 'http://example.com/url-for-settings-namespace'} >>> e=etree.Element('{%s}current' % MY_NAMESPACES['settings…
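Completing the idea from the truncated answer, a minimal sketch (the namespace URI is the placeholder from the answer; passing nsmap to control the prefix is standard lxml usage, not shown in the excerpt):

    from lxml import etree

    # Map the desired prefix to a real namespace URI. A bare word like
    # 'settings' is not a valid namespace URI, which is why lxml fell back
    # to an auto-generated ns0 prefix in the question.
    MY_NAMESPACES = {'settings': 'http://example.com/url-for-settings-namespace'}

    # Clark notation '{uri}localname' names the namespace; nsmap tells lxml
    # which prefix to serialize it with.
    e = etree.Element('{%s}current' % MY_NAMESPACES['settings'],
                      nsmap=MY_NAMESPACES)
    print(etree.tostring(e).decode())
    # <settings:current xmlns:settings="http://example.com/url-for-settings-namespace"/>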

Using Python and lxml to validate XML against an external DTD

我们两清 submitted on 2019-12-01 01:10:34
I'm trying to validate an XML file against an external DTD referenced in the doctype tag. Specifically: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd"> ...the rest of the document... I'm using Python 3.3 and the lxml module. From reading http://lxml.de/validation.html#validation-at-parse-time, I've thrown this together: enexFile = open(sys.argv[2], mode="rb") # sys.argv[2] is the path to an XML file in local storage. enexParser = etree.XMLParser(dtd_validation=True) enexTree = etree.parse(enexFile, enexParser) From what I…
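A self-contained version of that approach (the error handling and the no_network flag are additions; lxml's XMLParser refuses to fetch resources over the network by default, which matters here because the Evernote DTD lives at an http URL):

    import sys
    from lxml import etree

    # dtd_validation=True makes lxml load the DTD named in the DOCTYPE and
    # validate against it while parsing; no_network=False allows fetching
    # the remote DTD, which lxml blocks by default.
    parser = etree.XMLParser(dtd_validation=True, no_network=False)

    try:
        with open(sys.argv[1], mode="rb") as enex_file:
            tree = etree.parse(enex_file, parser)
        print("document is valid")
    except etree.XMLSyntaxError as err:
        print("validation failed:", err)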

[Python] Installing the Scrapy framework under Python 3.6 on Windows

折月煮酒 submitted on 2019-11-30 23:52:18
Installing with the pip install scrapy command fails with: Failed building wheel for Twisted / Microsoft Visual C++ 14.0 is required... Solution:

1. If pip install scrapy doesn't succeed directly, you can install from a whl package instead. First download the Scrapy whl package from http://www.lfd.uci.edu/~gohlke/pythonlibs/: search the page for scrapy and find Scrapy‑1.3.3‑py2.py3‑none‑any.whl. Once it is downloaded, don't rush to install it yet; continue with the next steps.
2. Installing whl packages requires the wheel library. Other write-ups say you can install it directly with pip install wheel; since I already had wheel installed, I didn't need to install it and didn't test that step. If you don't have the wheel library yet, install it first.
3. Scrapy depends on Twisted, which is likewise installed from a whl package. Again go to http://www.lfd.uci.edu/~gohlke/pythonlibs/, search the page for twisted, and download the matching whl, Twisted‑17.1.0‑cp36‑cp36m‑win_amd64.whl. Choose the package that matches your Python version; the cp36 in the middle of the name means Python 3.6…
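Putting those steps together, the install sequence would look roughly like this (the whl filenames are the ones named in the post; adjust them to whatever you actually downloaded):

    pip install wheel
    pip install Twisted-17.1.0-cp36-cp36m-win_amd64.whl
    pip install Scrapy-1.3.3-py2.py3-none-any.whl

With Twisted satisfied from the prebuilt wheel, the final install no longer needs the Visual C++ 14.0 build tools.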

lxml.html. Error reading file; Failed to load external entity

故事扮演 submitted on 2019-11-30 22:40:42
I am trying to get a movie trailer URL from YouTube by parsing with lxml.html:

    from lxml import html
    import lxml.html
    from lxml.etree import XPath

    def get_youtube_trailer(selected_movie):
        # Create the url for the YouTube query in order to find the movie trailer
        title = selected_movie
        t = {'search_query' : title + ' movie trailer'}
        query_youtube = urllib.urlencode(t)
        search_url_youtube = 'https://www.youtube.com/results?' + query_youtube

        # Define the XPath for the YouTube movie trailer link
        movie_trailer_xpath = XPath('//ol[@class="item-section"]/li[1]/div/div/div[2]/h3/a/@href')

        # Parse the…
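A sketch of how the rest of that function might go, fetching the page explicitly and parsing the response body instead of handing lxml a URL ("Failed to load external entity" typically means lxml tried and failed to open the source itself). The requests usage and everything past the excerpt's cutoff are assumptions, and YouTube's markup has changed since this was posted, so the XPath from the question may no longer match:

    import requests
    from lxml import html
    from lxml.etree import XPath

    def get_youtube_trailer(selected_movie):
        # Build the YouTube search query for the movie trailer.
        params = {'search_query': selected_movie + ' movie trailer'}

        # Fetch the page ourselves rather than letting lxml resolve the URL.
        response = requests.get('https://www.youtube.com/results', params=params)
        response.raise_for_status()

        # Parse the returned HTML and apply the XPath from the question.
        tree = html.fromstring(response.content)
        trailer_xpath = XPath('//ol[@class="item-section"]/li[1]/div/div/div[2]/h3/a/@href')
        links = trailer_xpath(tree)
        return ('https://www.youtube.com' + links[0]) if links else None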

Packaging a Python script as an exe file

岁酱吖の submitted on 2019-11-30 22:22:39
Packaging a Python script and the libraries it uses into an exe file makes it easier to distribute a program, since every computer that runs it no longer needs Python installed. There are plenty of write-ups online, but successful cases of packaging a Python script as an exe under Python 3.x are rare. I explored this, completed the task successfully, and am writing it down to share with anyone who needs it. Feedback is welcome.

I. Packaging under Python 3.1

1. Download cx_Freeze from http://sourceforge.net/projects/cx-freeze/files/. Pick the build that matches your OS and Python version; I downloaded cx_Freeze-4.1.2.win32-py3.1.msi. The tool's latest release is dated 2010-01-06, which is quite recent.
2. Install. Just run the downloaded installer. Afterwards you can see the directory where the cxfreeze tool lives, as follows:
3. Package. What I want to package is BlogPost.py and the modules it depends on.
   A. Preparation.
      a. Remove all Chinese characters from the code, including comments. (The encoding-declaration comment can be kept; that's fine.)
      b. If you use a third-party library like lxml, you may hit an error about the _elementpath module not being found. You need to write import _elementpath as DONTUSE in one of your .py files and specify that module's search path. (On my machine the module lives at C:\Python25\Lib\site-packages\lxml\_elementpath…
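The excerpt stops before the freeze step itself; for reference, a minimal cx_Freeze setup script for this workflow (the script name comes from the post, but the options syntax follows current cx_Freeze documentation and may differ in the 4.1.2 release the post uses):

    # setup.py - build the exe with: python setup.py build
    from cx_Freeze import setup, Executable

    setup(
        name="BlogPost",
        version="1.0",
        description="Frozen BlogPost script",
        # 'includes' forces in modules cx_Freeze cannot discover on its own,
        # such as lxml's _elementpath mentioned in the post.
        options={"build_exe": {"includes": ["lxml._elementpath"]}},
        executables=[Executable("BlogPost.py")],
    )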

Parsing a large .bz2 file (40 GB) with lxml iterparse in python. Error that does not appear with uncompressed file

时光怂恿深爱的人放手 submitted on 2019-11-30 19:51:05
I am trying to parse OpenStreetMap's planet.osm, compressed in bz2 format. Because it is already 41G, I don't want to decompress the file completely. So I figured out how to parse portions of the planet.osm file using bz2 and lxml, using the following code:

    from lxml import etree as et
    from bz2 import BZ2File

    path = "where/my/fileis.osm.bz2"
    with BZ2File(path) as xml_file:
        parser = et.iterparse(xml_file, events=('end',))
        for events, elem in parser:
            if elem.tag == "tag":
                continue
            if elem.tag == "node":
                (do something)
            ## Do some cleaning
            # Get rid of that element
            elem.clear()
            # Also eliminate now…
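The excerpt cuts off mid-cleanup; the standard iterparse memory idiom it appears to be heading toward looks like this (a sketch of the common pattern, not the poster's full code; the node handling is a placeholder):

    from bz2 import BZ2File
    from lxml import etree as et

    path = "where/my/fileis.osm.bz2"
    with BZ2File(path) as xml_file:
        for event, elem in et.iterparse(xml_file, events=('end',)):
            if elem.tag == "node":
                # Placeholder: read whatever attributes you need.
                lat, lon = elem.get("lat"), elem.get("lon")

            # Free the element's own content...
            elem.clear()
            # ...and also eliminate the now-empty references that preceding
            # siblings keep alive, so the partial tree does not keep growing.
            while elem.getprevious() is not None:
                del elem.getparent()[0]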