urllib

Python urllib.request.urlopen() returning error 10061?

亡梦爱人 submitted on 2019-11-29 07:37:42
I'm trying to download the HTML of a page (http://www.google.com in this case) but I'm getting back an error. Here is my interactive prompt session:

Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> import urllib.request
>>> html = urllib.request.urlopen("http://www.google.com")
Traceback (most recent call last):
  File "\\****.****.org\myhome\python\lib\urllib\request.py", line 1136, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "\\****.**
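On Windows, error 10061 is "connection refused": nothing answered on the target host and port. A frequent culprit is a system proxy setting that urllib picks up automatically. The sketch below (assuming a proxy problem, which the post does not confirm) builds an opener that bypasses any system proxy and catches the underlying socket error instead of letting the traceback escape:

```python
import urllib.request
import urllib.error

# An empty ProxyHandler disables any proxy urllib would otherwise pick up
# from the environment or the Windows registry (diagnostic sketch only).
no_proxy_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch(url):
    try:
        with no_proxy_opener.open(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.URLError as err:
        # err.reason carries the underlying socket error (e.g. errno 10061)
        print("fetch failed:", err.reason)
        return None
```

If the fetch succeeds with the proxy disabled but fails without, the stale proxy setting is the cause.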

Python 3: AttributeError: 'module' object has no attribute '__path__' using urllib in terminal

淺唱寂寞╮ submitted on 2019-11-29 06:12:35
My code runs perfectly in PyCharm, but I get error messages when I try to run it in the terminal. What's wrong with my code, or where did I make a mistake?

import urllib.request
with urllib.request.urlopen('http://python.org/') as response:
    html = response.read()
print(html)

Output from the terminal:

λ python Desktop\url1.py
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
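When the same script works in one environment but not another, one useful check is which file Python actually imported: a local file shadowing a standard-library module (a hypothetical cause, not confirmed by the post) shows up immediately here:

```python
import urllib
import urllib.request

# The stdlib copies should resolve to paths inside the Python installation;
# a path pointing at your own Desktop or project folder indicates shadowing.
print(urllib.__file__)
print(urllib.request.__file__)
```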

Web crawler tutorial — using XPath expressions with the urllib library — BeautifulSoup basics

↘锁芯ラ submitted on 2019-11-29 05:31:10
With urllib, we can likewise use XPath expressions to extract information. To do so, first install the lxml module, then convert the page data into tree form with etree from lxml.

Using XPath expressions with the urllib library

etree.HTML() converts the fetched HTML string into a tree structure, i.e. a format that XPath expressions can query.

#!/usr/bin/env python
# -*- coding:utf8 -*-
import urllib.request
from lxml import etree  # module for converting HTML into a tree structure

wye = urllib.request.urlopen('http://sh.qihoo.com/pc/home').read().decode("utf-8", 'ignore')
zhuanh = etree.HTML(wye)  # convert the fetched HTML string into a tree that XPath expressions can query
print(zhuanh)
hqq = zhuanh.xpath('/html/head/title/text()')  # get the page title via an XPath expression
# Note: data returned by an XPath expression is sometimes a list and sometimes not, so handle it as follows
if isinstance(hqq, list):  # check whether the result is a list
    print(hqq)

Python; urllib error: AttributeError: 'bytes' object has no attribute 'read'

你说的曾经没有我的故事 submitted on 2019-11-29 05:11:04
Question: Note: This is Python 3; there is no urllib2. Also, I've tried using json.loads(), and I get this error: TypeError: can't use a string pattern on a bytes-like object. I get this error if I use json.loads() and remove the .read() from response: TypeError: expected string or buffer

import urllib.request
import json

response = urllib.request.urlopen('http://www.reddit.com/r/all/top/.json').read()
jsonResponse = json.load(response)
for child in jsonResponse['data']['children']:
    print (child['data
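The mismatch here is that json.load() expects a file-like object with a .read() method, while response.read() returns bytes, which json.loads() on the Python 3 versions of that era could not parse directly. A minimal sketch with a stand-in payload (avoiding the live network call to reddit):

```python
import io
import json

raw = b'{"data": {"children": []}}'  # stand-in for response.read()

# Variant 1: decode the bytes to str, then use json.loads()
data = json.loads(raw.decode('utf-8'))

# Variant 2: give json.load() a file-like object, which is what it expects
data2 = json.load(io.BytesIO(raw))
```

In the original code, that means either passing the urlopen() response object directly to json.load() without calling .read(), or decoding the bytes and using json.loads().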

Download file using urllib in Python with the wget -c feature

谁说胖子不能爱 submitted on 2019-11-29 04:25:29
I am writing software in Python to download PDF files over HTTP from a database. Sometimes the download stops with this message:

retrieval incomplete: got only 3617232 out of 10689634 bytes

How can I make the download restart where it stopped, using the 206 Partial Content HTTP feature? I can do it with wget -c and it works pretty well, but I would like to implement it directly in my Python software. Any ideas? Thank you.

You can request a partial download by sending a GET with the Range header:

import urllib2
req = urllib2.Request('http://www.python.org/')
#
# Here we request that bytes 18000-
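For Python 3's urllib.request, the same Range-header idea can be wrapped into a small resume helper. This is a sketch (resume_download and the example URL are illustrative, not from the post): it asks the server for only the bytes still missing and appends them to the partial file.

```python
import os
import urllib.request

def resume_download(url, path):
    # Start from however many bytes we already have on disk.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    # A server that honours Range answers with 206 Partial Content and
    # sends only the requested byte range.
    req = urllib.request.Request(url, headers={'Range': 'bytes=%d-' % start})
    with urllib.request.urlopen(req) as resp, open(path, 'ab') as out:
        out.write(resp.read())
```

Note that a server that ignores Range replies with 200 and the full body, so a robust version would check resp.status before appending.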

Difference between Python urllib.urlretrieve() and wget

徘徊边缘 submitted on 2019-11-29 01:26:14
Question: I am trying to retrieve a 500 MB file using Python, and I have a script which uses urllib.urlretrieve(). There seems to be some network problem between me and the download site, as this call consistently hangs and fails to complete. However, using wget to retrieve the file tends to work without problems. What is the difference between urlretrieve() and wget that could cause this?

Answer 1: The answer is quite simple. Python's urllib and urllib2 are nowhere near as mature and robust as they
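One concrete difference worth ruling out: urlretrieve() takes no timeout argument, so a stalled connection simply hangs, whereas wget times out and retries by default. A module-wide socket timeout (a sketch, not the answer's confirmed fix) at least turns the hang into an exception you can catch and retry on:

```python
import socket
import urllib.request

# Any socket urllib opens from now on will raise socket.timeout after 30s
# of inactivity instead of blocking forever.
socket.setdefaulttimeout(30)

# Illustrative call, commented out to avoid a live network request:
# urllib.request.urlretrieve('http://example.com/big.file', 'big.file')
```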

Python web scraping from beginner to giving up (3): basic usage of the Urllib library

一个人想着一个人 submitted on 2019-11-29 00:39:39
Official documentation: https://docs.python.org/3/library/urllib.html

What is Urllib? Urllib is Python's built-in HTTP request library. It includes the following modules:
urllib.request — request module
urllib.error — exception handling module
urllib.parse — URL parsing module
urllib.robotparser — robots.txt parsing module

urlopen

The parameters of urllib.request.urlopen:

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Using the url parameter. A simple example first:

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))

urlopen is most commonly used with three parameters: urllib.request.urlopen(url, data, timeout). response.read() retrieves the page content; without read(), it returns the following. Using the data parameter
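The truncated section on the data parameter can be sketched as follows: passing data switches urlopen() to a POST request, and the payload must be bytes, so form fields are urlencoded and then encoded to UTF-8 (httpbin.org here is an illustrative test endpoint, not from the original tutorial):

```python
import urllib.parse
import urllib.request

# urlencode builds the form string, encode() turns it into the bytes
# that the data argument requires.
payload = urllib.parse.urlencode({'word': 'hello'}).encode('utf-8')
req = urllib.request.Request('http://httpbin.org/post', data=payload)

# Illustrative call, commented out to avoid a live network request:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.read().decode('utf-8'))
```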
