urllib2

BOM in server response screws up json parsing

末鹿安然 submitted on 2019-11-30 14:03:14

Question: I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.

    import urllib2
    import json

    url = "http://foo.com/API.svc/SomeMethod"
    payload = json.dumps({'inputs': ['red', 'blue', 'green']})
    headers = {"Content-type": "application/json;"}
    req = urllib2.Request(url, payload, headers)
    f = urllib2.urlopen(req)
    response = f.read()
    f.close()
    data =
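The excerpt is cut off at the JSON parsing step, but the title points at the usual culprit: the server prepends a UTF-8 byte order mark, which json.loads rejects. A minimal sketch of one common fix, assuming the same placeholder URL and payload as the question, is to decode the body with the 'utf-8-sig' codec, which strips a leading BOM if present, before parsing:

    import urllib2
    import json

    url = "http://foo.com/API.svc/SomeMethod"  # placeholder URL from the question
    payload = json.dumps({'inputs': ['red', 'blue', 'green']})
    headers = {"Content-type": "application/json"}

    req = urllib2.Request(url, payload, headers)
    f = urllib2.urlopen(req)
    raw = f.read()
    f.close()

    # 'utf-8-sig' removes a UTF-8 BOM (EF BB BF) if the server sends one and
    # behaves like plain UTF-8 otherwise, so the decode is safe either way.
    data = json.loads(raw.decode('utf-8-sig'))
    print data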

Downloading a web page and all of its resource files in Python

南楼画角 submitted on 2019-11-30 13:28:10

Question: I want to be able to download a page and all of its associated resources (images, style sheets, script files, etc.) using Python. I am (somewhat) familiar with urllib2 and know how to download individual URLs, but before I go and start hacking at BeautifulSoup + urllib2 I wanted to be sure that there wasn't already a Python equivalent to "wget --page-requisites http://www.google.com". Specifically I am interested in gathering statistical information about how long it takes to download an
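There is no single standard-library equivalent of wget --page-requisites, so this usually ends up at the BeautifulSoup + urllib2 combination the question mentions. A rough sketch of that approach with per-resource timing, assuming BeautifulSoup 4 is installed and covering only the most common resource tags:

    import time
    import urllib2
    import urlparse
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    def fetch(url):
        start = time.time()
        data = urllib2.urlopen(url).read()
        return data, time.time() - start

    page_url = "http://www.google.com"
    html, page_secs = fetch(page_url)
    soup = BeautifulSoup(html, "html.parser")

    # Collect the most common "page requisites": images, stylesheets, scripts.
    resources = [img.get("src") for img in soup.find_all("img")]
    resources += [link.get("href") for link in soup.find_all("link", rel="stylesheet")]
    resources += [s.get("src") for s in soup.find_all("script") if s.get("src")]

    print "page: %.3fs" % page_secs
    for res in filter(None, resources):
        abs_url = urlparse.urljoin(page_url, res)
        try:
            _, secs = fetch(abs_url)
            print "%-60s %.3fs" % (abs_url, secs)
        except urllib2.URLError as e:
            print "%-60s failed: %s" % (abs_url, e)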

urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

不想你离开。 submitted on 2019-11-30 13:01:29

Question:

    import urllib
    print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The above script works and returns the expected results, while:

    import urllib2
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

throws the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File
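The traceback is cut off, but the usual explanation for this pair of symptoms is that the server really does answer with a 404 status and still sends a usable page body: urllib.urlopen ignores the HTTP status code, while urllib2.urlopen raises HTTPError for any 4xx/5xx response. A sketch of reading the body despite the status, assuming that is what is happening here:

    import urllib2

    url = ('http://www.reefgeek.com/equipment/Controllers_&_Monitors/'
           'Neptune_Systems_AquaController/Apex_Controller_&_Accessories/')

    try:
        body = urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        # The HTTPError is itself a file-like response object: the status code
        # is in e.code and the page the server sent with it is still readable.
        print "server answered with status", e.code
        body = e.read()

    print len(body), "bytes received"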

AttributeError(“'str' object has no attribute 'read'”)

六眼飞鱼酱① submitted on 2019-11-30 10:18:00

Question: In Python I'm getting this error:

    Exception: (<type 'exceptions.AttributeError'>, AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)

Given the Python code:

    def getEntries(self, sub):
        url = 'http://www.reddit.com/'
        if (sub != ''):
            url += 'r/' + sub
        request = urllib2.Request(url + '.json', None, {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
        response = urllib2.urlopen(request)
        jsonofabitch = response.read()
        return json.load(jsonofabitch)[
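The error comes from the last line: json.load() expects a file-like object and calls .read() on it, but jsonofabitch is already the string returned by response.read(). The fix is to parse the string with json.loads() (or pass the response object itself to json.load()). A minimal standalone sketch of the corrected logic, with the trailing indexing from the truncated excerpt left out:

    import json
    import urllib2

    def get_entries(sub):
        url = 'http://www.reddit.com/'
        if sub != '':
            url += 'r/' + sub
        request = urllib2.Request(url + '.json', None,
                                  {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
        response = urllib2.urlopen(request)
        # json.loads() parses a string; json.load() would want the file-like
        # response object instead. Mixing the two causes the AttributeError.
        return json.loads(response.read())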

The difference between urllib and urllib2

心不动则不痛 submitted on 2019-11-30 09:56:56

In Python, urllib and urllib2 are not interchangeable. Broadly speaking, urllib2 is an enhancement of urllib, but urllib has functions that urllib2 lacks. With urllib2 you can pass a Request object to urllib2.urlopen and modify the request headers; if you visit a site and want to change the User-Agent (to disguise your browser), you need urllib2. urllib provides the encoding function urllib.urlencode; when simulating a login you usually have to POST encoded parameters, so if you want to implement a simulated login without third-party libraries, you also need urllib. In practice urllib and urllib2 are generally used together. Related reading: urllib, urllib2. Mirror: http://www.cnblogs.com/tiredoy/p/urllib_urllib2.html Source: oschina Link: https://my.oschina.net/u/558071/blog/144792
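A small sketch of the combination the post describes, with a hypothetical login form at example.com standing in for a real site: urllib supplies the form encoding and urllib2 carries the custom User-Agent header:

    import urllib
    import urllib2

    # urllib supplies the form encoding for the POST body ...
    post_data = urllib.urlencode({'username': 'alice', 'password': 'secret'})

    # ... and urllib2 lets us attach custom headers such as User-Agent.
    request = urllib2.Request('http://example.com/login', post_data,
                              {'User-Agent': 'Mozilla/5.0'})
    response = urllib2.urlopen(request)
    print response.read()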

Using the urllib and urllib2 libraries in Python

谁都会走 submitted on 2019-11-30 09:27:33

Table of contents: Basic use of the urllib and urllib2 libraries; urlopen; Request; User-Agent; Adding more header information; URL encoding conversion; Advanced usage of urllib and urllib2; Handler processors and custom openers; A simple custom opener(); ProxyHandler (proxy settings); Cookie; Cookie attributes; Cookie usage; The cookielib library and the HTTPCookieProcessor handler; The cookielib library; 1. Getting cookies and saving them into a CookieJar() object; 2. Visiting a site to obtain cookies and saving them to a cookie file; 3. Loading cookies from a file and sending them as part of a request; Example: logging in to Renren with cookielib and POST; Exception and error handling; URLError; HTTPError; Improved version; HTTP response status code reference

Basic use of the urllib and urllib2 libraries: web scraping means pulling the network resource identified by a URL out of the network stream. Python has many libraries for fetching web pages; we start with urllib2. urllib2 is a module bundled with Python 2.7 (nothing to download; just import it). urllib2 official documentation: https://docs.python.org/2/library/urllib2.html urllib2 source:
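A minimal sketch of the basic usage the opening sections (urlopen, Request, User-Agent) cover; the target URL here is just a stand-in:

    import urllib2

    # Simplest form: fetch a page with urllib2's default headers.
    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()[:200]

    # Same request wrapped in a Request object so we can send our own
    # User-Agent instead of the default "Python-urllib/2.x".
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    request = urllib2.Request("http://www.baidu.com", headers=headers)
    print urllib2.urlopen(request).read()[:200]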

I am downloading a file using Python urllib2. How do I check how large the file size is?

时光毁灭记忆、已成空白 submitted on 2019-11-30 09:07:06

Question: And if it is large... then stop the download? I don't want to download files that are larger than 12MB.

    request = urllib2.Request(ep_url)
    request.add_header('User-Agent', random.choice(agents))
    thefile = urllib2.urlopen(request).read()

Andrew Dalke: There's no need to drop down to httplib as bobince did; you can do all that with urllib directly:

    >>> import urllib2
    >>> f = urllib2.urlopen("http://dalkescientific.com")
    >>> f.headers.items()
    [('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'), ('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'), (
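Building on the answer's point that the size is already in the response headers, a sketch of a 12MB cap might look like the following; the Content-Length check handles well-behaved servers, and the chunked read is a fallback for responses that omit or understate the header (the URL is the example one from the answer):

    import urllib2

    MAX_BYTES = 12 * 1024 * 1024  # 12MB limit from the question

    f = urllib2.urlopen(urllib2.Request("http://dalkescientific.com"))

    # Content-Length is optional (e.g. chunked responses omit it), so treat a
    # missing header as "unknown size" rather than "small".
    length = f.headers.get('Content-Length')
    if length is not None and int(length) > MAX_BYTES:
        f.close()
        print "skipping download: %s bytes is over the limit" % length
    else:
        # Reading in chunks stops the download as soon as the cap is exceeded,
        # even if the server never announced an accurate size.
        data = ''
        chunk = f.read(8192)
        while chunk:
            data += chunk
            if len(data) > MAX_BYTES:
                print "aborting: more than %d bytes received" % MAX_BYTES
                break
            chunk = f.read(8192)
        f.close()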

Timeout a file download with Python urllib?

被刻印的时光 ゝ submitted on 2019-11-30 08:30:53

Question: Python beginner here. I want to be able to time out my download of a video file if the process takes longer than 500 seconds.

    import urllib
    try:
        urllib.urlretrieve("http://www.videoURL.mp4", "filename.mp4")
    except Exception as e:
        print("error")

How do I amend my code to make that happen?

Answer 1: A better way is to use requests, so you can stream the results and easily check for timeouts:

    import requests
    # Make the actual request, set the timeout for no data to 10 seconds and enable streaming
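The answer's snippet is cut off above; a sketch of how the streaming approach might continue, with an explicit wall-clock check for the 500-second limit the question asks for (the timeout argument in requests only bounds the wait for each chunk, not the whole transfer):

    import time
    import requests  # third-party: pip install requests

    MAX_SECONDS = 500
    start = time.time()

    # timeout=10 means "give up if no data arrives for 10 seconds";
    # stream=True lets us pull the body down chunk by chunk.
    response = requests.get("http://www.videoURL.mp4", stream=True, timeout=10)

    with open("filename.mp4", "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
            if time.time() - start > MAX_SECONDS:
                print("download exceeded %d seconds, giving up" % MAX_SECONDS)
                break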

Python urllib2 HTTPBasicAuthHandler

ε祈祈猫儿з submitted on 2019-11-30 07:41:06

Question: Here is the code:

    import urllib2 as URL

    def get_unread_msgs(user, passwd):
        auth = URL.HTTPBasicAuthHandler()
        auth.add_password(
            realm='New mail feed',
            uri='https://mail.google.com',
            user='%s' % user,
            passwd=passwd
        )
        opener = URL.build_opener(auth)
        URL.install_opener(opener)
        try:
            feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
            return feed.read()
        except:
            return None

It works just fine. The only problem is that when a wrong username or password is used, it takes forever to open the url
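The excerpt ends before any answer, so the following is only one common workaround, not necessarily the accepted one: send the Authorization header yourself so a wrong password fails immediately with a 401 instead of going through the handler's challenge/retry dance, and pass a timeout so a silent server cannot hang the call:

    import base64
    import urllib2

    def get_unread_msgs(user, passwd):
        # Send the credentials up front instead of waiting for a 401 challenge.
        token = base64.b64encode('%s:%s' % (user, passwd))
        req = urllib2.Request('https://mail.google.com/mail/feed/atom')
        req.add_header('Authorization', 'Basic %s' % token)
        try:
            # timeout (in seconds) also keeps an unresponsive server from
            # blocking the call indefinitely.
            return urllib2.urlopen(req, timeout=15).read()
        except urllib2.HTTPError as e:
            if e.code == 401:
                return None  # bad username or password
            raise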

Python: Disable http_proxy in urllib2

喜夏-厌秋 submitted on 2019-11-30 07:10:59

Question: I am using a proxy set as an environment variable (export http_proxy=example.com). For one call using urllib2 I need to temporarily disable this, i.e. unset http_proxy. I have tried various methods suggested in the documentation and on the web, but so far have been unable to unset the proxy. So far I have tried:

    # doesn't work
    req = urllib2.Request('http://www.google.com')
    req.set_proxy(None, None)
    urllib2.urlopen(req)

    # also doesn't work
    urllib.getproxies = lambda x=None: {}

Answer 1: The
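The answer text is cut off above; a commonly used approach for this is to build an opener around an empty ProxyHandler, which overrides the proxy settings inherited from the environment for requests made through that opener. A minimal sketch:

    import urllib2

    # An empty mapping tells ProxyHandler to use no proxies at all, regardless
    # of the http_proxy / https_proxy environment variables.
    no_proxy_opener = urllib2.build_opener(urllib2.ProxyHandler({}))

    # Use this opener for the one call that must bypass the proxy; plain
    # urllib2.urlopen() elsewhere keeps honouring the environment settings.
    response = no_proxy_opener.open('http://www.google.com')
    print response.read()[:200]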