urllib2

BOM in server response screws up json parsing

末鹿安然 submitted on 2019-11-30 14:03:14

Question: I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.

    import urllib2
    import json

    url = "http://foo.com/API.svc/SomeMethod"
    payload = json.dumps({'inputs': ['red', 'blue', 'green']})
    headers = {"Content-type": "application/json;"}
    req = urllib2.Request(url, payload, headers)
    f = urllib2.urlopen(req)
    response = f.read()
    f.close()
    data =
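The excerpt is cut off at the JSON parsing step, but the title points at the usual culprit: the server prepends a UTF-8 byte order mark, which json.loads rejects. A minimal sketch of one common fix, assuming the same placeholder URL and payload as the question, is to decode the body with the 'utf-8-sig' codec, which strips a leading BOM if present, before parsing:

    import urllib2
    import json

    url = "http://foo.com/API.svc/SomeMethod"  # placeholder URL from the question
    payload = json.dumps({'inputs': ['red', 'blue', 'green']})
    headers = {"Content-type": "application/json"}

    req = urllib2.Request(url, payload, headers)
    f = urllib2.urlopen(req)
    raw = f.read()
    f.close()

    # 'utf-8-sig' removes a UTF-8 BOM (EF BB BF) if the server sends one and
    # behaves like plain UTF-8 otherwise, so the decode is safe either way.
    data = json.loads(raw.decode('utf-8-sig'))
    print data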

Downloading a web page and all of its resource files in Python

南楼画角 submitted on 2019-11-30 13:28:10

Question: I want to be able to download a page and all of its associated resources (images, style sheets, script files, etc.) using Python. I am (somewhat) familiar with urllib2 and know how to download individual URLs, but before I go and start hacking at BeautifulSoup + urllib2 I wanted to be sure that there wasn't already a Python equivalent to "wget --page-requisites http://www.google.com". Specifically I am interested in gathering statistical information about how long it takes to download an
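There is no single standard-library equivalent of wget --page-requisites, so this usually ends up at the BeautifulSoup + urllib2 combination the question mentions. A rough sketch of that approach with per-resource timing, assuming BeautifulSoup 4 is installed and covering only the most common resource tags:

    import time
    import urllib2
    import urlparse
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    def fetch(url):
        start = time.time()
        data = urllib2.urlopen(url).read()
        return data, time.time() - start

    page_url = "http://www.google.com"
    html, page_secs = fetch(page_url)
    soup = BeautifulSoup(html, "html.parser")

    # Collect the most common "page requisites": images, stylesheets, scripts.
    resources = [img.get("src") for img in soup.find_all("img")]
    resources += [link.get("href") for link in soup.find_all("link", rel="stylesheet")]
    resources += [s.get("src") for s in soup.find_all("script") if s.get("src")]

    print "page: %.3fs" % page_secs
    for res in filter(None, resources):
        abs_url = urlparse.urljoin(page_url, res)
        try:
            _, secs = fetch(abs_url)
            print "%-60s %.3fs" % (abs_url, secs)
        except urllib2.URLError as e:
            print "%-60s failed: %s" % (abs_url, e)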

urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

不想你离开。 submitted on 2019-11-30 13:01:29

Question:

    import urllib
    print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The above script works and returns the expected results, while:

    import urllib2
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

throws the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File
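The traceback is cut off, but the usual explanation for this pair of symptoms is that the server really does answer with a 404 status and still sends a usable page body: urllib.urlopen ignores the HTTP status code, while urllib2.urlopen raises HTTPError for any 4xx/5xx response. A sketch of reading the body despite the status, assuming that is what is happening here:

    import urllib2

    url = ('http://www.reefgeek.com/equipment/Controllers_&_Monitors/'
           'Neptune_Systems_AquaController/Apex_Controller_&_Accessories/')

    try:
        body = urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        # The HTTPError is itself a file-like response object: the status code
        # is in e.code and the page the server sent with it is still readable.
        print "server answered with status", e.code
        body = e.read()

    print len(body), "bytes received"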

AttributeError(“'str' object has no attribute 'read'”)

六眼飞鱼酱① submitted on 2019-11-30 10:18:00

Question: In Python I'm getting this error:

    Exception: (<type 'exceptions.AttributeError'>, AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)

Given the Python code:

    def getEntries(self, sub):
        url = 'http://www.reddit.com/'
        if (sub != ''):
            url += 'r/' + sub
        request = urllib2.Request(url + '.json', None, {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
        response = urllib2.urlopen(request)
        jsonofabitch = response.read()
        return json.load(jsonofabitch)[
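The error comes from the last line: json.load() expects a file-like object and calls .read() on it, but jsonofabitch is already the string returned by response.read(). The fix is to parse the string with json.loads() (or pass the response object itself to json.load()). A minimal standalone sketch of the corrected logic, with the trailing indexing from the truncated excerpt left out:

    import json
    import urllib2

    def get_entries(sub):
        url = 'http://www.reddit.com/'
        if sub != '':
            url += 'r/' + sub
        request = urllib2.Request(url + '.json', None,
                                  {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
        response = urllib2.urlopen(request)
        # json.loads() parses a string; json.load() would want the file-like
        # response object instead. Mixing the two causes the AttributeError.
        return json.loads(response.read())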

The difference between urllib and urllib2

心不动则不痛 submitted on 2019-11-30 09:56:56

In Python, urllib and urllib2 are not interchangeable. Broadly speaking, urllib2 is an enhancement of urllib, but urllib has functions that urllib2 lacks. With urllib2 you can pass a Request object to urllib2.urlopen and modify the request headers; if you visit a site and want to change the User-Agent (to disguise your browser), you need urllib2. urllib provides the encoding function urllib.urlencode; when simulating a login you usually have to POST encoded parameters, so if you want to implement a simulated login without third-party libraries, you also need urllib. In practice urllib and urllib2 are generally used together. Related reading: urllib, urllib2. Mirror: http://www.cnblogs.com/tiredoy/p/urllib_urllib2.html Source: oschina Link: https://my.oschina.net/u/558071/blog/144792
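A small sketch of the combination the post describes, with a hypothetical login form at example.com standing in for a real site: urllib supplies the form encoding and urllib2 carries the custom User-Agent header:

    import urllib
    import urllib2

    # urllib supplies the form encoding for the POST body ...
    post_data = urllib.urlencode({'username': 'alice', 'password': 'secret'})

    # ... and urllib2 lets us attach custom headers such as User-Agent.
    request = urllib2.Request('http://example.com/login', post_data,
                              {'User-Agent': 'Mozilla/5.0'})
    response = urllib2.urlopen(request)
    print response.read()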

Using the urllib and urllib2 libraries in Python

谁都会走 submitted on 2019-11-30 09:27:33

Table of contents: Basic use of the urllib and urllib2 libraries; urlopen; Request; User-Agent; Adding more header information; URL encoding conversion; Advanced usage of urllib and urllib2; Handler processors and custom openers; A simple custom opener(); ProxyHandler (proxy settings); Cookie; Cookie attributes; Cookie usage; The cookielib library and the HTTPCookieProcessor handler; The cookielib library; 1. Getting cookies and saving them into a CookieJar() object; 2. Visiting a site to obtain cookies and saving them to a cookie file; 3. Loading cookies from a file and sending them as part of a request; Example: logging in to Renren with cookielib and POST; Exception and error handling; URLError; HTTPError; Improved version; HTTP response status code reference

Basic use of the urllib and urllib2 libraries: web scraping means pulling the network resource identified by a URL out of the network stream. Python has many libraries for fetching web pages; we start with urllib2. urllib2 is a module bundled with Python 2.7 (nothing to download; just import it). urllib2 official documentation: https://docs.python.org/2/library/urllib2.html urllib2 source:
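A minimal sketch of the basic usage the opening sections (urlopen, Request, User-Agent) cover; the target URL here is just a stand-in:

    import urllib2

    # Simplest form: fetch a page with urllib2's default headers.
    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()[:200]

    # Same request wrapped in a Request object so we can send our own
    # User-Agent instead of the default "Python-urllib/2.x".
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    request = urllib2.Request("http://www.baidu.com", headers=headers)
    print urllib2.urlopen(request).read()[:200]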

I am downloading a file using Python urllib2. How do I check how large the file size is?

时光毁灭记忆、已成空白 submitted on 2019-11-30 09:07:06

Question: And if it is large... then stop the download? I don't want to download files that are larger than 12MB.

    request = urllib2.Request(ep_url)
    request.add_header('User-Agent', random.choice(agents))
    thefile = urllib2.urlopen(request).read()

Andrew Dalke: There's no need to drop down to httplib as bobince did; you can do all that with urllib directly:

    >>> import urllib2
    >>> f = urllib2.urlopen("http://dalkescientific.com")
    >>> f.headers.items()
    [('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'), ('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'), (
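Building on the answer's point that the size is already in the response headers, a sketch of a 12MB cap might look like the following; the Content-Length check handles well-behaved servers, and the chunked read is a fallback for responses that omit or understate the header (the URL is the example one from the answer):

    import urllib2

    MAX_BYTES = 12 * 1024 * 1024  # 12MB limit from the question

    f = urllib2.urlopen(urllib2.Request("http://dalkescientific.com"))

    # Content-Length is optional (e.g. chunked responses omit it), so treat a
    # missing header as "unknown size" rather than "small".
    length = f.headers.get('Content-Length')
    if length is not None and int(length) > MAX_BYTES:
        f.close()
        print "skipping download: %s bytes is over the limit" % length
    else:
        # Reading in chunks stops the download as soon as the cap is exceeded,
        # even if the server never announced an accurate size.
        data = ''
        chunk = f.read(8192)
        while chunk:
            data += chunk
            if len(data) > MAX_BYTES:
                print "aborting: more than %d bytes received" % MAX_BYTES
                break
            chunk = f.read(8192)
        f.close()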

Timeout a file download with Python urllib?

被刻印的时光 ゝ submitted on 2019-11-30 08:30:53

Question: Python beginner here. I want to be able to time out my download of a video file if the process takes longer than 500 seconds.

    import urllib
    try:
        urllib.urlretrieve("http://www.videoURL.mp4", "filename.mp4")
    except Exception as e:
        print("error")

How do I amend my code to make that happen?

Answer 1: A better way is to use requests, so you can stream the results and easily check for timeouts:

    import requests
    # Make the actual request, set the timeout for no data to 10 seconds and enable streaming
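The answer's snippet is cut off above; a sketch of how the streaming approach might continue, with an explicit wall-clock check for the 500-second limit the question asks for (the timeout argument in requests only bounds the wait for each chunk, not the whole transfer):

    import time
    import requests  # third-party: pip install requests

    MAX_SECONDS = 500
    start = time.time()

    # timeout=10 means "give up if no data arrives for 10 seconds";
    # stream=True lets us pull the body down chunk by chunk.
    response = requests.get("http://www.videoURL.mp4", stream=True, timeout=10)

    with open("filename.mp4", "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
            if time.time() - start > MAX_SECONDS:
                print("download exceeded %d seconds, giving up" % MAX_SECONDS)
                break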

Python urllib2 HTTPBasicAuthHandler

ε祈祈猫儿з submitted on 2019-11-30 07:41:06

Question: Here is the code:

    import urllib2 as URL

    def get_unread_msgs(user, passwd):
        auth = URL.HTTPBasicAuthHandler()
        auth.add_password(
            realm='New mail feed',
            uri='https://mail.google.com',
            user='%s' % user,
            passwd=passwd
        )
        opener = URL.build_opener(auth)
        URL.install_opener(opener)
        try:
            feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
            return feed.read()
        except:
            return None

It works just fine. The only problem is that when a wrong username or password is used, it takes forever to open the url
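The excerpt ends before any answer, so the following is only one common workaround, not necessarily the accepted one: send the Authorization header yourself so a wrong password fails immediately with a 401 instead of going through the handler's challenge/retry dance, and pass a timeout so a silent server cannot hang the call:

    import base64
    import urllib2

    def get_unread_msgs(user, passwd):
        # Send the credentials up front instead of waiting for a 401 challenge.
        token = base64.b64encode('%s:%s' % (user, passwd))
        req = urllib2.Request('https://mail.google.com/mail/feed/atom')
        req.add_header('Authorization', 'Basic %s' % token)
        try:
            # timeout (in seconds) also keeps an unresponsive server from
            # blocking the call indefinitely.
            return urllib2.urlopen(req, timeout=15).read()
        except urllib2.HTTPError as e:
            if e.code == 401:
                return None  # bad username or password
            raise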

Python: Disable http_proxy in urllib2

喜夏-厌秋 submitted on 2019-11-30 07:10:59

Question: I am using a proxy set as an environment variable (export http_proxy=example.com). For one call using urllib2 I need to temporarily disable this, i.e. unset http_proxy. I have tried various methods suggested in the documentation and on the web, but so far have been unable to unset the proxy. So far I have tried:

    # doesn't work
    req = urllib2.Request('http://www.google.com')
    req.set_proxy(None, None)
    urllib2.urlopen(req)

    # also doesn't work
    urllib.getproxies = lambda x=None: {}

Answer 1: The
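The answer text is cut off above; a commonly used approach for this is to build an opener around an empty ProxyHandler, which overrides the proxy settings inherited from the environment for requests made through that opener. A minimal sketch:

    import urllib2

    # An empty mapping tells ProxyHandler to use no proxies at all, regardless
    # of the http_proxy / https_proxy environment variables.
    no_proxy_opener = urllib2.build_opener(urllib2.ProxyHandler({}))

    # Use this opener for the one call that must bypass the proxy; plain
    # urllib2.urlopen() elsewhere keeps honouring the environment settings.
    response = no_proxy_opener.open('http://www.google.com')
    print response.read()[:200]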