urllib2

What should I do if socket.setdefaulttimeout() is not working?

大憨熊 submitted on 2019-11-27 01:21:54
I'm writing a (multi-threaded) script to retrieve contents from a website, and the site is not very stable, so every now and then an HTTP request hangs and cannot even be timed out by socket.setdefaulttimeout(). Since I have no control over that website, the only thing I can do is improve my code, but I'm running out of ideas right now. Sample code:

    socket.setdefaulttimeout(150)
    MechBrowser = mechanize.Browser()
    Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
    Url = "http://example.com"
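
One approach that is sometimes suggested for this situation, since socket.setdefaulttimeout() only covers individual socket operations and not a connection that keeps trickling data forever, is to run each fetch in a worker thread and enforce a hard deadline with join(). The sketch below is only illustrative and assumes the same mechanize browser setup; the fetch_with_deadline helper and the 150-second deadline are not part of the original question.

    import threading
    import mechanize

    def fetch_with_deadline(browser, url, deadline=150):
        """Run browser.open(url) in a worker thread and give up after `deadline` seconds."""
        result = {}

        def _worker():
            try:
                result['data'] = browser.open(url).read()
            except Exception as e:          # keep the worker from dying silently
                result['error'] = e

        t = threading.Thread(target=_worker)
        t.daemon = True                     # a stuck request must not keep the process alive
        t.start()
        t.join(deadline)                    # hard deadline, independent of socket-level timeouts
        if t.is_alive():
            raise RuntimeError("request to %s exceeded %s seconds" % (url, deadline))
        if 'error' in result:
            raise result['error']
        return result['data']

Note that the stuck thread is abandoned rather than killed, so this only bounds how long the caller waits; on Python 2.6+ it is often combined with a per-request timeout such as urllib2.urlopen(url, timeout=...).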

Multiple (asynchronous) connections with urllib2 or other http library?

僤鯓⒐⒋嵵緔 submitted on 2019-11-27 01:13:22
I have code like this:

    for p in range(1, 1000):
        result = False
        while result is False:
            ret = urllib2.Request('http://server/?' + str(p))
            try:
                result = process(urllib2.urlopen(ret).read())
            except (urllib2.HTTPError, urllib2.URLError):
                pass
        results.append(result)

I would like to make two or three requests at the same time to accelerate this. Can I use urllib2 for this, and how? If not, which other library should I use? Thanks.

Answer: You can use asynchronous IO to do this. requests + gevent = grequests. GRequests allows you to use Requests with Gevent to make asynchronous HTTP requests easily. import …
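
A minimal sketch of the grequests approach the answer refers to, assuming the grequests package is installed; process() and results come from the question's own code, and error handling is simplified to keep the example short.

    import grequests

    urls = ['http://server/?' + str(p) for p in range(1, 1000)]

    # Build unsent requests, then let gevent run them concurrently
    # (size=3 caps the number of simultaneous connections).
    pending = (grequests.get(u) for u in urls)
    responses = grequests.map(pending, size=3)

    results = []
    for response in responses:
        if response is not None and response.ok:
            results.append(process(response.content))   # process() is defined in the question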

Making a POST call instead of GET using urllib2

爱⌒轻易说出口 submitted on 2019-11-27 00:55:28
There's a lot of stuff out there on urllib2 and POST calls, but I'm stuck on a problem. I'm trying to do a simple POST call to a service:

    url = 'http://myserver/post_service'
    data = urllib.urlencode({'name' : 'joe', 'age' : '10'})
    content = urllib2.urlopen(url=url, data=data).read()
    print content

I can see the server logs and they say that I'm doing GET calls, even though I'm sending the data argument to urlopen. The library is raising a 404 error (not found), which is correct for a GET call; POST calls are processed fine (I'm also trying with a POST from an HTML form).

Gregg: This may have been …
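
For reference, a minimal sketch of the usual urllib2 POST pattern: passing a non-None data argument is what makes urlopen issue a POST, so if the server still logs a GET, a common culprit is a redirect (urllib2 re-requests the redirect target with GET). The URL and form fields below are just the question's own examples.

    import urllib
    import urllib2

    url = 'http://myserver/post_service'              # example URL from the question
    data = urllib.urlencode({'name': 'joe', 'age': '10'})

    req = urllib2.Request(url, data)                  # a data argument => POST
    response = urllib2.urlopen(req)
    print response.geturl()                           # if this differs from url, a redirect happened
    print response.read()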

Fetch a Wikipedia article with Python

不打扰是莪最后的温柔 submitted on 2019-11-27 00:50:56
I try to fetch a Wikipedia article with Python's urllib:

    f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
    s = f.read()
    f.close()

However, instead of the HTML page I get the following response:

    Error - Wikimedia Foundation:
    Request: GET http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes, from 192.35.17.11 via knsq1.knams.wikimedia.org (squid/2.6.STABLE21) to ()
    Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 23 Sep 2008 09:09:08 GMT

Wikipedia seems to block requests which are not from a standard browser. Anybody know how to work …
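
A sketch of the commonly suggested fix: send a descriptive User-Agent header, since Wikipedia rejects the default Python-urllib agent. The header string below is only an example; in real use Wikimedia asks for contact information in the agent, and the MediaWiki API is usually preferred over scraping index.php.

    import urllib2

    url = ("http://en.wikipedia.org/w/index.php"
           "?title=Albert_Einstein&printable=yes")
    req = urllib2.Request(url, headers={
        # Any descriptive, non-default agent; the bare urllib agent is blocked.
        'User-Agent': 'MyFetcher/0.1 (contact: you@example.com)',
    })
    html = urllib2.urlopen(req).read()
    print len(html)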

Spoofing the origination IP address of an HTTP request

妖精的绣舞 submitted on 2019-11-27 00:44:48
Question: This only needs to work on a single subnet and is not for malicious use. I have a load-testing tool written in Python that basically blasts HTTP requests at a URL. I need to run performance tests against an IP-based load balancer, so the requests must come from a range of IPs. Most commercial performance tools provide this functionality, but I want to build it into my own. The tool uses Python's urllib2 for transport. Is it possible to send HTTP requests with spoofed IP addresses for the …
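
True source-IP spoofing will not work for HTTP, because TCP needs a completed handshake; what load-testing setups on a single subnet usually do instead is configure several alias IPs on the test machine and bind each connection to one of them. A minimal sketch of that binding approach using httplib's source_address parameter (Python 2.7+); the addresses and hostname below are hypothetical.

    import httplib

    # 10.0.0.5 must already be configured as an alias IP on a local interface.
    conn = httplib.HTTPConnection('loadbalancer.example.com', 80,
                                  source_address=('10.0.0.5', 0))
    conn.request('GET', '/')
    resp = conn.getresponse()
    print resp.status, resp.reason

Wiring this into urllib2 would require a custom handler; iterating over a pool of alias addresses gives the load balancer a range of client IPs to distribute.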

Python urllib over TOR? [duplicate]

爷,独闯天下 submitted on 2019-11-27 00:42:48
This question already has an answer here: How to route urllib requests through the TOR network? [duplicate] (3 answers)

Sample code:

    #!/usr/bin/python
    import socks
    import socket
    import urllib2
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, "127.0.0.1", 9050, True)
    socket.socket = socks.socksocket
    print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()

TOR is running a SOCKS proxy on port 9050 (its default). The request goes through TOR, surfacing at an IP address other than my own. However, the TOR console gives the warning: "Feb 28 22:44:26.233 [warn] Your application (using socks4 to …
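
That warning is the usual DNS-leak complaint: urllib2 resolves hostnames locally before the SOCKS wrapper sees them. A commonly circulated workaround, assuming the SocksiPy/PySocks module, is to use SOCKS5 with remote DNS (rdns=True) and also patch socket.create_connection so httplib's connection setup goes through the proxied socket:

    import socks
    import socket
    import urllib2

    # SOCKS5 with rdns=True asks Tor to resolve hostnames instead of the local resolver.
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050, True)
    socket.socket = socks.socksocket

    def create_connection(address, timeout=None, source_address=None):
        # httplib calls socket.create_connection(), which would otherwise
        # resolve and connect outside the SOCKS wrapper.
        sock = socks.socksocket()
        sock.connect(address)
        return sock

    socket.create_connection = create_connection

    print urllib2.urlopen("https://check.torproject.org/").read()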

how enable requests async mode?

亡梦爱人 submitted on 2019-11-27 00:23:41
Question: For this code:

    import sys
    import gevent
    from gevent import monkey
    monkey.patch_all()
    import requests
    import urllib2

    def worker(url, use_urllib2=False):
        if use_urllib2:
            content = urllib2.urlopen(url).read().lower()
        else:
            content = requests.get(url, prefetch=True).content.lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()

    urls = ['http://www.mail.ru']*5

    def by_requests():
        jobs = [gevent.spawn(worker, url) for url in urls]
        gevent.joinall(jobs)

    def by_urllib2():
        jobs = …
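
A sketch of how the gevent side is usually wired up: monkey.patch_all() has to run before requests (and its sockets) are imported, after which requests' blocking I/O becomes cooperative and the spawned greenlets really overlap on the network. This assumes a reasonably recent requests; the prefetch argument in the question belongs to older versions. The URL list is the one from the question.

    from gevent import monkey
    monkey.patch_all()          # must run before requests creates any sockets

    import gevent
    import requests

    def worker(url):
        # With the monkey patch in place, requests yields to gevent on I/O,
        # so these greenlets run concurrently instead of serially.
        content = requests.get(url).content.lower()
        return content.split('<title>')[1].split('</title>')[0].strip()

    urls = ['http://www.mail.ru'] * 5
    jobs = [gevent.spawn(worker, url) for url in urls]
    gevent.joinall(jobs)
    print [job.value for job in jobs]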

Python form POST using urllib2 (also question on saving/using cookies)

半城伤御伤魂 submitted on 2019-11-27 00:12:19
Question: I am trying to write a function to post form data and save the returned cookie info in a file so that the next time the page is visited, the cookie information is sent to the server (i.e. normal browser behavior). I wrote this relatively easily in C++ using curlib, but have spent almost an entire day trying to write this in Python using urllib2 - and still no success. This is what I have so far:

    import urllib, urllib2
    import logging

    # the path and filename to save your cookies in
    COOKIEFILE = …
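
A minimal sketch of the standard urllib2 + cookielib pattern this is reaching for: an LWPCookieJar tied to a file, an opener built around HTTPCookieProcessor, a POST made by passing form data, and a save() at the end. The cookie path, login URL, and form fields are placeholders.

    import urllib
    import urllib2
    import cookielib

    COOKIEFILE = 'cookies.lwp'                  # placeholder path where cookies are persisted

    cj = cookielib.LWPCookieJar(COOKIEFILE)
    try:
        cj.load()                               # reuse cookies from a previous run, if any
    except IOError:
        pass                                    # first run: no cookie file yet

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)              # urlopen() now sends and stores cookies

    form = urllib.urlencode({'username': 'joe', 'password': 'secret'})   # placeholder fields
    response = urllib2.urlopen('http://example.com/login', form)         # data => POST
    print response.read()[:200]

    cj.save()                                   # write cookies back to COOKIEFILE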

Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

余生长醉 submitted on 2019-11-27 00:11:44
Question: I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML. With Python 2.6, I normally fetch a page using:

    import urllib2
    url = "http://google.com"
    data = urllib2.urlopen(url)
    data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

    urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2), all while they are returning …
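
The usual answer is that urllib2.HTTPError is itself a file-like response object: catching it and calling .read() returns the error page's HTML. A small sketch under that assumption, with a placeholder URL:

    import urllib2

    url = "http://example.com/failing-endpoint"     # placeholder for the failing URL
    try:
        data = urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        # HTTPError doubles as the response, so the 500 page body is still readable.
        print e.code                                 # e.g. 500
        data = e.read()

    print data[:200]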

How to retry urllib2.request when fails?

喜你入骨 submitted on 2019-11-27 00:10:10
Question: When urllib2.request reaches a timeout, a urllib2.URLError exception is raised. What is the pythonic way to retry establishing the connection?

Answer 1: I would use a retry decorator. There are other ones out there, but this one works pretty well. Here's how you can use it:

    @retry(urllib2.URLError, tries=4, delay=3, backoff=2)
    def urlopen_with_retry():
        return urllib2.urlopen("http://example.com")

This will retry the function if URLError is raised. Check the link above for documentation on the …
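
For completeness, a sketch of what such a retry decorator typically looks like (the implementation the answer links to may differ in details): it re-invokes the wrapped function on the given exception, sleeping delay seconds and multiplying the delay by backoff after each failure, for tries attempts in total.

    import time
    import functools
    import urllib2

    def retry(exceptions, tries=4, delay=3, backoff=2):
        """Retry the decorated function on `exceptions`, with exponential backoff."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                remaining, wait = tries, delay
                while remaining > 1:
                    try:
                        return func(*args, **kwargs)
                    except exceptions:
                        time.sleep(wait)
                        remaining -= 1
                        wait *= backoff
                return func(*args, **kwargs)        # final attempt: let the exception propagate
            return wrapper
        return decorator

    @retry(urllib2.URLError, tries=4, delay=3, backoff=2)
    def urlopen_with_retry():
        return urllib2.urlopen("http://example.com")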