urllib2

How to determine the IP address of the server after connecting with urllib2?

Question: I am downloading data from a server using urllib2, but I need to determine the IP address of the server to which I am connected.

    import urllib2

    STD_HEADERS = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9, */*;q=0.8',
                   'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
                   'Accept-Language': 'en-us,en;q=0.5',
                   'User-Agent': 'Mozilla/5.0 (X11; U; Linux x86_64;en-US;rv:1.9.2.12) Gecko/20101028 Firefox/3.6.12'}

    request = urllib2.Request(url, None, STD_HEADERS)
    data = urllib2.urlopen(request)
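
A sketch of two possible ways to get that address, assuming Python 2 and urllib2 as in the question; the second one pokes at non-public CPython internals and is fragile:

    import socket
    import urllib2
    from urlparse import urlparse

    response = urllib2.urlopen('http://example.com/')

    # Option 1: resolve the host of the final URL yourself. This can pick a
    # different address than the one actually used if the host has several
    # A records.
    host = urlparse(response.geturl()).hostname
    print socket.gethostbyname(host)

    # Option 2: dig out the socket behind the response (CPython 2
    # implementation detail, not a public API).
    sock = response.fp._sock.fp._sock
    print sock.getpeername()  # (ip, port) of the connected server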

How do I unit test a module that relies on urllib2?

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (Twitter, Flickr, YouTube, etc.) with urllib2. Here's some pseudo-code for it:

    params = (url, urlencode(data),) if data else (url,)
    req = Request(*params)
    response = urlopen(req)
    # check headers, content-length, etc...
    # parse the response XML with lxml...

My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception). Just saving the XML from the response body isn't ideal, because my code uses…
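
One possible approach, sketched with the third-party mock package: stub out urlopen where the module under test looks it up, and hand back a file-like fake. The module name myfeeds and its fetch function are hypothetical stand-ins for the real code:

    import unittest
    from StringIO import StringIO
    from mock import patch

    import myfeeds  # hypothetical module that does "from urllib2 import urlopen"

    CANNED_XML = '<feed><entry>hello</entry></feed>'

    class FeedTest(unittest.TestCase):
        @patch('myfeeds.urlopen')
        def test_parses_feed(self, mock_urlopen):
            fake = StringIO(CANNED_XML)
            # fake whatever response attributes the code inspects
            fake.headers = {'content-length': str(len(CANNED_XML))}
            mock_urlopen.return_value = fake
            result = myfeeds.fetch('http://example.com/feed')
            self.assertIn('hello', result)

    if __name__ == '__main__':
        unittest.main()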

Python unable to retrieve form with urllib or mechanize

Question: I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 methods to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php. The page is in Dutch, but this is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php. First of all, this is the urllib/urllib2 method I've tried:
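
The excerpt cuts off before the code, so here is only a hedged sketch of how such a POST is usually done with urllib/urllib2; the form field names are hypothetical, and the real ones would have to be read from the page's HTML:

    import urllib
    import urllib2

    post_url = 'http://zrs.leidenuniv.nl/ul/query.php'  # the form's action target
    form_data = urllib.urlencode({
        'day': '1',      # hypothetical field names; inspect the
        'month': '1',    # <input> elements on start.php for the
        'year': '2012',  # real ones (including any hidden fields)
    })
    request = urllib2.Request(post_url, form_data)
    request.add_header('Referer', 'http://zrs.leidenuniv.nl/ul/start.php')
    response = urllib2.urlopen(request)
    print response.read()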

How to enable async mode for requests?

For this code:

    import sys
    import gevent
    from gevent import monkey

    monkey.patch_all()
    import requests
    import urllib2

    def worker(url, use_urllib2=False):
        if use_urllib2:
            content = urllib2.urlopen(url).read().lower()
        else:
            content = requests.get(url, prefetch=True).content.lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()

    urls = ['http://www.mail.ru'] * 5

    def by_requests():
        jobs = [gevent.spawn(worker, url) for url in urls]
        gevent.joinall(jobs)

    def by_urllib2():
        jobs = [gevent.spawn(worker, url, True) for url in urls]
        gevent.joinall(jobs)

    if __name__ == '__main__':
        from timeit …
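
A common cause, offered as an assumption rather than a confirmed diagnosis: monkey.patch_all() has to run before any module that touches sockets is imported, otherwise requests can keep references to the unpatched blocking primitives. A minimal sketch with the patch applied first:

    from gevent import monkey
    monkey.patch_all()  # must happen before "import requests"

    import gevent
    import requests

    def fetch(url):
        # with patched sockets, this yields to the gevent hub on I/O
        return requests.get(url).text

    urls = ['http://www.mail.ru'] * 5
    jobs = [gevent.spawn(fetch, u) for u in urls]
    gevent.joinall(jobs)
    print [len(job.value) for job in jobs]

(The prefetch argument used in the question comes from an old requests release and was later replaced by stream, so it is omitted here.)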

How to check if urllib2 followed a redirect?

Question: I've written this function:

    def download_mp3(url, name):
        opener1 = urllib2.build_opener()
        page1 = opener1.open(url)
        mp3 = page1.read()
        filename = name + '.mp3'
        fout = open(filename, 'wb')
        fout.write(mp3)
        fout.close()

This function takes a URL and a name, both as strings, then downloads an mp3 from the URL and saves it under the given name. The URL is of the form http://site/download.php?id=xxxx, where xxxx is the id of an mp3; if this id does not exist, the site redirects me to another page.
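
A minimal sketch of one way to detect the redirect, assuming the default handler chain: urllib2 follows redirects silently, but response.geturl() reports the final URL, so comparing it with the requested one reveals whether a redirect happened:

    import urllib2

    def download_mp3(url, name):
        response = urllib2.urlopen(url)
        if response.geturl() != url:
            # we were redirected, so the id probably does not exist
            print 'redirected to %s, skipping' % response.geturl()
            return False
        with open(name + '.mp3', 'wb') as fout:
            fout.write(response.read())
        return True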

Python form POST using urllib2 (also question on saving/using cookies)

I am trying to write a function to post form data and save the returned cookie info in a file, so that the next time the page is visited the cookie information is sent to the server (i.e. normal browser behavior). I wrote this relatively easily in C++ using libcurl, but have spent almost an entire day trying to write it in Python using urllib2, still without success. This is what I have so far:

    import urllib, urllib2
    import logging

    # the path and filename to save your cookies in
    COOKIEFILE = 'cookies.lwp'

    cj = None
    ClientCookie = None
    cookielib = None
    logger = logging.getLogger(__name__)

    # Let's …
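
A hedged sketch of the usual cookielib pattern this code seems to be heading toward: keep cookies in an LWPCookieJar, wire it into an opener via HTTPCookieProcessor, and save/load the jar between runs. The login URL and form fields are hypothetical:

    import os
    import urllib
    import urllib2
    import cookielib

    COOKIEFILE = 'cookies.lwp'

    cj = cookielib.LWPCookieJar(COOKIEFILE)
    if os.path.isfile(COOKIEFILE):
        cj.load()  # replay cookies saved by a previous run

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)

    form_data = urllib.urlencode({'user': 'me', 'password': 'secret'})  # hypothetical fields
    response = urllib2.urlopen('http://example.com/login', form_data)
    cj.save()  # persist whatever cookies the server set
    print response.read()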

How to retry urllib2.request when it fails?

When urllib2.request reaches its timeout, a urllib2.URLError exception is raised. What is the Pythonic way to retry establishing the connection? I would use a retry decorator. There are other ones out there, but this one works pretty well. Here's how you can use it:

    @retry(urllib2.URLError, tries=4, delay=3, backoff=2)
    def urlopen_with_retry():
        return urllib2.urlopen("http://example.com")

This will retry the function if URLError is raised. Check the link above for documentation on the parameters, but basically it will retry a maximum of 4 times, with an exponential backoff delay doubling each time.
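
The decorator's body isn't shown in the excerpt; what follows is only a sketch of what such a decorator can look like, not the linked implementation:

    import time
    from functools import wraps

    def retry(exceptions, tries=4, delay=3, backoff=2):
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                remaining, wait = tries, delay
                while remaining > 1:
                    try:
                        return func(*args, **kwargs)
                    except exceptions:
                        time.sleep(wait)   # back off before the next attempt
                        remaining -= 1
                        wait *= backoff    # double the delay each round
                # final attempt: let any exception propagate to the caller
                return func(*args, **kwargs)
            return wrapper
        return decorator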

Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML. With Python 2.6, I normally fetch a page using:

    import urllib2
    url = "http://google.com"
    data = urllib2.urlopen(url)
    data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

    urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2) while they are returning Internal Server Errors? Note that with Python 3, the corresponding exception is urllib.error.HTTPError.
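
A minimal sketch: HTTPError is itself a file-like response object, so catching it and calling read() returns the error page's body anyway:

    import urllib2

    url = "http://example.com/failing-page"  # hypothetical URL that returns a 500
    try:
        data = urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        data = e.read()  # the HTML the server sent along with the 500
    print data

The same pattern works on Python 3 with urllib.request.urlopen and urllib.error.HTTPError.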

Why does this URL raise BadStatusLine with httplib2 and urllib2?

Question: Using httplib2 and urllib2, I'm trying to fetch pages from this URL, but none of the attempts worked; all ended with this exception:

    content = conn.request(uri="http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1129, in request
        (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, …
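
BadStatusLine generally means the server closed the connection or answered with a malformed status line, which some servers do when they dislike the client's request headers. As an assumption rather than a confirmed fix for this particular site, one thing worth trying is sending browser-like headers:

    import urllib2

    url = "http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902"
    request = urllib2.Request(url, headers={
        'User-Agent': 'Mozilla/5.0',  # some servers reject the default Python-urllib agent
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    })
    print urllib2.urlopen(request).read()[:200]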

How can I see all notes of a Tumblr post from Python?

Question: Say I look at the following Tumblr post: http://ronbarak.tumblr.com/post/40692813… It (currently) has 292 notes. I'd like to get all of those notes using a Python script (e.g., via urllib2, BeautifulSoup, simplejson, or the Tumblr API). Some extensive Googling did not produce any items relating to extracting notes from Tumblr. Can anyone point me in the right direction on which tool will enable me to do that?

Answer 1: Unfortunately, it looks like the Tumblr API has some limitations (lack of meta…
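
Since the API is limited, one fallback is scraping the rendered page. A hedged sketch with urllib2 and BeautifulSoup 4; the assumption that the notes live in an <ol class="notes"> element reflects how Tumblr rendered posts at the time and may not hold:

    import urllib2
    from bs4 import BeautifulSoup

    def fetch_notes(url):
        html = urllib2.urlopen(url).read()
        soup = BeautifulSoup(html)
        notes_list = soup.find('ol', class_='notes')  # assumed container element
        if notes_list is None:
            return []
        return [li.get_text(' ', strip=True) for li in notes_list.find_all('li')]

    print fetch_notes('http://ronbarak.tumblr.com/post/...')  # full post URL elided above

Fetching all 292 notes would additionally require following the pagination links Tumblr loads for long note lists, which this sketch does not do.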