urllib2

Tor doesn't work with urllib2

ⅰ亾dé卋堺 submitted on 2019-12-04 05:33:04
I am trying to use Tor for anonymous access via urllib2, with Privoxy as the proxy. System info: Ubuntu 14.04, recently upgraded from 13.10 through dist-upgrade. This is the code I am using for test purposes:

    import urllib2

    def req(url):
        proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        return opener.open(url).read()

    print req('https://check.torproject.org')

The above outputs a page with a "sorry, but you don't use Tor" message. As for my configuration: /etc/tor/torrc
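
One detail worth noting when reading the snippet above: urllib2.ProxyHandler routes requests by URL scheme, so a handler keyed only on "http" leaves "https" URLs unproxied. Below is a minimal sketch covering both schemes (the Privoxy address is taken from the question; whether this alone makes the check page report Tor depends on the rest of the setup):

    import urllib2

    # Route both plain and TLS requests through the local Privoxy instance.
    proxy_support = urllib2.ProxyHandler({
        "http":  "127.0.0.1:8118",
        "https": "127.0.0.1:8118",
    })
    opener = urllib2.build_opener(proxy_support)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    print opener.open('https://check.torproject.org').read()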

Why aren't persistent connections supported by URLLib2?

陌路散爱 submitted on 2019-12-04 04:44:00
After scanning the urllib2 source, it seems that connections are automatically closed even if you do specify keep-alive. Why is this? As it is now I just use httplib for my persistent connections... but I wonder why this is disabled (or maybe just left ambiguous) in urllib2. It's a well-known limitation of urllib2 (and of urllib as well). IMHO the best attempt so far to fix it properly is Garry Bodsworth's coda_network for Python 2.6 or 2.7 -- replacement, patched versions of urllib2 (and some other modules) that support keep-alive (plus a bunch of other smaller but quite welcome fixes). Steven You
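
For reference, here is a minimal sketch of the httplib workaround mentioned above: one HTTPConnection object is reused for several requests over the same TCP connection (the host and paths are placeholders, and each response must be read fully before the next request is issued):

    import httplib

    conn = httplib.HTTPConnection('example.com')    # placeholder host
    for path in ('/page1', '/page2', '/page3'):     # placeholder paths
        conn.request('GET', path)
        resp = conn.getresponse()
        body = resp.read()   # must be fully read before reusing the connection
        print path, resp.status, len(body)
    conn.close()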

302s and losing cookies with urllib2

我只是一个虾纸丫 submitted on 2019-12-04 04:27:39
Question: I am using urllib2 with CookieJar / HTTPCookieProcessor in an attempt to simulate a login to a page so I can automate an upload. I've seen some questions and answers on this, but nothing that solves my problem. I am losing my cookie when I simulate the login, which ends up at a 302 redirect. The 302 response is where the cookie gets set by the server, but urllib2's HTTPCookieProcessor does not seem to save the cookie during a redirect. I tried creating a HTTPRedirectHandler class to ignore the
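
For context, a minimal sketch of the CookieJar / HTTPCookieProcessor setup described above (the login URL and form fields are placeholders); printing the jar after the login call is a quick way to see which cookies survived the redirect:

    import urllib
    import urllib2
    import cookielib

    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

    # Placeholder login form; the real server sets its cookie on the 302 response.
    form = urllib.urlencode({'username': 'me', 'password': 'secret'})
    response = opener.open('http://example.com/login', form)

    for cookie in jar:
        print cookie.name, cookie.value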

How to use urllib2 to get a webpage using SSLv3 encryption

社会主义新天地 submitted on 2019-12-04 03:57:12
Question: I'm using Python 2.7 and I'd like to get the contents of a webpage that requires SSLv3. Currently when I try to access the page I get the error SSL23_GET_SERVER_HELLO, and some searching on the web led me to the following solution, which fixes things in Python 3:

    urllib.request.install_opener(
        urllib.request.build_opener(
            urllib.request.HTTPSHandler(
                context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))))

How can I get the same effect in Python 2.7, as I can't seem to find the equivalent of the context
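
A commonly used workaround for Python 2.7 (before 2.7.9, which added a context argument to HTTPSHandler) is to subclass HTTPSConnection and wrap the socket with an explicit protocol version. This is a sketch, not a drop-in answer; PROTOCOL_TLSv1 can be swapped for PROTOCOL_SSLv3 if the local OpenSSL build still supports it, and the URL is a placeholder:

    import httplib
    import socket
    import ssl
    import urllib2

    class ForcedTLSConnection(httplib.HTTPSConnection):
        """HTTPSConnection that pins the SSL/TLS protocol version."""
        def connect(self):
            sock = socket.create_connection((self.host, self.port), self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file,
                                        ssl_version=ssl.PROTOCOL_TLSv1)

    class ForcedTLSHandler(urllib2.HTTPSHandler):
        def https_open(self, req):
            return self.do_open(ForcedTLSConnection, req)

    urllib2.install_opener(urllib2.build_opener(ForcedTLSHandler()))
    print urllib2.urlopen('https://example.com/').read()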

urllib2 basic authentication oddities

爱⌒轻易说出口 submitted on 2019-12-04 03:55:01
I'm slamming my head against the wall with this one. I've been trying every example and reading every last bit I can find online about basic HTTP authorization with urllib2, but I cannot figure out what is causing my specific error. Adding to the frustration is that the code works for one page and yet not for another. Logging into www.mysite.com/adm goes absolutely smoothly; it authenticates with no problem. Yet if I change the address to 'http://mysite.com/adm/items.php?n=201105&c=200' I receive this error: <h4 align="center" class="teal">Add/Edit Items</h4> <p><strong>Client:</strong> </p><p><strong
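
For reference, a minimal sketch of the usual urllib2 basic-auth setup (credentials are placeholders, the host is taken from the question). Registering the top-level URI with HTTPPasswordMgrWithDefaultRealm means the same credentials are offered for /adm and for deeper paths such as /adm/items.php:

    import urllib2

    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm: use these credentials for any realm under this URI.
    password_mgr.add_password(None, 'http://mysite.com/', 'username', 'password')

    opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
    page = opener.open('http://mysite.com/adm/items.php?n=201105&c=200').read()
    print page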

Reading an Excel object retrieved using urllib2

穿精又带淫゛_ submitted on 2019-12-04 03:49:19
Question: I am getting an Excel file using urllib2 and saving it into the variable response shown below. I want to be able to process this Excel file using xlrd or similar. I have included some info below; let me know if I can provide more. How can I turn the response object into something I can work with?

    response = <addinfourl at 199999998 whose fp = <socket._fileobject object at 0x100001010>>

response.read() prints: '\xd0\xcf\x11\xe0...'

Headers:

    Content-Type: application/vnd.ms-excel
    Transfer-Encoding: chunked
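
Since the response body is the raw bytes of the .xls file, xlrd can open it directly from memory. A minimal sketch, assuming xlrd is installed and using a placeholder URL:

    import urllib2
    import xlrd

    response = urllib2.urlopen('http://example.com/report.xls')   # placeholder URL
    data = response.read()   # raw bytes of the .xls file ('\xd0\xcf\x11\xe0...')
    book = xlrd.open_workbook(file_contents=data)
    sheet = book.sheet_by_index(0)
    print sheet.nrows, sheet.ncols
    print sheet.cell_value(0, 0)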

Why am I getting “'ResultSet' has no attribute 'findAll'” using BeautifulSoup in Python?

霸气de小男生 submitted on 2019-12-04 03:43:49
So I am learning Python slowly, and am trying to make a simple function that will draw data from the high-scores page of an online game. This is someone else's code that I rewrote into one function (which might be the problem), but I am getting this error. Here is the code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup

    def create(el):
        source = urlopen(el).read()
        soup = BeautifulSoup(source)
        get_table = soup.find('table', {'id': 'mini_player'})
        get_rows = get_table.findAll('tr')
        text = ''.join(get_rows.findAll(text=True))   # raises: 'ResultSet' has no attribute 'findAll'
        data = text.strip()
        return data
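
The error comes from the fact that findAll returns a ResultSet (essentially a list of tags), which itself has no findAll method. A possible fix, sketched under the assumption that the goal is simply the text of the table:

    def create(el):
        source = urlopen(el).read()
        soup = BeautifulSoup(source)
        get_table = soup.find('table', {'id': 'mini_player'})
        # Extract text from the table itself (or loop over the rows one by one)
        # instead of calling findAll on the ResultSet returned by findAll('tr').
        text = ''.join(get_table.findAll(text=True))
        return text.strip()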

Python - Easy way to scrape Google, download top N hits (entire .html documents) for given search?

我只是一个虾纸丫 submitted on 2019-12-04 03:23:08
Is there an easy way to scrape Google and write out the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search? As an example, imagine searching for the phrase "big bad wolf" and downloading just the text from the top 1000 hits -- i.e., actually downloading the text from those 1000 web pages (but just those pages, not the entire site). I'm assuming this would use the urllib2 library? I use Python 3.1, if that helps. Mark Longair: The official way to get results from Google programmatically is to use Google's Custom Search API. As icktoofay comments, other
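
As a rough sketch of that approach (it assumes you have a Custom Search API key and search-engine ID; quotas, paging limits, and the exact response fields should be checked against Google's current documentation), the result links can be fetched as JSON and each page downloaded with urllib2:

    import json
    import urllib
    import urllib2

    API_KEY = 'your-api-key'        # assumption: obtained from the Google API console
    ENGINE_ID = 'your-engine-id'    # assumption: a Custom Search engine configured for web search

    def search(query, start=1):
        # The API returns results in pages of at most 10 items per request.
        params = urllib.urlencode({'key': API_KEY, 'cx': ENGINE_ID,
                                   'q': query, 'start': start})
        url = 'https://www.googleapis.com/customsearch/v1?' + params
        return json.load(urllib2.urlopen(url))

    results = search('big bad wolf')
    for i, item in enumerate(results.get('items', [])):
        html = urllib2.urlopen(item['link']).read()
        with open('hit_%03d.html' % i, 'w') as f:
            f.write(html)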

Python urllib2: Cannot assign requested address

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-04 02:31:41
Question: I am sending thousands of requests using urllib2 with proxies. I have received many of the following errors on execution: urlopen error [Errno 99] Cannot assign requested address. I read here that it may be due to a socket already being bound. Is that the case? Any suggestions on how to fix this? Answer 1: Here is an answer to a similar-looking question that I prepared earlier.... much earlier... Socket in use error when reusing sockets The error is different, but the underlying problem is probably
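
Errno 99 (EADDRNOTAVAIL) on the client side usually means the kernel could not find a free local port to bind, which tends to happen when thousands of short-lived connections leave sockets sitting in TIME_WAIT. A hedged sketch of the usual mitigation at the urllib2 level: reuse one opener, always read and close each response, and pace the requests (the proxy address and URLs are placeholders):

    import time
    import urllib2

    proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:3128'})   # placeholder proxy
    opener = urllib2.build_opener(proxy)

    def fetch(url):
        resp = opener.open(url, timeout=30)
        try:
            return resp.read()
        finally:
            resp.close()

    for url in ['http://example.com/%d' % i for i in range(1000)]:    # placeholder URLs
        fetch(url)
        time.sleep(0.05)   # throttle so closed sockets have time to leave TIME_WAIT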

Close urllib2 connection

大兔子大兔子 submitted on 2019-12-04 00:50:02
Question: I'm using urllib2 to load files from FTP and HTTP servers. Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program:

    from urllib2 import urlopen
    from time import sleep

    url = 'ftp://user:pass@host/big_file.ext'

    def load_file(url):
        f = urlopen(url)
        loaded = 0
        while True:
            data = f.read(1024)
            if data == '':
                break
            loaded += len(data)
        f.close()
        #sleep(1)
        print('loaded {0}'.format(loaded))

    load_file(url)
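
One small hardening of the example, sketched with contextlib.closing: it guarantees that close() is called on the response object even if read() raises, although whether the server-side connection is torn down immediately still depends on the handler and the server:

    from contextlib import closing
    from urllib2 import urlopen

    def load_file(url):
        loaded = 0
        with closing(urlopen(url)) as f:   # close() runs even if read() fails
            while True:
                data = f.read(1024)
                if not data:
                    break
                loaded += len(data)
        print('loaded {0}'.format(loaded))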