urllib2

Tor doesn't work with urllib2

ⅰ亾dé卋堺 submitted on 2019-12-04 05:33:04
I am trying to use Tor for anonymous access via urllib2, with Privoxy as the proxy. System info: Ubuntu 14.04, recently upgraded from 13.10 through dist-upgrade. This is the code I am using for test purposes:

    import urllib2

    def req(url):
        proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        return opener.open(url).read()

    print req('https://check.torproject.org')

The above outputs a page with a "sorry, but you don't use Tor" message. As for my configuration: /etc/tor/torrc
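
One detail worth noting when reading the snippet above: urllib2.ProxyHandler routes requests by URL scheme, so a handler keyed only on "http" leaves "https" URLs unproxied. Below is a minimal sketch covering both schemes (the Privoxy address is taken from the question; whether this alone makes the check page report Tor depends on the rest of the setup):

    import urllib2

    # Route both plain and TLS requests through the local Privoxy instance.
    proxy_support = urllib2.ProxyHandler({
        "http":  "127.0.0.1:8118",
        "https": "127.0.0.1:8118",
    })
    opener = urllib2.build_opener(proxy_support)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    print opener.open('https://check.torproject.org').read()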

Why aren't persistent connections supported by URLLib2?

陌路散爱 submitted on 2019-12-04 04:44:00
After scanning the urllib2 source, it seems that connections are automatically closed even if you do specify keep-alive. Why is this? As it is now I just use httplib for my persistent connections... but I wonder why this is disabled (or maybe just left ambiguous) in urllib2. It's a well-known limitation of urllib2 (and of urllib as well). IMHO the best attempt so far to fix it properly is Garry Bodsworth's coda_network for Python 2.6 or 2.7 -- replacement, patched versions of urllib2 (and some other modules) that support keep-alive (plus a bunch of other smaller but quite welcome fixes). Steven You
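
For reference, here is a minimal sketch of the httplib workaround mentioned above: one HTTPConnection object is reused for several requests over the same TCP connection (the host and paths are placeholders, and each response must be read fully before the next request is issued):

    import httplib

    conn = httplib.HTTPConnection('example.com')    # placeholder host
    for path in ('/page1', '/page2', '/page3'):     # placeholder paths
        conn.request('GET', path)
        resp = conn.getresponse()
        body = resp.read()   # must be fully read before reusing the connection
        print path, resp.status, len(body)
    conn.close()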

302s and losing cookies with urllib2

我只是一个虾纸丫 submitted on 2019-12-04 04:27:39
Question: I am using urllib2 with CookieJar / HTTPCookieProcessor in an attempt to simulate a login to a page so I can automate an upload. I've seen some questions and answers on this, but nothing that solves my problem. I am losing my cookie when I simulate the login, which ends up at a 302 redirect. The 302 response is where the cookie gets set by the server, but urllib2's HTTPCookieProcessor does not seem to save the cookie during a redirect. I tried creating a HTTPRedirectHandler class to ignore the
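
For context, a minimal sketch of the CookieJar / HTTPCookieProcessor setup described above (the login URL and form fields are placeholders); printing the jar after the login call is a quick way to see which cookies survived the redirect:

    import urllib
    import urllib2
    import cookielib

    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

    # Placeholder login form; the real server sets its cookie on the 302 response.
    form = urllib.urlencode({'username': 'me', 'password': 'secret'})
    response = opener.open('http://example.com/login', form)

    for cookie in jar:
        print cookie.name, cookie.value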

How to use urllib2 to get a webpage using SSLv3 encryption

社会主义新天地 submitted on 2019-12-04 03:57:12
Question: I'm using Python 2.7 and I'd like to get the contents of a webpage that requires SSLv3. Currently when I try to access the page I get the error SSL23_GET_SERVER_HELLO, and some searching on the web led me to the following solution, which fixes things in Python 3:

    urllib.request.install_opener(
        urllib.request.build_opener(
            urllib.request.HTTPSHandler(
                context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))))

How can I get the same effect in Python 2.7, as I can't seem to find the equivalent of the context
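
A commonly used workaround for Python 2.7 (before 2.7.9, which added a context argument to HTTPSHandler) is to subclass HTTPSConnection and wrap the socket with an explicit protocol version. This is a sketch, not a drop-in answer; PROTOCOL_TLSv1 can be swapped for PROTOCOL_SSLv3 if the local OpenSSL build still supports it, and the URL is a placeholder:

    import httplib
    import socket
    import ssl
    import urllib2

    class ForcedTLSConnection(httplib.HTTPSConnection):
        """HTTPSConnection that pins the SSL/TLS protocol version."""
        def connect(self):
            sock = socket.create_connection((self.host, self.port), self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file,
                                        ssl_version=ssl.PROTOCOL_TLSv1)

    class ForcedTLSHandler(urllib2.HTTPSHandler):
        def https_open(self, req):
            return self.do_open(ForcedTLSConnection, req)

    urllib2.install_opener(urllib2.build_opener(ForcedTLSHandler()))
    print urllib2.urlopen('https://example.com/').read()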

urllib2 basic authentication oddities

爱⌒轻易说出口 submitted on 2019-12-04 03:55:01
I'm slamming my head against the wall with this one. I've been trying every example and reading every last bit I can find online about basic HTTP authorization with urllib2, but I cannot figure out what is causing my specific error. Adding to the frustration is that the code works for one page and yet not for another. Logging into www.mysite.com/adm goes absolutely smoothly; it authenticates with no problem. Yet if I change the address to 'http://mysite.com/adm/items.php?n=201105&c=200' I receive this error: <h4 align="center" class="teal">Add/Edit Items</h4> <p><strong>Client:</strong> </p><p><strong
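
For reference, a minimal sketch of the usual urllib2 basic-auth setup (credentials are placeholders, the host is taken from the question). Registering the top-level URI with HTTPPasswordMgrWithDefaultRealm means the same credentials are offered for /adm and for deeper paths such as /adm/items.php:

    import urllib2

    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm: use these credentials for any realm under this URI.
    password_mgr.add_password(None, 'http://mysite.com/', 'username', 'password')

    opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
    page = opener.open('http://mysite.com/adm/items.php?n=201105&c=200').read()
    print page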

Reading an Excel object retrieved using urllib2

穿精又带淫゛_ submitted on 2019-12-04 03:49:19
Question: I am getting an Excel file using urllib2 and saving it into the variable response shown below. I want to be able to process this Excel file using xlrd or similar. I have included some info below; let me know if I can provide more. How can I turn the response object into something I can work with?

    response = <addinfourl at 199999998 whose fp = <socket._fileobject object at 0x100001010>>

response.read() prints: '\xd0\xcf\x11\xe0...'

Headers:

    Content-Type: application/vnd.ms-excel
    Transfer-Encoding: chunked
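
Since the response body is the raw bytes of the .xls file, xlrd can open it directly from memory. A minimal sketch, assuming xlrd is installed and using a placeholder URL:

    import urllib2
    import xlrd

    response = urllib2.urlopen('http://example.com/report.xls')   # placeholder URL
    data = response.read()   # raw bytes of the .xls file ('\xd0\xcf\x11\xe0...')
    book = xlrd.open_workbook(file_contents=data)
    sheet = book.sheet_by_index(0)
    print sheet.nrows, sheet.ncols
    print sheet.cell_value(0, 0)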

Why am I getting “'ResultSet' has no attribute 'findAll'” using BeautifulSoup in Python?

霸气de小男生 submitted on 2019-12-04 03:43:49
So I am learning Python slowly, and am trying to make a simple function that will draw data from the high-scores page of an online game. This is someone else's code that I rewrote into one function (which might be the problem), but I am getting this error. Here is the code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup

    def create(el):
        source = urlopen(el).read()
        soup = BeautifulSoup(source)
        get_table = soup.find('table', {'id': 'mini_player'})
        get_rows = get_table.findAll('tr')
        text = ''.join(get_rows.findAll(text=True))   # raises: 'ResultSet' has no attribute 'findAll'
        data = text.strip()
        return data
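
The error comes from the fact that findAll returns a ResultSet (essentially a list of tags), which itself has no findAll method. A possible fix, sketched under the assumption that the goal is simply the text of the table:

    def create(el):
        source = urlopen(el).read()
        soup = BeautifulSoup(source)
        get_table = soup.find('table', {'id': 'mini_player'})
        # Extract text from the table itself (or loop over the rows one by one)
        # instead of calling findAll on the ResultSet returned by findAll('tr').
        text = ''.join(get_table.findAll(text=True))
        return text.strip()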

Python - Easy way to scrape Google, download top N hits (entire .html documents) for given search?

我只是一个虾纸丫 submitted on 2019-12-04 03:23:08
Is there an easy way to scrape Google and write out the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search? As an example, imagine searching for the phrase "big bad wolf" and downloading just the text from the top 1000 hits -- i.e., actually downloading the text from those 1000 web pages (but just those pages, not the entire site). I'm assuming this would use the urllib2 library? I use Python 3.1, if that helps. Mark Longair: The official way to get results from Google programmatically is to use Google's Custom Search API. As icktoofay comments, other
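
As a rough sketch of that approach (it assumes you have a Custom Search API key and search-engine ID; quotas, paging limits, and the exact response fields should be checked against Google's current documentation), the result links can be fetched as JSON and each page downloaded with urllib2:

    import json
    import urllib
    import urllib2

    API_KEY = 'your-api-key'        # assumption: obtained from the Google API console
    ENGINE_ID = 'your-engine-id'    # assumption: a Custom Search engine configured for web search

    def search(query, start=1):
        # The API returns results in pages of at most 10 items per request.
        params = urllib.urlencode({'key': API_KEY, 'cx': ENGINE_ID,
                                   'q': query, 'start': start})
        url = 'https://www.googleapis.com/customsearch/v1?' + params
        return json.load(urllib2.urlopen(url))

    results = search('big bad wolf')
    for i, item in enumerate(results.get('items', [])):
        html = urllib2.urlopen(item['link']).read()
        with open('hit_%03d.html' % i, 'w') as f:
            f.write(html)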

Python urllib2: Cannot assign requested address

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-04 02:31:41
Question: I am sending thousands of requests using urllib2 with proxies. I have received many of the following errors on execution: urlopen error [Errno 99] Cannot assign requested address. I read here that it may be due to a socket already being bound. Is that the case? Any suggestions on how to fix this? Answer 1: Here is an answer to a similar-looking question that I prepared earlier.... much earlier... Socket in use error when reusing sockets The error is different, but the underlying problem is probably
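
Errno 99 (EADDRNOTAVAIL) on the client side usually means the kernel could not find a free local port to bind, which tends to happen when thousands of short-lived connections leave sockets sitting in TIME_WAIT. A hedged sketch of the usual mitigation at the urllib2 level: reuse one opener, always read and close each response, and pace the requests (the proxy address and URLs are placeholders):

    import time
    import urllib2

    proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:3128'})   # placeholder proxy
    opener = urllib2.build_opener(proxy)

    def fetch(url):
        resp = opener.open(url, timeout=30)
        try:
            return resp.read()
        finally:
            resp.close()

    for url in ['http://example.com/%d' % i for i in range(1000)]:    # placeholder URLs
        fetch(url)
        time.sleep(0.05)   # throttle so closed sockets have time to leave TIME_WAIT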

Close urllib2 connection

大兔子大兔子 submitted on 2019-12-04 00:50:02
Question: I'm using urllib2 to load files from FTP and HTTP servers. Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program:

    from urllib2 import urlopen
    from time import sleep

    url = 'ftp://user:pass@host/big_file.ext'

    def load_file(url):
        f = urlopen(url)
        loaded = 0
        while True:
            data = f.read(1024)
            if data == '':
                break
            loaded += len(data)
        f.close()
        #sleep(1)
        print('loaded {0}'.format(loaded))

    load_file(url)
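
One small hardening of the example, sketched with contextlib.closing: it guarantees that close() is called on the response object even if read() raises, although whether the server-side connection is torn down immediately still depends on the handler and the server:

    from contextlib import closing
    from urllib2 import urlopen

    def load_file(url):
        loaded = 0
        with closing(urlopen(url)) as f:   # close() runs even if read() fails
            while True:
                data = f.read(1024)
                if not data:
                    break
                loaded += len(data)
        print('loaded {0}'.format(loaded))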