urllib2

python ssl eof occurred in violation of protocol, wantwriteerror, zeroreturnerror

核能气质少年 submitted on 2019-12-10 19:01:18

Question: I'm running many Celery tasks (20,000) using gevent for the pool (also monkey-patching all). Each of these tasks hits third-party services like AdWords to pull data. I keep having tasks fail because of underlying SSL errors. Below are the stack traces from a few of the exceptions (in no particular order; these are failures from separate tasks). I also get WantWriteError and ZeroReturnError occasionally, but the EOF error seems to come up the most. These errors happen while using different client
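A common mitigation for transient SSL failures like these is to retry the call with backoff. The sketch below is a minimal, hypothetical helper; the retried exception type, retry count, and delay values are assumptions, not from the question:

```python
import ssl
import time
from functools import wraps

def retry_on_ssl_error(retries=3, delay=1.0):
    """Hypothetical helper: retry a flaky call when it raises ssl.SSLError."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except ssl.SSLError as exc:
                    last_exc = exc
                    time.sleep(delay * (attempt + 1))  # linear backoff
            raise last_exc
        return wrapper
    return decorator
```

Applied to a task's outbound request function, this turns an occasional "EOF occurred in violation of protocol" into a retried call rather than a failed task.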

urllib2 connection timed out error

北慕城南 submitted on 2019-12-10 18:05:54

Question: I am trying to open a page using urllib2 but I keep getting connection timed out errors. The line I am using is: f = urllib2.urlopen(url) The exact error is: URLError: <urlopen error [Errno 110] Connection timed out> Answer 1: Many sites block urllib2's default User-Agent . Try adding a new User-Agent by creating Request objects and using them as arguments for urlopen : import urllib2 request = urllib2.Request('http://www.example.com/') request.add_header('User-agent',
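Putting the answer's pieces together, a complete sketch might look like the following. The `fetch` helper name and the `Mozilla/5.0` agent string are illustrative, and a Python 3 import fallback is added so the snippet also runs on modern interpreters:

```python
try:
    from urllib2 import Request, urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import Request, urlopen  # Python 3 fallback

def fetch(url, timeout=10):
    # Send a browser-like User-Agent so sites that block urllib2's
    # default agent string still respond; also set an explicit timeout.
    request = Request(url)
    request.add_header('User-agent', 'Mozilla/5.0')
    return urlopen(request, timeout=timeout)

# Usage (requires network access):
# html = fetch('http://www.example.com/').read()
```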

Proxy Authentication error in Urllib2 (Python 2.7)

我只是一个虾纸丫 submitted on 2019-12-10 16:32:54

Question: [Windows 7, 64-bit; Python 2.7] If I try to use urllib2, I get this error: Traceback (most recent call last): File "C:\Users\cYanide\Documents\Python Challenge\1\1.py", line 7, in <module> response = urllib2.urlopen('http://python.org/') File "C:\Python27\lib\urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "C:\Python27\lib\urllib2.py", line 400, in open response = meth(req, response) File "C:\Python27\lib\urllib2.py", line 513, in http_response 'http', request,
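If the underlying cause is an authenticating proxy, credentials can be supplied directly in the proxy URL via a ProxyHandler. A minimal sketch, assuming a hypothetical proxy host, port, and credentials:

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# Hypothetical proxy host, port, and credentials -- substitute your own.
proxy = urllib2.ProxyHandler(
    {'http': 'http://user:secret@proxy.example.com:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

# After install_opener, plain urlopen calls go through the proxy:
# response = urllib2.urlopen('http://python.org/')  # requires network access
```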

Python fails Tor check using urllib2 to initiate requests

房东的猫 submitted on 2019-12-10 16:00:42

Question: After reading through other questions on Stack Overflow, I put together a snippet of Python code that can make requests through a Tor proxy: import urllib2 proxy = urllib2.ProxyHandler({'http':'127.0.0.1:8118'}) opener = urllib2.build_opener(proxy) print opener.open('https://check.torproject.org/').read() Since Tor works fine in Firefox with TorButton, I expected it to work fine in Python. Unfortunately, included in the mess of HTML is: Sorry. You are not using Tor . I am not sure why this is
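One plausible cause: the snippet registers the proxy only for the 'http' scheme, so the https:// request to check.torproject.org bypasses the proxy (and therefore Tor) entirely. A sketch that registers the Privoxy port for both schemes; whether urllib2's HTTPS-over-proxy support then suffices depends on the Python version:

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# Register the local Privoxy port for BOTH schemes; with only 'http'
# registered, an https:// URL does not go through the proxy at all.
proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1:8118',
    'https': '127.0.0.1:8118',
})
opener = urllib2.build_opener(proxy)
# print(opener.open('https://check.torproject.org/').read())  # needs Tor running
```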

Python Beautifulsoup get_text() not getting all text

时光毁灭记忆、已成空白 submitted on 2019-12-10 15:55:22

Question: I'm trying to get all the text from an HTML tag using BeautifulSoup's get_text() method. I use Python 2.7 and BeautifulSoup 4.4.0. It works most of the time; however, sometimes the method gets only the first paragraph of a tag, and I can't figure out why. Please see the following example. from bs4 import BeautifulSoup import urllib2 job_url = "http://www.indeed.com/viewjob?jk=0f5592c8191a21af" site = urllib2.urlopen(job_url).read() soup = BeautifulSoup(site, "html.parser") text = soup.find("span"
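As a general illustration (using inline HTML standing in for the Indeed page, since the original URL may no longer resolve), get_text() does walk every descendant of a tag; a separator argument keeps adjacent paragraphs from running together:

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for the downloaded page in this sketch.
html = '<span id="job"><p>First paragraph.</p><p>Second paragraph.</p></span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", id="job")

# get_text() visits every descendant; separator=" " joins the pieces and
# strip=True trims stray whitespace around each one.
text = span.get_text(separator=" ", strip=True)
```

When text really does go missing, the usual culprit is "html.parser" mis-handling malformed markup; retrying with a more lenient parser such as "html5lib" or "lxml" is a common fix.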

How to formally insert URL space (%20) using Python? [duplicate]

坚强是说给别人听的谎言 submitted on 2019-12-10 15:35:51

Question: (This question already has answers here: How to urlencode a querystring in Python? (12 answers). Closed 4 years ago.) When a multi-word search term is entered in eBay, the resulting URL looks something like this (for example, "demarini cf5 cf12"): http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=demarini%20cf5%20cfc12 I wish to construct this URL in Python so it can be accessed directly. So it's a case of concatenating the base URL: http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw= ...
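One way to finish the concatenation is urllib's quote(), which percent-encodes spaces as %20. The cross-version import is added so the sketch also runs on Python 3:

```python
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

base = 'http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw='
term = 'demarini cf5 cf12'
url = base + quote(term)  # spaces become %20
```

For building the whole query string from scratch rather than appending to a fixed prefix, urlencode() (from the same modules) is the more general tool.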

Python: Urllib2 and OpenCV

老子叫甜甜 submitted on 2019-12-10 14:32:49

Question: I have a program that saves an image in a local directory and then reads the image back from that directory. But I don't want to save the image; I want to read it directly from the URL. Here's my code: import cv2.cv as cv import urllib2 url = "http://cache2.allpostersimages.com/p/LRG/18/1847/M5G8D00Z/posters/curious-cat.jpg" filename = "my_test_image" + url[-4:] print filename opener = urllib2.build_opener() page = opener.open(url) img= page.read() abc = open(filename, "wb") abc.write(img) abc

Sending multiple values for one name urllib2

时光怂恿深爱的人放手 submitted on 2019-12-10 13:27:25

Question: I'm trying to submit a web page that has checkboxes, and I need up to 10 of these checkboxes checked. The problem is that when I try to assign them to one name in a dict, only the last one is kept, not all 10. How can I do this? Here is the request code: forms = {"_ref_ck": ref, "type": "create", "selected_items[]": sel_itms[0], "selected_items[]": sel_itms[1], "selected_items[]": sel_itms[2], "selected_items[]": sel_itms[3], "selected_items[]": sel_itms[4], "selected_items[]": sel_itms[5],
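A dict silently keeps only the last value per key, so a list of (name, value) tuples is the usual fix; urlencode() accepts one directly. A minimal sketch with placeholder item values:

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

# A dict keeps only the last value per key; a list of (name, value)
# tuples preserves every checkbox, and urlencode() accepts it directly.
sel_itms = ['a', 'b', 'c']  # placeholder values for illustration
forms = [('_ref_ck', 'ref'), ('type', 'create')]
forms += [('selected_items[]', item) for item in sel_itms]
body = urlencode(forms)
```

Passing a dict of lists with urlencode(..., doseq=True) achieves the same repeated-name encoding.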

“WindowsError: [Error 5] Access is denied” using urllib2

妖精的绣舞 submitted on 2019-12-10 13:22:56

Question: I'm getting a "WindowsError: [Error 5] Access is denied" message when reading a website with urllib2. from urllib2 import urlopen, Request from bs4 import BeautifulSoup hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'} req = Request('https://' + url, headers=hdr) soup = BeautifulSoup( urlopen( req ).read() ) The full traceback is: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:

reading a stream made by urllib2 never recovers when connection got interrupted

丶灬走出姿态 submitted on 2019-12-10 12:55:02

Question: While trying to make one of my Python applications a bit more robust against connection interruptions, I discovered that calling the read function of an HTTP stream made by urllib2 may block the script forever. I thought the read function would time out and eventually raise an exception, but this does not seem to be the case when the connection is interrupted during a read call. Here is the code that causes the problem: import urllib2 while True: try: stream = urllib2
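A socket timeout, either the timeout argument to urlopen() or a global default, applies to every blocking socket operation on the connection, so a stalled read() raises socket.timeout instead of hanging forever. A minimal sketch (the helper name and 10-second value are illustrative):

```python
import socket

try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# The default timeout covers sockets created afterwards; urlopen's own
# timeout argument does the same for a single connection.
socket.setdefaulttimeout(10)

def read_all(url):
    stream = urllib2.urlopen(url, timeout=10)
    return stream.read()  # raises socket.timeout if the peer stalls
```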