urllib2

python ssl eof occurred in violation of protocol, wantwriteerror, zeroreturnerror

核能气质少年 submitted on 2019-12-10 19:01:18

Question: I'm running many Celery tasks (20,000) using gevent for the pool (also monkey-patching all). Each of these tasks hits third-party services like AdWords to pull data. I keep having tasks fail because of underlying SSL errors. Below are the stack traces from a few of the exceptions (in no particular order; these are failures from separate tasks). I also get WantWriteError and ZeroReturnError occasionally, but the EOF error seems to come up the most. These errors happen while using different client
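A common mitigation for transient SSL failures like these is to retry the call with backoff. The sketch below is a minimal, hypothetical helper; the retried exception type, retry count, and delay values are assumptions, not from the question:

```python
import ssl
import time
from functools import wraps

def retry_on_ssl_error(retries=3, delay=1.0):
    """Hypothetical helper: retry a flaky call when it raises ssl.SSLError."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except ssl.SSLError as exc:
                    last_exc = exc
                    time.sleep(delay * (attempt + 1))  # linear backoff
            raise last_exc
        return wrapper
    return decorator
```

Applied to a task's outbound request function, this turns an occasional "EOF occurred in violation of protocol" into a retried call rather than a failed task.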

urllib2 connection timed out error

北慕城南 submitted on 2019-12-10 18:05:54

Question: I am trying to open a page using urllib2 but I keep getting connection timed out errors. The line I am using is: f = urllib2.urlopen(url) The exact error is: URLError: <urlopen error [Errno 110] Connection timed out> Answer 1: Many sites block urllib2's default User-Agent . Try adding a new User-Agent by creating Request objects and using them as arguments for urlopen : import urllib2 request = urllib2.Request('http://www.example.com/') request.add_header('User-agent',
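Putting the answer's pieces together, a complete sketch might look like the following. The `fetch` helper name and the `Mozilla/5.0` agent string are illustrative, and a Python 3 import fallback is added so the snippet also runs on modern interpreters:

```python
try:
    from urllib2 import Request, urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import Request, urlopen  # Python 3 fallback

def fetch(url, timeout=10):
    # Send a browser-like User-Agent so sites that block urllib2's
    # default agent string still respond; also set an explicit timeout.
    request = Request(url)
    request.add_header('User-agent', 'Mozilla/5.0')
    return urlopen(request, timeout=timeout)

# Usage (requires network access):
# html = fetch('http://www.example.com/').read()
```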

Proxy Authentication error in Urllib2 (Python 2.7)

我只是一个虾纸丫 submitted on 2019-12-10 16:32:54

Question: [Windows 7, 64-bit; Python 2.7] If I try to use urllib2, I get this error: Traceback (most recent call last): File "C:\Users\cYanide\Documents\Python Challenge\1\1.py", line 7, in <module> response = urllib2.urlopen('http://python.org/') File "C:\Python27\lib\urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "C:\Python27\lib\urllib2.py", line 400, in open response = meth(req, response) File "C:\Python27\lib\urllib2.py", line 513, in http_response 'http', request,
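If the underlying cause is an authenticating proxy, credentials can be supplied directly in the proxy URL via a ProxyHandler. A minimal sketch, assuming a hypothetical proxy host, port, and credentials:

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# Hypothetical proxy host, port, and credentials -- substitute your own.
proxy = urllib2.ProxyHandler(
    {'http': 'http://user:secret@proxy.example.com:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

# After install_opener, plain urlopen calls go through the proxy:
# response = urllib2.urlopen('http://python.org/')  # requires network access
```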

Python fails Tor check using urllib2 to initiate requests

房东的猫 submitted on 2019-12-10 16:00:42

Question: After reading through other questions on Stack Overflow, I put together a snippet of Python code that can make requests through a Tor proxy: import urllib2 proxy = urllib2.ProxyHandler({'http':'127.0.0.1:8118'}) opener = urllib2.build_opener(proxy) print opener.open('https://check.torproject.org/').read() Since Tor works fine in Firefox with TorButton, I expected it to work fine in Python. Unfortunately, included in the mess of HTML is: Sorry. You are not using Tor . I am not sure why this is
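One plausible cause: the snippet registers the proxy only for the 'http' scheme, so the https:// request to check.torproject.org bypasses the proxy (and therefore Tor) entirely. A sketch that registers the Privoxy port for both schemes; whether urllib2's HTTPS-over-proxy support then suffices depends on the Python version:

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# Register the local Privoxy port for BOTH schemes; with only 'http'
# registered, an https:// URL does not go through the proxy at all.
proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1:8118',
    'https': '127.0.0.1:8118',
})
opener = urllib2.build_opener(proxy)
# print(opener.open('https://check.torproject.org/').read())  # needs Tor running
```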

Python Beautifulsoup get_text() not getting all text

时光毁灭记忆、已成空白 submitted on 2019-12-10 15:55:22

Question: I'm trying to get all the text from an HTML tag using BeautifulSoup's get_text() method. I use Python 2.7 and BeautifulSoup 4.4.0. It works most of the time; however, sometimes the method gets only the first paragraph of a tag, and I can't figure out why. Please see the following example. from bs4 import BeautifulSoup import urllib2 job_url = "http://www.indeed.com/viewjob?jk=0f5592c8191a21af" site = urllib2.urlopen(job_url).read() soup = BeautifulSoup(site, "html.parser") text = soup.find("span"
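As a general illustration (using inline HTML standing in for the Indeed page, since the original URL may no longer resolve), get_text() does walk every descendant of a tag; a separator argument keeps adjacent paragraphs from running together:

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for the downloaded page in this sketch.
html = '<span id="job"><p>First paragraph.</p><p>Second paragraph.</p></span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", id="job")

# get_text() visits every descendant; separator=" " joins the pieces and
# strip=True trims stray whitespace around each one.
text = span.get_text(separator=" ", strip=True)
```

When text really does go missing, the usual culprit is "html.parser" mis-handling malformed markup; retrying with a more lenient parser such as "html5lib" or "lxml" is a common fix.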

How to formally insert URL space (%20) using Python? [duplicate]

坚强是说给别人听的谎言 submitted on 2019-12-10 15:35:51

Question: (This question already has answers here: How to urlencode a querystring in Python? (12 answers). Closed 4 years ago.) When a multi-word search term is entered in eBay, the resulting URL looks something like this (for example, "demarini cf5 cf12"): http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=demarini%20cf5%20cfc12 I wish to construct this URL in Python so it can be accessed directly. So it's a case of concatenating the base URL: http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw= ...
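One way to finish the concatenation is urllib's quote(), which percent-encodes spaces as %20. The cross-version import is added so the sketch also runs on Python 3:

```python
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

base = 'http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw='
term = 'demarini cf5 cf12'
url = base + quote(term)  # spaces become %20
```

For building the whole query string from scratch rather than appending to a fixed prefix, urlencode() (from the same modules) is the more general tool.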

Python: Urllib2 and OpenCV

老子叫甜甜 submitted on 2019-12-10 14:32:49

Question: I have a program that saves an image in a local directory and then reads the image back from that directory. But I don't want to save the image; I want to read it directly from the URL. Here's my code: import cv2.cv as cv import urllib2 url = "http://cache2.allpostersimages.com/p/LRG/18/1847/M5G8D00Z/posters/curious-cat.jpg" filename = "my_test_image" + url[-4:] print filename opener = urllib2.build_opener() page = opener.open(url) img= page.read() abc = open(filename, "wb") abc.write(img) abc

Sending multiple values for one name urllib2

时光怂恿深爱的人放手 submitted on 2019-12-10 13:27:25

Question: I'm trying to submit a web page that has checkboxes, and I need up to 10 of these checkboxes checked. The problem is that when I try to assign them to one name in a dict, only the last one is kept, not all 10. How can I do this? Here is the request code: forms = {"_ref_ck": ref, "type": "create", "selected_items[]": sel_itms[0], "selected_items[]": sel_itms[1], "selected_items[]": sel_itms[2], "selected_items[]": sel_itms[3], "selected_items[]": sel_itms[4], "selected_items[]": sel_itms[5],
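A dict silently keeps only the last value per key, so a list of (name, value) tuples is the usual fix; urlencode() accepts one directly. A minimal sketch with placeholder item values:

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

# A dict keeps only the last value per key; a list of (name, value)
# tuples preserves every checkbox, and urlencode() accepts it directly.
sel_itms = ['a', 'b', 'c']  # placeholder values for illustration
forms = [('_ref_ck', 'ref'), ('type', 'create')]
forms += [('selected_items[]', item) for item in sel_itms]
body = urlencode(forms)
```

Passing a dict of lists with urlencode(..., doseq=True) achieves the same repeated-name encoding.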

“WindowsError: [Error 5] Access is denied” using urllib2

妖精的绣舞 submitted on 2019-12-10 13:22:56

Question: I'm getting a "WindowsError: [Error 5] Access is denied" message when reading a website with urllib2. from urllib2 import urlopen, Request from bs4 import BeautifulSoup hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'} req = Request('https://' + url, headers=hdr) soup = BeautifulSoup( urlopen( req ).read() ) The full traceback is: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:

reading a stream made by urllib2 never recovers when connection got interrupted

丶灬走出姿态 submitted on 2019-12-10 12:55:02

Question: While trying to make one of my Python applications a bit more robust against connection interruptions, I discovered that calling the read function of an HTTP stream made by urllib2 may block the script forever. I thought the read function would time out and eventually raise an exception, but this does not seem to be the case when the connection is interrupted during a read call. Here is the code that causes the problem: import urllib2 while True: try: stream = urllib2
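A socket timeout, either the timeout argument to urlopen() or a global default, applies to every blocking socket operation on the connection, so a stalled read() raises socket.timeout instead of hanging forever. A minimal sketch (the helper name and 10-second value are illustrative):

```python
import socket

try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# The default timeout covers sockets created afterwards; urlopen's own
# timeout argument does the same for a single connection.
socket.setdefaulttimeout(10)

def read_all(url):
    stream = urllib2.urlopen(url, timeout=10)
    return stream.read()  # raises socket.timeout if the peer stalls
```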