How to handle IncompleteRead in Python

Kyle

The link you included in your question is simply a wrapper that executes urllib's read() function and catches any incomplete-read exceptions for you. If you don't want to implement that entire patch, you can just wrap the read in a try/except block wherever you read your links. For example:

import urllib2
import httplib

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead, e:
    # e.partial holds whatever data was received before the failure
    page = e.partial

For Python 3:

import http.client
from urllib import request

try:
    page = request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    page = e.partial
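
If you need this in more than one place, the same idea can be packaged as a small helper along the lines of the wrapper mentioned above; here is a minimal Python 3 sketch (the name read_or_partial is just illustrative):

import http.client
from urllib import request

def read_or_partial(url):
    """Read a URL, falling back to whatever partial data arrived."""
    try:
        return request.urlopen(url).read()
    except http.client.IncompleteRead as e:
        # e.partial holds the bytes received before the connection broke
        return e.partial

page = read_or_partial('http://example.com/')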

I found that in my case, sending the request as HTTP/1.0 fixed the problem. Adding this:

import httplib

# force HTTP/1.0 for every httplib-based request
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

After that I make the request:

req = urllib2.Request(url, post, headers)
filedescriptor = urllib2.urlopen(req)
img = filedescriptor.read()

Afterwards I switch back to HTTP/1.1 (for connections that support 1.1):

httplib.HTTPConnection._http_vsn = 11
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'

The trick is to use HTTP/1.0 instead of the default HTTP/1.1. HTTP/1.1 can handle chunked responses, but for some reason the web server doesn't, so we make the request with HTTP/1.0.
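
If you only want the downgrade to apply to a single request, the same trick can be wrapped in a context manager that restores HTTP/1.1 afterwards; a minimal sketch for Python 2 (the name http_1_0 is just illustrative; in Python 3 the same attributes live on http.client.HTTPConnection):

import contextlib
import httplib
import urllib2

@contextlib.contextmanager
def http_1_0():
    # temporarily force HTTP/1.0 for all httplib-based requests
    httplib.HTTPConnection._http_vsn = 10
    httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'
    try:
        yield
    finally:
        # restore the default HTTP/1.1
        httplib.HTTPConnection._http_vsn = 11
        httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'

with http_1_0():
    img = urllib2.urlopen(url).read()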

What worked for me is catching IncompleteRead as an exception and harvesting the data you managed to read in each iteration by putting the reads into a loop like below. (Note: I am using Python 3.4.1, and the urllib library changed between 2.7 and 3.4.)

import http.client
import json
import urllib.request

try:
    requestObj = urllib.request.urlopen(url, data)
    responseJSON = ""
    while True:
        try:
            responseJSONpart = requestObj.read()
        except http.client.IncompleteRead as icread:
            # keep the partial data and try reading again
            responseJSON = responseJSON + icread.partial.decode('utf-8')
            continue
        else:
            responseJSON = responseJSON + responseJSONpart.decode('utf-8')
            break

    return json.loads(responseJSON)

except Exception as RESTex:
    print("Exception occurred making REST call: " + RESTex.__str__())

I found that my virus scanner/firewall was causing this problem, specifically the "Online Shield" component of AVG.

You can use requests instead of urllib2. requests is based on urllib3, so it rarely has any problems. Put it in a loop to try it 3 times, and it will be much more robust. You can use it this way:

import inspect
import sys
import time

import requests

msg = None
for i in [1, 2, 3]:
    try:
        r = requests.get(self.crawling, timeout=30)
        msg = r.text
        if msg: break
    except Exception as e:
        sys.stderr.write('Got error when requesting URL "' + self.crawling + '": ' + str(e) + '\n')
        if i == 3:
            sys.stderr.write('{0.filename}@{0.lineno}: Failed requesting from URL "{1}" ==> {2}\n'.format(
                inspect.getframeinfo(inspect.currentframe()), self.crawling, e))
            raise e
        time.sleep(10 * (i - 1))
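
Alternatively, requests can delegate the retrying to urllib3's Retry class instead of a hand-written loop; a sketch (the retry parameters are only examples, and this covers failed connections and retryable status codes rather than a body that gets cut off mid-download):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1,
                status_forcelist=[500, 502, 503, 504])
# apply the retry policy to both http:// and https:// URLs
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

r = session.get('http://example.com/', timeout=30)
msg = r.text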

I tried all of these solutions and none of them worked for me. What actually did work was to use http.client (Python 3) directly instead of urllib:

import http.client

conn = http.client.HTTPConnection('www.google.com')
conn.request('GET', '/')
r1 = conn.getresponse()
page = r1.read().decode('utf-8')

This works perfectly every time, whereas with urllib it was raising an IncompleteRead exception every time.
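
If the server still cuts the connection short, http.client also lets you read the body in small chunks and keep whatever did arrive; a sketch (the chunk size is arbitrary):

import http.client

conn = http.client.HTTPConnection('www.google.com')
conn.request('GET', '/')
resp = conn.getresponse()

parts = []
try:
    while True:
        chunk = resp.read(8192)  # read in pieces instead of all at once
        if not chunk:
            break
        parts.append(chunk)
except http.client.IncompleteRead as e:
    parts.append(e.partial)  # keep the bytes that did arrive

page = b"".join(parts).decode('utf-8', errors='replace')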

I just add one more exception to get past this problem, like this:

import logging

import requests

try:
    r = requests.get(url, timeout=timeout)
except (requests.exceptions.ChunkedEncodingError, requests.ConnectionError) as e:
    logging.error("There is an error: %s" % e)