Read timeout using either urllib2 or any other http library

Backend · Open · 8 answers · 933 views
花落未央 2020-11-29 05:36

I have code for reading an url like this:

from urllib2 import Request, urlopen
req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
8 Answers
  •  一生所求
    2020-11-29 06:00

    It's not possible for any library to enforce a total read timeout without using some kind of asynchronous timer, through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket, and what that actually does is explained in the POSIX documentation:

    SO_RCVTIMEO

    Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. **If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.**

    The bolded part is key. A socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
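    This "timeout between bytes" behaviour is easy to demonstrate in isolation. The following self-contained sketch (my own illustration in Python 3, not part of the original answer, which uses Python 2) uses `socket.socketpair()`: the receiver sets a 0.5-second timeout, yet the whole transfer takes about a second without ever timing out, because no single gap between bytes exceeds the limit.

```python
import socket
import threading
import time

def slow_sender(conn, chunks=5, interval=0.2):
    # drip one byte every `interval` seconds, then close the connection
    for _ in range(chunks):
        conn.sendall(b'x')
        time.sleep(interval)
    conn.close()

a, b = socket.socketpair()
b.settimeout(0.5)          # per-recv timeout, NOT a total deadline

t = threading.Thread(target=slow_sender, args=(a,))
t.start()

received = b''
start = time.time()
try:
    while True:
        chunk = b.recv(1)
        if not chunk:      # sender closed: end of stream
            break
        received += chunk
except socket.timeout:
    pass                   # never reached: each gap is only 0.2 s
elapsed = time.time() - start
t.join()

# received is b'xxxxx' and elapsed is roughly 1 s, well past the
# 0.5 s "timeout" -- because the timeout only bounds each recv()
```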

    A simple function using threading.Timer could be as follows.

    import httplib
    import socket
    import threading
    
    def download(host, path, timeout = 10):
        content = None
        
        http = httplib.HTTPConnection(host)
        http.request('GET', path)
        response = http.getresponse()
        
        # after `timeout` seconds, close the socket's read side so the
        # blocked read() sees EOF instead of waiting for more bytes
        timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
        timer.start()
        
        try:
            content = response.read()
        except httplib.IncompleteRead:
            pass
            
        timer.cancel()  # cancelling a Timer that has already fired is a no-op
        http.close()
        
        return content
    
    >>> host = 'releases.ubuntu.com'
    >>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
    >>> print content is None
    True
    >>> content = download(host, '/15.04/MD5SUMS', 1)
    >>> print content is None
    False
    

    Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception outside the function instead of inside it. That only works, however, if the HTTP response carries a Content-Length header; without one, a premature end of the stream looks like a normal end of the body and no exception is raised.
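    On Python 3, where httplib became http.client, the same total-deadline trick can be sketched as follows. This is my own port of the function above, not code from the answer, and the extra `port` parameter is an addition for convenience.

```python
import http.client
import socket
import threading

def download(host, path, timeout=10, port=80):
    """Fetch http://host:port/path, giving up `timeout` seconds after the
    response headers arrive, no matter how steadily bytes trickle in."""
    content = None
    conn = http.client.HTTPConnection(host, port)
    conn.request('GET', path)
    response = conn.getresponse()

    # After `timeout` seconds, shut down the read side of the socket so
    # the blocked read() sees EOF; with a Content-Length present, the
    # short read surfaces as http.client.IncompleteRead.
    timer = threading.Timer(timeout, conn.sock.shutdown, [socket.SHUT_RD])
    timer.start()
    try:
        content = response.read()
    except http.client.IncompleteRead:
        pass  # deadline hit: treat the download as failed
    timer.cancel()  # cancelling a Timer that has already fired is a no-op
    conn.close()
    return content
```

    As with the original, a `None` return signals that the deadline fired before the body finished downloading.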
