Python seek on remote file using HTTP

遥遥无期 2020-12-05 08:41

How do I seek to a particular position on a remote (HTTP) file so I can download only that part?

Let's say the bytes of a remote file were: 1234567890

I wanna

4 Answers
  •  情深已故
    2020-12-05 08:57

    If you are downloading the remote file through HTTP, you need to set the Range header.

    See this example for how it can be done. It looks like this:

    myUrlclass.addheader("Range","bytes=%s-" % (existSize))
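
For Python 3, the same idea can be sketched with `urllib.request`. This is a minimal sketch, not the code from the linked example: the function names `make_range_request` and `fetch_range` are made up for illustration, and it assumes the server supports byte ranges. Offsets are 0-based and inclusive, so for the file `1234567890` in the question, `bytes=3-6` would return `4567`.

```python
import urllib.request


def make_range_request(url, start, end=None):
    """Build a Request asking for bytes start..end (0-based, inclusive)."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    # An open-ended range such as "bytes=5-" means "from offset 5 to the end".
    last = "" if end is None else str(end)
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%s" % (start, last))
    return req


def fetch_range(url, start, end=None):
    """Download only the requested bytes (hypothetical helper)."""
    with urllib.request.urlopen(make_range_request(url, start, end)) as resp:
        body = resp.read()
        if resp.status == 206:
            # 206 Partial Content: the server honoured the Range header.
            return body
    # Status 200: the server ignored Range and sent the whole file,
    # so slice out the requested part locally.
    return body[start:None if end is None else end + 1]
```

A server that does not support ranges is allowed to answer with a plain 200 and the full body, which is why the sketch checks the status code before trusting the result.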
    

    EDIT: I just found a better implementation. This class is very simple to use, as can be seen in the docstring.

    import urllib
    import urllib2


    class RangeError(IOError):
        """Raised when the server cannot satisfy the requested byte range."""


    class HTTPRangeHandler(urllib2.BaseHandler):
        """Handler that enables HTTP Range headers.

        This was extremely simple. The Range header is an HTTP feature to
        begin with, so all this class does is tell urllib2 that the
        "206 Partial Content" response from the HTTP server is what we
        expected.

        Example:
            import urllib2
            import byterange

            range_handler = byterange.HTTPRangeHandler()
            opener = urllib2.build_opener(range_handler)

            # install it
            urllib2.install_opener(opener)

            # create Request and set Range header
            req = urllib2.Request('http://www.python.org/')
            req.add_header('Range', 'bytes=30-50')
            f = urllib2.urlopen(req)
        """

        def http_error_206(self, req, fp, code, msg, hdrs):
            # 206 Partial Content: wrap the response so urllib2 returns it
            # as a normal result instead of treating it as an error.
            r = urllib.addinfourl(fp, hdrs, req.get_full_url())
            r.code = code
            r.msg = msg
            return r

        def http_error_416(self, req, fp, code, msg, hdrs):
            # 416 Requested Range Not Satisfiable
            raise RangeError('Requested Range Not Satisfiable')
    

    Update: The "better implementation" has moved to GitHub: see the byterange.py file in excid3/urlgrabber.
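
To get something closer to an actual `seek()` on the remote file, the Range header can be wrapped in a small file-like object. The following is only a rough Python 3 sketch under a few assumptions: `HTTPRangeFile` is a hypothetical name, not part of urlgrabber or the standard library; the server is assumed to honour Range requests; and the `opener` parameter exists only so the class can be exercised without a network connection.

```python
import io
import urllib.request


class HTTPRangeFile:
    """Minimal read-only, seekable view of a remote file (hypothetical helper)."""

    def __init__(self, url, opener=urllib.request.urlopen):
        self.url = url
        self._open = opener  # injectable so the class can be tested offline
        self._pos = 0

    def seek(self, offset, whence=io.SEEK_SET):
        # Only absolute and relative seeks; SEEK_END would need the file size.
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        else:
            raise ValueError("unsupported whence")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def tell(self):
        return self._pos

    def read(self, size):
        """Fetch `size` bytes starting at the current position."""
        if size <= 0:
            return b""
        req = urllib.request.Request(self.url)
        # Range offsets are inclusive, hence the "- 1" on the end offset.
        req.add_header("Range", "bytes=%d-%d" % (self._pos, self._pos + size - 1))
        with self._open(req) as resp:
            data = resp.read()
        self._pos += len(data)
        return data
```

With the file `1234567890` from the question, `f.seek(3)` followed by `f.read(4)` would request `bytes=3-6` and should yield `b"4567"`. Each `read()` costs one HTTP request, and a seek past the end of the file will typically surface as an HTTP 416 error on the next read, so real code would likely want buffering and error handling on top of this.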
