How to catch 404 error in urllib.urlretrieve

长情又很酷 2020-12-05 07:17

Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of the hook function support (see reporthook …)

3 answers
  • 2020-12-05 07:31

    You should use urllib2 instead, whose urlopen raises an HTTPError (a subclass of URLError) on a 404:

    import urllib2
    
    try:
        resp = urllib2.urlopen("http://www.google.com/this-gives-a-404/")
    except urllib2.URLError, e:
        if not hasattr(e, "code"):
            raise
        resp = e
    
    print "Gave", resp.code, resp.msg
    print "=" * 80
    print resp.read(80)
    

    Edit: The rationale is that a 404 is an exceptional state. Unless you explicitly expect it, you probably haven't thought about handling it, so rather than letting your code carry on as if the request had succeeded, the default behavior, quite sensibly, is to stop execution by raising an exception.
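For reference, a minimal sketch of the same idea in Python 3, assuming Python 3.x, where urllib2 was folded into `urllib.request` and `urllib.error`. `urlopen` raises `urllib.error.HTTPError` for a 404, and because `HTTPError` is also a file-like response (it has `code`, `reason` and `read()`), it can stand in for the normal response. Catching `HTTPError` directly replaces the `hasattr(e, "code")` check, since only that subclass carries a status code. The `fetch` helper and the throwaway local server (there only so the example runs offline) are hypothetical, not part of the original answer.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch(url):
    """Hypothetical helper: return (status, reason, first 80 bytes of body).

    HTTP error responses such as a 404 are returned like normal responses;
    other failures (DNS errors, refused connections, ...) still raise URLError.
    """
    try:
        resp = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        resp = e  # HTTPError doubles as a file-like response object
    with resp:
        return resp.code, resp.reason, resp.read(80)


# Throwaway local server that answers every GET with a 404,
# so the demo needs no network access.
class NotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # silence the default request logging


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = "http://127.0.0.1:%d/this-gives-a-404/" % server.server_port
    code, reason, body = fetch(url)
    print("Gave", code, reason)
    print("=" * 80)
    print(body)
finally:
    server.shutdown()
    server.server_close()
```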

  • 2020-12-05 07:38

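    The retrieve method of a urllib.URLopener object supports reporthook and raises an exception (IOError) on a 404. Note that this is the base URLopener class, not FancyURLopener, which is what urlretrieve uses and which does not raise.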

    http://docs.python.org/library/urllib.html#url-opener-objects
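In Python 3 the legacy `urllib.request.urlretrieve` is built on top of `urlopen`, so, unlike the Python 2 version, it raises `urllib.error.HTTPError` on a 404 while still accepting a `reporthook`. A minimal sketch, assuming Python 3.7+ (for the `directory` argument of `SimpleHTTPRequestHandler`); the `download` helper, the `progress` hook and the local test server are hypothetical and only there to make the example self-contained and offline.

```python
import functools
import os
import tempfile
import threading
import urllib.error
import urllib.request
from http.server import HTTPServer, SimpleHTTPRequestHandler


def download(url, filename, reporthook=None):
    """Hypothetical helper: save url to filename, or return None on a 404."""
    try:
        path, _headers = urllib.request.urlretrieve(url, filename, reporthook)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None  # urlopen failed before filename was ever opened
        raise  # other HTTP errors (403, 500, ...) still propagate
    return path


seen = []  # every (block_num, block_size, total_size) passed to the hook


def progress(block_num, block_size, total_size):
    # urlretrieve calls this once before the first block and once per block;
    # total_size is -1 when the server sends no Content-Length header.
    seen.append((block_num, block_size, total_size))


# Throwaway local server over a temp directory holding one 20000-byte file;
# any other path gets a 404.
class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, *args):
        pass  # silence per-request logging


root = tempfile.mkdtemp()
with open(os.path.join(root, "data.bin"), "wb") as f:
    f.write(b"x" * 20000)

out_dir = tempfile.mkdtemp()
server = HTTPServer(("127.0.0.1", 0), functools.partial(QuietHandler, directory=root))
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/" % server.server_port
try:
    ok = download(base + "data.bin", os.path.join(out_dir, "data.bin"), progress)
    missing = download(base + "missing.bin", os.path.join(out_dir, "missing.bin"), progress)
finally:
    server.shutdown()
    server.server_close()

print(ok)       # path of the saved file
print(missing)  # None: the 404 was caught
```

Returning `None` is just one policy; re-raising or logging may fit a progress-bar script better.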

  • 2020-12-05 07:40

    Check out urllib.urlretrieve's complete code:

    def urlretrieve(url, filename=None, reporthook=None, data=None):
      global _urlopener
      if not _urlopener:
        _urlopener = FancyURLopener()
      return _urlopener.retrieve(url, filename, reporthook, data)
    

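    In other words, urlretrieve is a thin wrapper around urllib.FancyURLopener (which is part of the public urllib API). FancyURLopener's default http_error_default does not raise; it returns the error page as if it were a normal response, which is why a 404 goes unnoticed. You can subclass it and override http_error_default to detect 404s: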

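    import urllib

    class MyURLopener(urllib.FancyURLopener):
      def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Handle errors the way you'd like to; this restores the base URLopener
        # behaviour, which raises IOError instead of saving the error page
        urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)

    fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)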
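Overriding http_error_default works because urlretrieve is just a download loop around an opener. A hypothetical Python 3 alternative that avoids the legacy opener classes altogether (`URLopener` and `FancyURLopener` have been deprecated since Python 3.3) is to write that loop yourself on top of `urlopen`, which already raises `urllib.error.HTTPError` on a 404. The `retrieve` function below is an illustrative sketch, not a copy of the standard library: it mimics urlretrieve's `reporthook(block_num, block_size, total_size)` calls, `block_size=8192` is an arbitrary choice, and the demo reads a local `file://` URL so it runs offline.

```python
import os
import pathlib
import tempfile
import urllib.request


def retrieve(url, filename, reporthook=None, block_size=8192):
    """Hypothetical urlretrieve-style download that never hides HTTP errors.

    urlopen raises urllib.error.HTTPError for a 404 (or any other error
    status) before the output file is opened, so nothing is written.
    """
    with urllib.request.urlopen(url) as resp:
        # -1 mirrors urlretrieve's "size unknown" convention.
        total = int(resp.headers.get("Content-Length", -1))
        block_num = 0
        if reporthook:
            reporthook(block_num, block_size, total)
        with open(filename, "wb") as out:
            while True:
                block = resp.read(block_size)
                if not block:
                    break
                out.write(block)
                block_num += 1
                if reporthook:
                    reporthook(block_num, block_size, total)
    return filename


# Offline demo: "download" a local 20000-byte file through a file:// URL.
work = tempfile.mkdtemp()
src = os.path.join(work, "source.bin")
with open(src, "wb") as f:
    f.write(b"y" * 20000)

hook_calls = []
dest = retrieve(pathlib.Path(src).as_uri(), os.path.join(work, "copy.bin"),
                lambda n, bs, total: hook_calls.append((n, bs, total)))
print(dest, hook_calls)
```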
    