How can I unshorten a URL?

时光说笑 2020-11-30 05:19

I want to be able to take a shortened or non-shortened URL and return its un-shortened form. How can I write a Python program to do this?

Additional Clarification:

9 answers
  •  伪装坚强ぢ
    2020-11-30 05:27

    Here is source code that handles most of the useful corner cases:

    • sets a custom timeout;
    • sets a custom User-Agent;
    • checks whether to use an HTTP or HTTPS connection;
    • resolves the input URL recursively without getting stuck in a redirect loop.
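The loop-prevention point does not depend on any particular HTTP library. Below is a minimal sketch, assuming a hypothetical `fetch(url)` callable that performs one request and returns `(status, location)`; `follow_redirects` and `max_hops` are illustrative names, not part of the linked project. Remembering every visited URL, rather than only the previous one, also stops longer cycles such as A to B to C and back to A, and `urljoin` handles relative `Location` values.

```python
from urllib.parse import urljoin


def follow_redirects(url, fetch, max_hops=10):
    """Follow redirects starting at `url` and return the final URL.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location),
    where `location` is the Location header value or None.
    Stops when a response is not a redirect, when a URL repeats
    (redirect loop), or after `max_hops` requests.
    """
    visited = set()
    while url not in visited and len(visited) < max_hops:
        visited.add(url)
        status, location = fetch(url)
        if not (300 <= status < 400 and location):
            return url  # not a redirect: this is the final URL
        # Location may be relative (e.g. "/landing"), so resolve it.
        url = urljoin(url, location)
    return url  # loop detected or hop limit reached: return the last URL seen
```

Because `fetch` is passed in, a dict's `.get` can stand in for the network when testing, while a real `fetch` could wrap `http.client` as the code below does.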

    The source code is on GitHub: https://github.com/amirkrifa/UnShortenUrl

    Comments are welcome.

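    import http.client
    import logging
    import traceback
    from urllib.parse import urljoin, urlparse

    logging.basicConfig(level=logging.DEBUG)

    TIMEOUT = 10
    MAX_REDIRECTS = 10  # hop limit, so a long or circular chain cannot recurse forever


    class UnShortenUrl:
        def process(self, url, visited=None):
            """Follow HTTP redirects from `url`; return the final URL, or None on error."""
            logging.info('Init url: %s', url)
            if visited is None:
                visited = set()
            if url in visited or len(visited) >= MAX_REDIRECTS:
                # Redirect loop or too many hops: stop at the current URL.
                return url
            visited.add(url)
            try:
                parsed = urlparse(url)
                # Pick an HTTPS or plain HTTP connection from the URL scheme.
                if parsed.scheme == 'https':
                    h = http.client.HTTPSConnection(parsed.netloc, timeout=TIMEOUT)
                else:
                    h = http.client.HTTPConnection(parsed.netloc, timeout=TIMEOUT)
                resource = parsed.path or '/'  # an empty path is not a valid request target
                if parsed.query:
                    resource += '?' + parsed.query
                try:
                    # HEAD fetches only the headers; send a custom User-Agent.
                    h.request('HEAD', resource,
                              headers={'User-Agent': 'curl/7.38.0'})
                    response = h.getresponse()
                except Exception:
                    traceback.print_exc()
                    return url

                logging.info('Response status: %d', response.status)
                location = response.getheader('Location')
                h.close()
                if 300 <= response.status < 400 and location:
                    # Location may be relative, so resolve it against the current URL.
                    red_url = urljoin(url, location)
                    logging.info('Redirect: %s -> %s', url, red_url)
                    return self.process(red_url, visited)
                return url
            except Exception:
                traceback.print_exc()
                return None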
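For comparison, Python 3's standard library can follow the redirects itself: `urllib.request.urlopen` runs the default `HTTPRedirectHandler`, and the response's `geturl()` returns the last URL in the chain. This is a minimal sketch, assuming Python 3 and a server that accepts `HEAD` requests (some servers answer `HEAD` with 405, in which case a plain GET would be needed); the function name `unshorten` is illustrative, and the timeout and User-Agent simply mirror the values used in the class above.

```python
import urllib.request


def unshorten(url, timeout=10):
    """Return the URL that `url` finally resolves to after HTTP redirects.

    urllib's default HTTPRedirectHandler follows the redirect responses,
    so geturl() on the final response is the end of the chain. A URL that
    does not redirect is returned unchanged. A redirect loop or an overly
    long chain raises urllib.error.HTTPError.
    """
    # HEAD asks for headers only; the response body is never read here.
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "curl/7.38.0"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()
```

Unlike the class above, which returns the last URL it reached, this version raises `urllib.error.HTTPError` when urllib detects a loop or a chain longer than its `max_redirections` limit (10 by default), so callers may want to catch that.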
    
