Removing HTTP and WWW from a URL in Python

半阙折子戏 2021-01-11 23:22
url1='www.google.com'
url2='http://www.google.com'
url3='http://google.com'
url4='www.google'
url5='http://www.google.com/images'
url6='https://www.youtub


        
3 Answers
  •  深忆病人
    2021-01-11 23:54

    A more elegant solution would be to use urlparse:

    from urllib.parse import urlparse
    
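    def get_hostname(url, uri_type='both'):
        """Get the host name from the url"""
        parsed_uri = urlparse(url)
        if uri_type == 'both':
            # Scheme plus host, e.g. 'http://www.google.com/'
            return '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
        elif uri_type == 'netloc_only':
            # Host only, e.g. 'www.google.com'
            return '{uri.netloc}'.format(uri=parsed_uri)
        # Fail loudly instead of silently returning None
        raise ValueError('unknown uri_type: {!r}'.format(uri_type))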
    

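    The default option, 'both', returns the scheme (http or https, depending on the link) together with the host. The second option, 'netloc_only', returns only the netloc, which is the part you were looking for. Note that urlparse fills in netloc only when the URL has a scheme or starts with //, so a bare www.google.com ends up in path instead, and a leading www. is not stripped in either case.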
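
To also cover the scheme-less inputs from the question and drop the www. prefix, something along these lines might work. This is only a sketch and not part of the answer above: the helper name strip_scheme_and_www is made up for illustration, and since the question's expected output is cut off, keeping the path while discarding any query string or fragment is an assumption.

```python
from urllib.parse import urlparse


def strip_scheme_and_www(url):
    """Return the URL without its scheme and without a leading 'www.'.

    Hypothetical helper: keeps host and path, drops query and fragment.
    """
    # urlparse only fills netloc when a scheme or '//' is present,
    # so prepend '//' to bare inputs such as 'www.google.com'.
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    parsed = urlparse(url)
    host = parsed.netloc
    # Remove the 'www.' prefix only when it is at the start of the host.
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host + parsed.path


for url in ('www.google.com', 'http://www.google.com', 'http://google.com',
            'www.google', 'http://www.google.com/images'):
    print(strip_scheme_and_www(url))
```

With the five complete URLs from the question this prints google.com three times, then google, then google.com/images.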
