Removing HTTP and WWW from a URL in Python

半阙折子戏 2021-01-11 23:22
url1='www.google.com'
url2='http://www.google.com'
url3='http://google.com'
url4='www.google'
url5='http://www.google.com/images'
url6='https://www.youtub


        
3 Answers
  •  耶瑟儿~
    2021-01-11 23:48

    You could use a regex, depending on how consistent your data is. Are http and www always going to be there? Have you considered https or w3 sites?

    import re
    new_url = re.sub(r'.*w\.', '', url, count=1)
    

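    The count of 1 limits this to a single substitution, so as not to harm websites ending with a w. Note that the greedy `.*` still matches up to the last `w.` in the string, so a host such as www.snow.com would be cut down to com.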

    edit after clarification

    I'd do two steps:

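    if url.startswith('http'):
        # Strip the scheme; the separator is two forward slashes, not backslashes.
        url = re.sub(r'^https?://', '', url)
    if url.startswith('www.'):
        # Escape the dot and anchor at the start so only a leading "www." is removed.
        url = re.sub(r'^www\.', '', url)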
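The two-step idea can be sketched as a small helper and run against the complete sample URLs from the question. The function name `strip_scheme_and_www` is made up for illustration, and the sketch assumes that only a leading `http://` or `https://` and a leading `www.` should be removed, leaving everything else (such as the path) untouched.

```python
import re

def strip_scheme_and_www(url):
    """Remove a leading http:// or https://, then a leading www., from url."""
    # Step 1: drop the scheme if present (anchored at the start of the string).
    url = re.sub(r'^https?://', '', url)
    # Step 2: drop a leading "www." (the dot is escaped so it matches literally).
    url = re.sub(r'^www\.', '', url)
    return url

samples = [
    'www.google.com',
    'http://www.google.com',
    'http://google.com',
    'www.google',
    'http://www.google.com/images',
]
for s in samples:
    print(s, '->', strip_scheme_and_www(s))
# www.google.com -> google.com
# http://www.google.com -> google.com
# http://google.com -> google.com
# www.google -> google
# http://www.google.com/images -> google.com/images
```

Because both patterns are anchored with `^`, the `startswith` checks become optional: a URL without the prefix simply passes through unchanged.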
    
