How to remove the scheme from a URL in Python?

我在风中等你 2021-01-12 16:39

I am working with an application, written with Flask, that returns URLs. I want the URL displayed to the user to be as clean as possible so I want t

3 Answers
  • 2021-01-12 17:11

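    I've seen this done in Flask libraries and extensions. Worth noting that it relies on the underscore-prefixed ._replace method of ParseResult/SplitResult. Despite the leading underscore, _replace is a documented, public method of named tuples; the underscore only avoids clashes with field names.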

    from urllib.parse import urlsplit, urlunsplit

    url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'
    split_url = urlsplit(url)  # the scheme is lowercased during parsing
    # >>> SplitResult(scheme='http', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
    split_url_without_scheme = split_url._replace(scheme="")
    # >>> SplitResult(scheme='', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
    new_url = urlunsplit(split_url_without_scheme)
    # >>> '//stackoverflow.com/questions/tagged/python?page=2'
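
    Since the rebuilt URL still starts with '//', here is a minimal sketch of wrapping the same steps in a helper that also drops that prefix for display. The function name strip_scheme_for_display is my own, not something from Flask or the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_scheme_for_display(url):
    """Return url without its scheme and without the leading '//'.

    Hypothetical helper for illustration only.
    """
    # Blank out the scheme and reassemble the remaining components.
    without_scheme = urlunsplit(urlsplit(url)._replace(scheme=""))
    # urlunsplit keeps the '//' that introduces the network location.
    if without_scheme.startswith("//"):
        return without_scheme[2:]
    return without_scheme


print(strip_scheme_for_display("HtTp://stackoverflow.com/questions/tagged/python?page=2"))
```

    Whether to drop the '//' depends on the use: a string without it is fine for display, but it is no longer a valid network-path reference.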
    
  • 2021-01-12 17:18

    If you are handling these URLs programmatically, rather than doing a string replace, I suggest having urlparse rebuild the URL without a scheme.

    The ParseResult object is a tuple, so you can create another one with the fields you don't want blanked out.

    # py2/3 compatibility
    try:
        from urllib.parse import urlparse, ParseResult
    except ImportError:
        from urlparse import urlparse, ParseResult
    
    
    def strip_scheme(url):
        parsed_result = urlparse(url)
        return ParseResult('', *parsed_result[1:]).geturl()
    

    You can remove any component of the ParseResult by simply replacing the corresponding field with an empty string.
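
    As a rough sketch of that idea, the helper below blanks whichever named fields you pass in. The name strip_components is made up for this example, and it assumes Python 3's urllib.parse.

```python
from urllib.parse import urlparse


def strip_components(url, *names):
    """Blank the named ParseResult fields and rebuild the URL.

    Hypothetical helper; names must be ParseResult field names such as
    'scheme', 'netloc', 'path', 'params', 'query' or 'fragment'.
    """
    parsed = urlparse(url)
    # _replace returns a new ParseResult with the given fields overridden.
    blanked = parsed._replace(**{name: "" for name in names})
    return blanked.geturl()


print(strip_components("https://yoman/hi?whatup#frag", "query", "fragment"))
print(strip_components("https://yoman/hi?whatup", "scheme"))
```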

    It's important to note there is a functional difference between this answer and @Lukas Graf's answer. The main one is that the '//' component of a URL isn't technically part of the scheme, so this answer preserves it, whereas his answer removes it.

    >>> Lukas_strip_scheme('https://yoman/hi?whatup')
    'yoman/hi?whatup'
    >>> strip_scheme('https://yoman/hi?whatup')
    '//yoman/hi?whatup'
    
  • 2021-01-12 17:22

    I don't think urlparse offers a single method or function for this. This is how I'd do it:

    from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

    url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'

    def strip_scheme(url):
        parsed = urlparse(url)
        # parsed.scheme is lowercased, and geturl() rebuilds the URL with it
        scheme = "%s://" % parsed.scheme
        return parsed.geturl().replace(scheme, '', 1)

    print(strip_scheme(url))
    
    

    Output:

    stackoverflow.com/questions/tagged/python?page=2
    

    If you used only simple string parsing, you'd have to deal with http[s], and possibly other schemes, yourself. This approach also handles odd casing of the scheme.
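
    One caveat: when the input has no scheme at all, the prefix being replaced is just '://', so a '://' elsewhere in the URL (say, inside a query string) would be removed instead. A slightly more defensive sketch, with the hypothetical name strip_scheme_prefix, only strips when a scheme was actually parsed and only at the start of the string:

```python
from urllib.parse import urlparse


def strip_scheme_prefix(url):
    """Remove a leading '<scheme>://' if the URL has one.

    Hypothetical variant for illustration; other inputs are returned
    without the prefix being touched.
    """
    parsed = urlparse(url)
    if not parsed.scheme:
        # Nothing to strip, so leave the input exactly as given.
        return url
    prefix = parsed.scheme + "://"
    # geturl() rebuilds the URL with the lowercased scheme.
    normalized = parsed.geturl()
    if normalized.startswith(prefix):
        return normalized[len(prefix):]
    return normalized


print(strip_scheme_prefix("HtTp://stackoverflow.com/questions/tagged/python?page=2"))
print(strip_scheme_prefix("example.com/?next=http://x"))
```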
