Getting parts of a URL (Regex)

说谎 2020-11-22 02:13

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

How can I extract the following parts using regular expressions:

  1. The Subd
26 answers
  •  独厮守ぢ
    2020-11-22 03:09

    I propose a much more readable solution using named groups (in Python, but the approach applies to any regex flavor that supports them):

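    import re

    def url_path_to_dict(path):
        # Each named group captures one URL component.
        # Every component except the host sits inside an optional group.
        pattern = (r'^'
                   r'((?P<schema>.+?)://)?'
                   r'((?P<user>.+?)(:(?P<password>.*?))?@)?'
                   r'(?P<host>.*?)'
                   r'(:(?P<port>\d+?))?'
                   r'(?P<path>/.*?)?'
                   r'(?P<query>[?].*?)?'
                   r'$'
                   )
        regex = re.compile(pattern)
        m = regex.match(path)
        d = m.groupdict() if m is not None else None

        return d

    def main():
        print(url_path_to_dict('http://example.example.com/example/example/example.html'))

    if __name__ == '__main__':
        main()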
    

    Prints:

    {
    'schema': 'http',
    'user': None,
    'password': None,
    'host': 'example.example.com',
    'port': None,
    'path': '/example/example/example.html',
    'query': None
    }
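
The example above only exercises the schema, host, and path groups. As a rough sketch of how the optional groups behave, the same pattern can be run against a URL that carries credentials, a port, and a query string. The group names are taken from the keys in the printed dictionary. The helper name `split_url` and the second URL (with `alice`, `secret`, and port `8080`) are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer; group names follow the printed keys.
URL_PATTERN = re.compile(
    r'^'
    r'((?P<schema>.+?)://)?'                    # optional "schema://"
    r'((?P<user>.+?)(:(?P<password>.*?))?@)?'   # optional "user[:password]@"
    r'(?P<host>.*?)'                            # host (lazy, may be empty)
    r'(:(?P<port>\d+?))?'                       # optional ":port"
    r'(?P<path>/.*?)?'                          # optional path starting with "/"
    r'(?P<query>[?].*?)?'                       # optional query starting with "?"
    r'$'
)


def split_url(url):
    """Return a dict of URL components, or None if the pattern does not match.

    Hypothetical helper name, used here only for illustration.
    """
    m = URL_PATTERN.match(url)
    return m.groupdict() if m is not None else None


# The URL from the question: only schema, host, and path are present.
simple = split_url('http://test.example.com/dir/subdir/file.html')

# A made-up URL that fills every group.
full = split_url('https://alice:secret@test.example.com:8080/dir/file.html?x=1&y=2')

for key in ('schema', 'user', 'password', 'host', 'port', 'path', 'query'):
    print(key, repr(simple[key]), repr(full[key]))
```

Two details are worth noting when using the result: the port comes back as a string (`'8080'`), not an integer, and the query group keeps its leading `?` (`'?x=1&y=2'`). Groups that did not take part in the match are `None`.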
    
