Validate a hostname string

后端 未结 10 1524
生来不讨喜
生来不讨喜 2020-12-04 13:24

Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to matc

相关标签:
10条回答
  • 2020-12-04 14:06
    import re
    def is_valid_hostname(hostname):
        if len(hostname) > 255:
            return False
        if hostname[-1] == ".":
            hostname = hostname[:-1] # strip exactly one dot from the right, if present
        allowed = re.compile("(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
        return all(allowed.match(x) for x in hostname.split("."))
    

    ensures that each segment

    • contains at least one character and a maximum of 63 characters
    • consists only of allowed characters
    • doesn't begin or end with a hyphen.

    It also avoids double negatives (not disallowed), and if hostname ends in a ., that's OK, too. It will (and should) fail if hostname ends in more than one dot.

    0 讨论(0)
  • 2020-12-04 14:15

    I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those (? "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.

    So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:

    
    def isValidHostname(hostname):
        if len(hostname) > 255:
            return False
        if hostname.endswith("."): # A single trailing dot is legal
            hostname = hostname[:-1] # strip exactly one dot from the right, if present
        disallowed = re.compile("[^A-Z\d-]", re.IGNORECASE)
        return all( # Split by labels and verify individually
            (label and len(label) <= 63 # length is within proper range
             and not label.startswith("-") and not label.endswith("-") # no bordering hyphens
             and not disallowed.search(label)) # contains only legal characters
            for label in hostname.split("."))
    
    0 讨论(0)
  • 2020-12-04 14:15
    def is_valid_host(host):
        '''IDN compatible domain validator'''
        host = host.encode('idna').lower()
        if not hasattr(is_valid_host, '_re'):
            import re
            is_valid_host._re = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$')
        return bool(is_valid_host._re.match(host))
    
    0 讨论(0)
  • 2020-12-04 14:19

    If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.

    0 讨论(0)
提交回复
热议问题