Validate a hostname string

后端 未结 10 1523
生来不讨喜
生来不讨喜 2020-12-04 13:24

Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to matc

相关标签:
10条回答
  • 2020-12-04 13:55

    Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)

    import re
    
    def validate_fqdn(dn):
        if dn.endswith('.'):
            dn = dn[:-1]
        if len(dn) < 1 or len(dn) > 253:
            return False
        ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$',
                            re.IGNORECASE)
        return all(ldh_re.match(x) for x in dn.split('.'))
    

    One could argue for accepting empty domain names, or not, depending on one's purpose.

    0 讨论(0)
  • 2020-12-04 13:55

    Complimentary to the @TimPietzcker answer. Underscore is a valid hostname character (but not for domain name) . While double dash is commonly found for IDN punycode domain(e.g. xn--). Port number should be stripped. This is the cleanup of the code.

    import re
    def is_valid_hostname(hostname):
        if len(hostname) > 255:
            return False
        hostname = hostname.rstrip(".")
        allowed = re.compile("(?!-)[A-Z\d\-\_]{1,63}(?<!-)$", re.IGNORECASE)
        return all(allowed.match(x) for x in hostname.split("."))
    
    # convert your unicode hostname to punycode (python 3 ) 
    # Remove the port number from hostname
    normalise_host = hostname.encode("idna").decode().split(":")[0]
    is_valid_hostname(normalise_host )
    
    0 讨论(0)
  • 2020-12-04 13:59

    Don't reinvent the wheel. You can use a library, e.g. validators. Or you can copy their code:

    Installation

    pip install validators
    

    Usage

    import validators
    if validators.domain('example.com')
        print('this domain is valid')
    
    0 讨论(0)
  • 2020-12-04 14:00

    Process each DNS label individually by excluding invalid characters and ensuring nonzero length.

    
    def isValidHostname(hostname):
        disallowed = re.compile("[^a-zA-Z\d\-]")
        return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))
    
    0 讨论(0)
  • 2020-12-04 14:03

    Here's a bit stricter version of Tim Pietzcker's answer with the following improvements:

    • Limit the length of the hostname to 253 characters (after stripping the optional trailing dot).
    • Limit the character set to ASCII (i.e. use [0-9] instead of \d).
    • Check that the TLD is not all-numeric.
    import re
    
    def is_valid_hostname(hostname):
        if hostname[-1] == ".":
            # strip exactly one dot from the right, if present
            hostname = hostname[:-1]
        if len(hostname) > 253:
            return False
    
        labels = hostname.split(".")
    
        # the TLD must be not all-numeric
        if re.match(r"[0-9]+$", labels[-1]):
            return False
    
        allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
        return all(allowed.match(label) for label in labels)
    
    0 讨论(0)
  • 2020-12-04 14:05

    I think this regex might help in Python: '^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$'

    0 讨论(0)
提交回复
热议问题