Validate a hostname string

Anonymous (unverified), submitted on 2019-12-03 02:13:02

Question:

Following up on "Regular expression to match hostname or IP Address?" and using "Restrictions on valid host names" as a reference, what is the most readable, concise way to match/validate a hostname/FQDN (fully qualified domain name) in Python? I've answered with my attempt below; improvements welcome.

Answer 1:

    import re

    def is_valid_hostname(hostname):
        if len(hostname) > 255:
            return False
        if hostname.endswith("."):
            hostname = hostname[:-1]  # strip exactly one dot from the right, if present
        # each label: 1 to 63 allowed characters, no leading or trailing hyphen
        allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
        return all(allowed.match(x) for x in hostname.split("."))

This ensures that each segment

  • contains at least one character and a maximum of 63 characters
  • consists only of allowed characters
  • doesn't begin or end with a hyphen.

It also avoids double negatives (not disallowed), and if hostname ends in a ., that's OK, too. It will (and should) fail if hostname ends in more than one dot.
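One caveat with the `$` anchor: in Python it also matches just before a trailing newline, so a label such as "com\n" passes a `match(...$)` check. A minimal sketch of a variant that uses `re.fullmatch` instead; the name `is_valid_hostname_strict` is made up for this illustration, and everything else mirrors the logic of the answer:

```python
import re

# Same label rule as in the answer, but without a trailing "$":
# fullmatch anchors at both ends and does not tolerate a trailing newline.
LABEL = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)", re.IGNORECASE)

def is_valid_hostname_strict(hostname):  # hypothetical name, for illustration only
    if len(hostname) > 255:
        return False
    if hostname.endswith("."):
        hostname = hostname[:-1]  # strip exactly one trailing dot
    return all(LABEL.fullmatch(label) for label in hostname.split("."))

print(is_valid_hostname_strict("example.com."))    # True: one trailing dot is fine
print(is_valid_hostname_strict("example.com.."))   # False: empty last label
print(is_valid_hostname_strict("-bad.example"))    # False: leading hyphen
print(is_valid_hostname_strict("example.com\n"))   # False: trailing newline rejected
```

Whether a trailing newline should be rejected or stripped beforehand depends on where the input comes from.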



Answer 2:

Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)

    import re

    def validate_fqdn(dn):
        if dn.endswith('.'):
            dn = dn[:-1]
        if len(dn) < 1 or len(dn) > 253:
            return False
        ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$',
                            re.IGNORECASE)
        return all(ldh_re.match(x) for x in dn.split('.'))

One could argue for accepting empty domain names, or not, depending on one's purpose.
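The 253 versus 255 arithmetic can be checked with a small sketch. It assumes the usual DNS wire format, in which every label is preceded by one length octet and the name ends with a zero octet for the root; `wire_length` is a hypothetical helper written for this illustration:

```python
def wire_length(name):
    """Octets the name occupies in DNS wire format (assumes ASCII labels)."""
    if name.endswith("."):
        name = name[:-1]
    labels = name.split(".") if name else []
    # one length octet per label, the label itself, then the root's zero octet
    return sum(1 + len(label) for label in labels) + 1

# Longest textual name: three 63-character labels and one 61-character label.
longest = ".".join(["a" * 63] * 3 + ["a" * 61])
print(len(longest))          # 253
print(wire_length(longest))  # 255
```

Each dot in the text form is replaced by a length octet, so the only extra cost is the first label's length octet and the terminating zero octet.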



Answer 3:

Here's a slightly stricter version of Tim Pietzcker's answer with the following improvements:

  • Limit the length of the hostname to 253 characters (after stripping the optional trailing dot).
  • Limit the character set to ASCII (i.e. use [0-9] instead of \d).
  • Check that the TLD is not all-numeric.

    import re

    def is_valid_hostname(hostname):
        if hostname.endswith("."):
            # strip exactly one dot from the right, if present
            hostname = hostname[:-1]
        if len(hostname) > 253:
            return False

        labels = hostname.split(".")

        # the TLD must be not all-numeric
        if re.match(r"[0-9]+$", labels[-1]):
            return False

        allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
        return all(allowed.match(label) for label in labels)
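
The reason for preferring [0-9] over \d can be shown in a few lines. This sketch assumes Python 3 with str patterns, where \d matches any Unicode decimal digit unless the re.ASCII flag is set:

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit

print(bool(re.fullmatch(r"\d", arabic_three)))            # True
print(bool(re.fullmatch(r"[0-9]", arabic_three)))         # False
print(bool(re.fullmatch(r"\d", arabic_three, re.ASCII)))  # False
```

Passing re.ASCII is an alternative to spelling out [0-9] when the rest of the pattern should stay unchanged.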


Answer 4:

I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those (? "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.

So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:

    import re

    def isValidHostname(hostname):
        if len(hostname) > 255:
            return False
        if hostname.endswith("."):  # A single trailing dot is legal
            hostname = hostname[:-1]  # strip exactly one dot from the right, if present
        disallowed = re.compile(r"[^A-Z\d-]", re.IGNORECASE)
        return all(  # Split by labels and verify individually
            (label and len(label) <= 63  # length is within proper range
             and not label.startswith("-") and not label.endswith("-")  # no bordering hyphens
             and not disallowed.search(label))  # contains only legal characters
            for label in hostname.split("."))


Answer 5:

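    import re

    def is_valid_host(host):
        '''IDN compatible domain validator'''
        # encode() returns bytes on Python 3; decode so the str pattern can match
        host = host.encode('idna').decode('ascii').lower()
        if not hasattr(is_valid_host, '_re'):
            # compile once and cache the pattern on the function object
            is_valid_host._re = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$')
        return bool(is_valid_host._re.match(host))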
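
For readers unfamiliar with the 'idna' codec used above, here is a minimal sketch of what it does on Python 3. The hostname is an illustrative input, and note that Python's built-in codec implements the older IDNA 2003 rules:

```python
unicode_host = "bücher.example"  # illustrative input, not from the answer

encoded = unicode_host.encode("idna")  # bytes on Python 3
ascii_host = encoded.decode("ascii")   # back to str for regex matching

print(type(encoded).__name__)  # bytes
print(ascii_host)              # xn--bcher-kva.example

# Pure-ASCII names pass through unchanged.
print("example.com".encode("idna").decode("ascii"))  # example.com
```

The "xn--" prefix is why a validator that rejects double hyphens would also reject every internationalized label.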


Answer 6:

Complementary to @TimPietzcker's answer: an underscore is treated here as a valid hostname character, a double dash is common in IDN Punycode labels, and any port number should be stripped before validation. Here is a cleaned-up version of the code.

    import re

    def is_valid_hostname(hostname):
        if len(hostname) > 255:
            return False
        hostname = hostname.rstrip(".")  # note: removes every trailing dot, not just one
        allowed = re.compile(r"(?!-)[A-Z\d\-_]{1,63}(?<!-)$", re.IGNORECASE)
        return all(allowed.match(x) for x in hostname.split("."))

    # Remove the port number, then convert the Unicode hostname to Punycode (Python 3)
    hostname = "example.com:8080"  # illustrative input, not from the original answer
    normalised_host = hostname.split(":")[0].encode("idna").decode()
    is_valid_hostname(normalised_host)


Answer 7:

Process each DNS label individually by excluding invalid characters and ensuring nonzero length.

    import re

    def isValidHostname(hostname):
        disallowed = re.compile(r"[^a-zA-Z\d\-]")
        # every label must be non-empty and free of disallowed characters
        return all(len(x) and not disallowed.search(x) for x in hostname.split("."))


Answer 8:

If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.
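
A minimal sketch of the resolution approach, using the standard library's `socket.getaddrinfo`. The helper name `resolves` is made up for this illustration, and the result depends on the network and resolver configuration of the machine running it:

```python
import socket

def resolves(hostname):  # hypothetical helper, for illustration only
    """Return True if the system resolver can find an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name could not be
        # IDNA-encoded at all (for example, an empty or over-long label)
        return False
    return True

print(resolves("a..b"))  # False: the empty label is rejected before any lookup
```

A name that fails to resolve is not necessarily malformed, so this check answers "does the host exist right now?" and not "is the string syntactically valid?".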


