Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to match/validate a hostname/fqdn (fully qualified domain name) in Python? I've answered with my attempt below, improvements welcome.
问题:
回答1:
import re def is_valid_hostname(hostname): if len(hostname) > 255: return False if hostname[-1] == ".": hostname = hostname[:-1] # strip exactly one dot from the right, if present allowed = re.compile("(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE) return all(allowed.match(x) for x in hostname.split("."))
ensures that each segment
- contains at least one character and a maximum of 63 characters
- consists only of allowed characters
- doesn't begin or end with a hyphen.
It also avoids double negatives (not disallowed
), and if hostname
ends in a .
, that's OK, too. It will (and should) fail if hostname
ends in more than one dot.
回答2:
Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)
import re def validate_fqdn(dn): if dn.endswith('.'): dn = dn[:-1] if len(dn) < 1 or len(dn) > 253: return False ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$', re.IGNORECASE) return all(ldh_re.match(x) for x in dn.split('.'))
One could argue for accepting empty domain names, or not, depending on one's purpose.
回答3:
Here's a bit stricter version of Tim Pietzcker's answer with the following improvements:
- Limit the length of the hostname to 253 characters (after stripping the optional trailing dot).
- Limit the character set to ASCII (i.e. use
[0-9]
instead of\d
). - Check that the TLD is not all-numeric.
import re def is_valid_hostname(hostname): if hostname[-1] == ".": # strip exactly one dot from the right, if present hostname = hostname[:-1] if len(hostname) > 253: return False labels = hostname.split(".") # the TLD must be not all-numeric if re.match(r"[0-9]+$", labels[-1]): return False allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE) return all(allowed.match(label) for label in labels)
回答4:
I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those (?
"extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.
So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:
def isValidHostname(hostname): if len(hostname) > 255: return False if hostname.endswith("."): # A single trailing dot is legal hostname = hostname[:-1] # strip exactly one dot from the right, if present disallowed = re.compile("[^A-Z\d-]", re.IGNORECASE) return all( # Split by labels and verify individually (label and len(label) <= 63 # length is within proper range and not label.startswith("-") and not label.endswith("-") # no bordering hyphens and not disallowed.search(label)) # contains only legal characters for label in hostname.split("."))
回答5:
def is_valid_host(host): '''IDN compatible domain validator''' host = host.encode('idna').lower() if not hasattr(is_valid_host, '_re'): import re is_valid_host._re = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$') return bool(is_valid_host._re.match(host))
回答6:
Complimentary to the @TimPietzcker answer. Underscore is valid hostname, doubel dash is common for IDN punycode. Port number should be stripped. This is the cleanup of the code.
import re def is_valid_hostname(hostname): if len(hostname) > 255: return False hostname = hostname.rstrip(".") allowed = re.compile("(?!-)[A-Z\d\-\_]{1,63}(?<!-)$", re.IGNORECASE) return all(allowed.match(x) for x in hostname.split(".")) # convert your unicode hostname to punycode (python 3 ) # Remove the port number from hostname normalise_host = hostname.encode("idna").decode().split(":")[0] is_valid_hostanme(normalise_host )
回答7:
Process each DNS label individually by excluding invalid characters and ensuring nonzero length.
def isValidHostname(hostname): disallowed = re.compile("[^a-zA-Z\d\-]") return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))
回答8:
If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.