Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to matc
Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)
import re
def validate_fqdn(dn):
if dn.endswith('.'):
dn = dn[:-1]
if len(dn) < 1 or len(dn) > 253:
return False
ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$',
re.IGNORECASE)
return all(ldh_re.match(x) for x in dn.split('.'))
One could argue for accepting empty domain names, or not, depending on one's purpose.
Complimentary to the @TimPietzcker answer. Underscore is a valid hostname character (but not for domain name) . While double dash is commonly found for IDN punycode domain(e.g. xn--). Port number should be stripped. This is the cleanup of the code.
import re
def is_valid_hostname(hostname):
if len(hostname) > 255:
return False
hostname = hostname.rstrip(".")
allowed = re.compile("(?!-)[A-Z\d\-\_]{1,63}(?<!-)$", re.IGNORECASE)
return all(allowed.match(x) for x in hostname.split("."))
# convert your unicode hostname to punycode (python 3 )
# Remove the port number from hostname
normalise_host = hostname.encode("idna").decode().split(":")[0]
is_valid_hostname(normalise_host )
Don't reinvent the wheel. You can use a library, e.g. validators. Or you can copy their code:
pip install validators
import validators
if validators.domain('example.com')
print('this domain is valid')
Process each DNS label individually by excluding invalid characters and ensuring nonzero length.
def isValidHostname(hostname):
disallowed = re.compile("[^a-zA-Z\d\-]")
return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))
Here's a bit stricter version of Tim Pietzcker's answer with the following improvements:
[0-9]
instead of \d
).import re
def is_valid_hostname(hostname):
if hostname[-1] == ".":
# strip exactly one dot from the right, if present
hostname = hostname[:-1]
if len(hostname) > 253:
return False
labels = hostname.split(".")
# the TLD must be not all-numeric
if re.match(r"[0-9]+$", labels[-1]):
return False
allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
return all(allowed.match(label) for label in labels)
I think this regex might help in Python: '^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$'