Check for a valid domain name in a string?

悲哀的现实 2020-12-10 04:39

I am using Python and would like a simple API or regex to check a domain name's validity. By validity I mean syntactic validity, not whether the domain name actually exists.

5 Answers
  • 2020-12-10 05:20

    I've been using this:

    (r'(\.|\/)(([A-Za-z\d]+|[A-Za-z\d][-])+[A-Za-z\d]+){1,63}\.([A-Za-z]{2,3}\.[A-Za-z]{2}|[A-Za-z]{2,6})')
    

    This ensures that the name follows either a dot (as in www.) or a slash (as in http://), that a dash occurs only inside a label, and that two-part suffixes such as gov.uk match too.
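As a rough sketch of how this pattern might be applied, the snippet below searches a string for the first match and strips the leading separator. The names `DOMAIN_IN_TEXT` and `find_domain` are hypothetical and only illustrate one possible use.

```python
import re

# Pattern copied from the answer above, split across two raw strings.
DOMAIN_IN_TEXT = re.compile(
    r'(\.|\/)(([A-Za-z\d]+|[A-Za-z\d][-])+[A-Za-z\d]+){1,63}'
    r'\.([A-Za-z]{2,3}\.[A-Za-z]{2}|[A-Za-z]{2,6})'
)


def find_domain(text):
    """Return the first matched name without its leading '.' or '/', or None."""
    match = DOMAIN_IN_TEXT.search(text)
    if match is None:
        return None
    # group(0) starts with the separator captured by group(1); drop it.
    return match.group(0)[1:]
```

Because the pattern requires a preceding dot or slash, a leading `www` label is treated as the prefix and is not part of the returned name.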

  • 2020-12-10 05:23
    r'^(?=.{4,255}$)([a-zA-Z0-9]([a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z0-9]{2,6}$'
    
    • The lookahead ensures a minimum of 4 characters (a.in) and a maximum of 255.
    • One or more labels (separated by periods), each 1 to 63 characters long, starting and ending with an alphanumeric character and containing only alphanumerics and hyphens in between.
    • Followed by a top-level domain name of 2 to 6 characters (6 covers museum).
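A minimal sketch of how such a pattern could be wrapped in a helper follows. The name `is_valid_domain` is hypothetical. The pattern here makes the interior of each label optional and allows a 6-character top-level domain so that `a.in` and `museum` both pass; those choices are assumptions about the intent, not a definitive reading of the answer.

```python
import re

DOMAIN_RE = re.compile(
    r'^(?=.{4,255}$)'                                   # overall length check
    r'([a-zA-Z0-9]([a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+'  # labels of 1 to 63 chars
    r'[a-zA-Z0-9]{2,6}$'                                # top-level domain
)


def is_valid_domain(name):
    """Return True if `name` matches the syntactic pattern above."""
    return DOMAIN_RE.match(name) is not None
```

Labels that start or end with a hyphen, or that exceed 63 characters, are rejected, as is a name with no dot at all.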
  • 2020-12-10 05:26

    Any domain name is (syntactically) valid if it's a dot-separated list of identifiers, each no longer than 63 characters, and made up of letters, digits and dashes (no underscores).

    So:

    r'[a-zA-Z\d-]{,63}(\.[a-zA-Z\d-]{,63})*'
    

    would be a start. Of course, these days some non-ASCII characters may be allowed (a very recent development), which changes the parameters a lot -- do you need to deal with that?
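One possible way to handle non-ASCII names, sketched here as an assumption rather than as this answer's method, is to convert the name to its ASCII form with Python's built-in `idna` codec and then apply the label pattern. The helper name `is_valid_domain` is hypothetical, and the sketch uses `{1,63}` instead of `{,63}` so that empty labels are rejected.

```python
import re

# Dot-separated labels of letters, digits and dashes, 1 to 63 chars each.
LABELS_RE = re.compile(r'[a-zA-Z\d-]{1,63}(\.[a-zA-Z\d-]{1,63})*')


def is_valid_domain(name):
    """Check a possibly non-ASCII name after converting it to ASCII form."""
    try:
        # The built-in codec raises UnicodeError for names it cannot encode,
        # for example when a label is empty or too long.
        ascii_name = name.encode('idna').decode('ascii')
    except UnicodeError:
        return False
    # fullmatch requires the whole string to fit the pattern.
    return LABELS_RE.fullmatch(ascii_name) is not None
```

With this sketch, `bücher.example` is checked in its encoded form, while a name containing an underscore is still rejected by the pattern.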

  • 2020-12-10 05:33

    Note that while you can do something with regular expressions, the most reliable way to test for valid domain names is to actually try to resolve the name (with socket.getaddrinfo):

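    from socket import getaddrinfo
    
    # Resolve the name; raises socket.gaierror if it cannot be resolved
    result = getaddrinfo("www.google.com", None)
    print(result[0][4])  # sockaddr tuple of the first result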
    

    Note that technically this can leave you open to DoS (if someone submits thousands of invalid domain names, it can take a while to resolve invalid names) but you could simply rate-limit someone who tries this.

    The advantage of this is that it flags a typo such as "hotmail.con" (for "hotmail.com", say) as invalid, whereas a regex would accept "hotmail.con" as valid.
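A resolution check along these lines could be wrapped so that a failed lookup returns False instead of raising. This is only a sketch: the function name `resolves` and its `resolver` parameter are hypothetical, the latter added so the check can be exercised without network access, and rate limiting is left out.

```python
import socket


def resolves(name, resolver=socket.getaddrinfo):
    """Return True if `name` can be resolved, False otherwise.

    `resolver` defaults to socket.getaddrinfo and can be replaced
    by a stub in tests.
    """
    try:
        resolver(name, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name could not be
        # encoded (for example an over-long label).
        return False
    return True
```

Calling `resolves("hotmail.con")` with the default resolver performs a real DNS lookup, so the result depends on the network and on the resolver configuration.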

  • 2020-12-10 05:34

    The other answers are all fairly out of date with respect to the spec at this point. I believe the pattern below matches the current spec correctly:

    r'^(?=.{1,253}$)(?!.*\.\..*)(?!\..*)([a-zA-Z0-9-]{,63}\.){,127}[a-zA-Z0-9-]{1,63}$'
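To see what the lookaheads in this pattern do, it can be compiled and tried on a few edge cases. The snippet is a sketch only; `SPEC_RE` and `matches_spec` are hypothetical names.

```python
import re

SPEC_RE = re.compile(
    r'^(?=.{1,253}$)'              # total length of 1 to 253 characters
    r'(?!.*\.\..*)'                # no two consecutive dots
    r'(?!\..*)'                    # no leading dot
    r'([a-zA-Z0-9-]{,63}\.){,127}' # up to 127 dot-terminated labels
    r'[a-zA-Z0-9-]{1,63}$'         # final label
)


def matches_spec(name):
    """Return True if `name` matches the pattern from the answer above."""
    return SPEC_RE.match(name) is not None
```

As written, the pattern accepts single-label names such as `localhost` and does not restrict where hyphens appear within a label, so a name like `-a.com` also matches.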
    