Check domain availability — avoiding front running — using wildcards or regex

Posted by 我是研究僧i on 2021-02-08 11:58:27

Question


I can check the availability of an individual domain via whois abc123.com.

I can't figure out how to check the availability of a whole set of domains that match criteria, like XXXYYY.Z, where X is any 3 of the same letter, Y is any 3 of the same digit, and Z is any of com, org, or io. For example: aaa111.org

That's just an example case, but you get the idea - I'd like to specify strings, patterns, and endings, and see what's available.

I can do this kind of string matching with Regex, but I don't know how to apply that to a shell script.

I want to be able to input my matching criteria either via an array or a regex, and output a list of all matching domains.

whois abc.com | grep "No match" is useful here, because it is blank if that domain is registered and prints a line only when there is no match; maybe that could factor into the script, or something like that. It also reduces the output to a single line, rather than the mountain of garbage that whois outputs by default.

A script that works either with bash, zsh, or fish would be appreciated.

You might be wondering why bother doing this from the command line when you can go to a website. The reason is that the domain you're looking for is often poached the moment you actually search for it. This is a well-known phenomenon called domain name front running, and it happened to me just today, hence my attempts at a local, automated solution that doesn't go through a registrar.

...

Edit in response to comment: I'm not attached to the "whois" aspect of the solution, just the ability to check via regex or pattern. -- Edit 2: "whois" turned out to be necessary to avoid false positives; answer was revised to include this aspect.


Answer 1:


Here is an example implementation that uses DNS requests first, and queries Whois only when a domain has no SOA record:

#!/usr/bin/env bash

for z in {com,org,io}; do
  for y in {0..9}; do
    for x in {a..z}; do

      # Compose domain as xxxyyy.z
      domain="$x$x$x$y$y$y.$z"

      # If domain has no SOA DNS record, chances are it is available.
      if [ -z "$(dig +keepopen +short -q "$domain" -t SOA)" ]; then

        # To be sure a domain without SOA DNS record is really available:
        # check it has no whois record either
        if ! whois "$domain" >/dev/null; then
          printf 'Domain %s is available\n' "$domain"
        else
          printf 'Domain %s has no DNS SOA but has a whois record\n' "$domain"
        fi
      else
        printf 'An SOA record exist for domain %s.\nIt may not be available.\n' "$domain"
      fi
    done
  done
done

Sample first lines of output:

Domain aaa000.com has no DNS SOA but has a whois record
An SOA record exist for domain bbb000.com.
It may not be available.
An SOA record exist for domain ccc000.com.
It may not be available.
Domain ddd000.com has no DNS SOA but has a whois record
An SOA record exist for domain eee000.com.
It may not be available.
An SOA record exist for domain fff000.com.
It may not be available.
An SOA record exist for domain ggg000.com.
It may not be available.
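
The exit status of whois may not be a dependable signal with every whois client, so the question's idea of grepping the reply text can be folded in instead. Below is a minimal sketch of that variant. The function name is hypothetical and the three patterns are assumptions about how some registries word a "not registered" reply; each registry phrases it differently, so check the real output for the TLDs you care about.

```shell
#!/usr/bin/env bash

# Succeed (exit 0) when the whois text read from stdin looks like a
# "not registered" reply. The patterns below are ASSUMPTIONS: registries
# word this differently, so adjust them per TLD.
whois_text_says_unregistered() {
  grep -Eq '^[[:space:]]*(No match for|NOT FOUND|Domain not found)'
}

# Hypothetical usage (needs network access, so it is left commented out):
#   if whois "aaa000.com" | whois_text_says_unregistered; then
#     printf 'Domain %s looks available\n' "aaa000.com"
#   fi
```

Matching on text keeps the decision independent of how a given whois client sets its exit code, at the cost of maintaining the pattern list.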

Please don't do the following:

I can't figure out how to check the availability of a whole set of domains that match criteria, like XXX YYY.Z. where X is any 3 letters, Y is any 3 numbers, and Z is any of com, org, or io.

The reason is that it would mean testing the availability of 52728000 individual domain names, an unrealistic number of requests, even for DNS services rather than Whois services.

The arithmetic behind this number:

  • XXX where X is any 3 letters: 26 letters → 26×26×26=17576 combinations
  • YYY where Y is any 3 numbers: 10 numbers → 10×10×10=1000 combinations
  • Z where Z is any of com, org, or io: 3 TLDs → 3 combinations

XXXYYY.Z: 17576×1000×3 → 52728000 combinations
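
As a quick check of that arithmetic, here is a small sketch using shell arithmetic. It also computes the size of the narrower pattern from the question (the same letter three times, the same digit three times), which is what the first script above iterates over. The variable names are only illustrative.

```shell
#!/usr/bin/env bash

letters=26
digits=10
tlds=3

# Any 3 letters followed by any 3 digits, under 3 TLDs.
any_three=$(( letters * letters * letters * digits * digits * digits * tlds ))

# The same letter repeated 3 times, then the same digit repeated 3 times.
same_three=$(( letters * digits * tlds ))

printf 'any 3 letters + any 3 digits:    %d names\n' "$any_three"
printf 'same letter x3 + same digit x3: %d names\n' "$same_three"
```

The two totals differ by a factor of 67600, which is why the narrow pattern is practical to query and the broad one is not.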

Let's get a feel for this volume by generating the names with nested loops rather than with one single Bash brace expansion, because a single brace expansion of that size would not fit into memory:

#!/usr/bin/env bash

for Z in {com,org,io}; do
  for YYY in {0..9}{0..9}{0..9}; do
    for XXX in {a..z}{a..z}{a..z}; do
      printf '%s%s.%s\n' "$XXX" "$YYY" "$Z"
    done
  done
done



Answer 2:


There is currently no live, public, free service that allows you to do what you want, and even if technical solutions for that arrive "soon", they will probably be non-public, paid, or heavily limited.

There is at least one possible shortcut (using zonefiles). Your question is not detailed enough to be sure it fits, but see below. It may work better or faster than using the DNS, depending on your use case. It has benefits and drawbacks.

I will also discuss other points to put things in perspective, and my reply is generic (it applies to multiple TLDs and in multiple ways). But this won't give you a ready-made script to just use, because this website is not a code-writing service and your problem, with the specific constraints outlined, is far too big.

I won't repeat the solution based on DNS queries as it was given already, even if the answer given can be improved (you absolutely need to contact the registry nameservers, not recursive ones!).
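
To illustrate that last point, here is a rough sketch of asking a TLD's own nameservers instead of a recursive resolver. The function names are hypothetical, and the sketch assumes dig is installed and that the TLD's nameservers answer non-recursive queries from your network. An NXDOMAIN status only means the name is not in the zone; a registered but unpublished name gives the same result.

```shell
#!/usr/bin/env bash

# Print one authoritative nameserver for a TLD (e.g. for "com").
tld_nameserver() {
  dig +short NS "$1." | head -n 1
}

# Extract the status (NOERROR, NXDOMAIN, ...) from dig's header line,
# reading dig's normal (non +short) output on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Ask the registry nameserver directly, without recursion.
ask_registry() {
  local domain=$1 ns
  ns=$(tld_nameserver "${domain##*.}")
  [ -n "$ns" ] || return 1
  dig +norecurse "@$ns" "$domain" NS | dig_status
}

# Hypothetical usage (needs network access, so it is left commented out):
#   ask_registry aaa000.com
```

A NOERROR answer carrying a referral means the name is delegated and therefore registered; NXDOMAIN is the case worth confirming with whois or RDAP.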

RDAP

A brief aside first: nowadays, and specifically in gTLDs, RDAP should become the new standard. It is far better than whois since it is JSON over HTTPS, so it allows you to get structured data back. It also distinguishes between lookup and query, which whois doesn't (some registries have a "domain availability check", like using finger; there was an IETF protocol for that, called IRIS D-CHK, but it was implemented by at most 2 registries, and being compressed XML over UDP it never got traction).

See RFC 7480 §4:

Clients use the GET method to retrieve a response body and use the
HEAD method to determine existence of data on the server.

Example:

$ curl --head https://rdap.verisign.com/com/v1/domain/stackoverflow.com
HTTP/1.1 200 OK
Content-Length: 2264
Content-Type: application/rdap+json
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload

$ curl --head https://rdap.verisign.com/com/v1/domain/stackoverflow-but-does-not-exist.com
HTTP/1.1 404 Not Found
Content-Type: application/rdap+json
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload

(If you do a GET in the first case, you will get back a JSON document you can process with jq or equivalent.)
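
The HEAD check above can be wrapped for scripting. This is a minimal sketch, not a complete client: the function names are hypothetical, only the 200 and 404 cases shown above are interpreted, and any other status (rate limiting, redirects, server errors) is reported as unknown rather than guessed at.

```shell
#!/usr/bin/env bash

# Print only the HTTP status code of a HEAD request to an RDAP URL.
rdap_http_code() {
  curl --silent --head --output /dev/null --write-out '%{http_code}' "$1"
}

# Map the status code to a verdict, following the two cases shown above.
rdap_verdict() {
  case $1 in
    200) echo registered ;;
    404) echo not-found ;;
    *)   echo unknown ;;
  esac
}

# Hypothetical usage (needs network access, so it is left commented out):
#   url=https://rdap.verisign.com/com/v1/domain/stackoverflow.com
#   rdap_verdict "$(rdap_http_code "$url")"
```

Keeping the verdict separate from the network call makes it easy to add delays or retries around rdap_http_code without touching the interpretation.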

Note also that "partial search" is baked into this new protocol, see RFC 7482 §4.1, Partial String Searching. It is a very simple case and not a regex: you can just use a wildcard. Of course, registry RDAP servers are not mandated to implement it.

Other work is under way to add a full "regex" search capability, see Registration Data Access Protocol (RDAP) Search Using POSIX Regular Expressions and, to a lesser extent, Registration Data Access Protocol (RDAP) Reverse search capabilities.

You can learn more about RDAP:

  • on ICANN website: https://www.icann.org/rdap
  • on some external resource: https://about.rdap.org/

So even if you apply the solution of DNS then whois, I still highly suggest that you switch to DNS then RDAP. Caveat: multiple registries' and registrars' RDAP servers are currently misbehaving or not respecting the specification. This will be straightened out in the future, when ICANN compliance kicks in and RDAP really starts to overshadow whois.

Registrars' API

Various registrars give you access to an API, which will include searching for available domain names and/or retrieving some domain name lists (for example: dropping names). What each registrar provides, and under which constraints, will of course vary, so it is impossible to give you a specific answer there. But for any serious research that would be a first stop: go to your preferred registrar and ask what services it offers to help in your case.

It will obviously depend on which TLDs the registrar is accredited in: registrars accredited with a registry have a live, non-public channel, using a protocol called EPP, to check whether domain names exist.

Whois bulk access

This exists but is in most ways almost impossible to use. For gTLDs, registrars are under contract with ICANN. If you read their contract you see this:

3.3.6 [..] Registrar shall provide third-party bulk access to the data subject to public access under Subsection 3.3.1 under the following terms and conditions:

3.3.6.1 Registrar shall make a complete electronic copy of the data available at least one (1) time per week for download by third parties who have entered into a bulk access agreement with Registrar.

3.3.6.2 Registrar may charge an annual fee, not to exceed US$10,000, for such bulk access to the data.

So, in theory, you are able to go to each registrar and ask it to provide "bulk whois access", which means more or less a complete dump of data, but:

  • as written in the contract above, it can be costly (there are more than 1000 registrars, and since you cannot know in advance where a domain is registered, you will need to get all of them)
  • data will not be fresh
  • as for zonefiles below, it is not a live query/reply: you will need to download all the data, store it, process it and use it.

Zonefiles (gTLDs)

Again this mostly applies to gTLDs for reasons explained just after, but see next section for other cases.

This does not allow live queries, as you need to download the data (once per day if you want to be fresh) and store it somewhere on your infrastructure, in a format that suits the queries you need to run afterwards (an RDBMS might not be the best storage here).

But this is the "easiest" and widest solution to your problem.

Per their contract with ICANN, all gTLD registries are mandated to give free access to their zonefiles. A zonefile contains all published domain names under the given TLD. This is a subset of all registered names (difficult to say by how much, but in the range of a single-digit percentage, if even that), because you can register a domain name without nameservers (hence it is not published), or the domain can be put "on hold" for various reasons and hence disappear from the zonefile. So you will get the same amount of false negatives as when using live DNS queries: you will get no data (NXDOMAIN in fact) for some domains that are in fact registered (and hence not available for registration again).

So all starts there: https://www.icann.org/resources/pages/czds-2014-03-03-en and the help section for users: https://czds.icann.org/help

You will need to create an account, sign a contract that outlines what you can and cannot do with this data, and then you will be able to download daily zonefiles per TLD. Most, if not all, gTLDs put their zonefiles there. Some may do it differently, so you will need to search.

A zonefile will be in DNS "master zonefile" format. So you will see DNS records in them. You need to handle only the "NS" one, and you will see all domain names. You will need to make sure to normalize them (casing, final dot, etc.) as the content can vary from one file to another.
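
The extraction and normalization step could look like the sketch below. It rests on assumptions about the file layout that you should verify against your actual download: fields are whitespace-separated, the owner name in the first field is fully qualified, and the record type appears somewhere in fields 2 to 4. Files that use names relative to the zone origin would need the TLD appended. The function name is hypothetical.

```shell
#!/usr/bin/env bash

# Read zonefile text on stdin and print the unique, normalized owner names
# of NS records: lowercased, without the final dot, one per line.
# ASSUMES fully qualified owner names in the first field.
zone_names() {
  awk '
    /^[[:space:]]*;/ { next }              # skip comment lines
    {
      for (i = 2; i <= 4 && i <= NF; i++) {
        if (toupper($i) == "NS") {
          name = tolower($1)
          sub(/\.$/, "", name)             # drop the final dot
          if (name ~ /\./) print name      # skip the TLD apex itself
          break
        }
      }
    }
  ' | LC_ALL=C sort -u
}

# Hypothetical usage, with com.zone standing in for a downloaded zonefile:
#   zone_names < com.zone > com.names
```

Sorting with LC_ALL=C keeps the output order byte-wise, which matters if the list is later fed to tools such as comm.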

Once you have a daily list of domain names, you can apply any tool you want to search in them, including regular expressions. Be careful, however, about the CPU and RAM load this can create, depending on how you store the data. The raw .com zonefile is 13GB, for example.

Compared with live DNS queries, the biggest drawback is that it is not live (data may be as much as 24 hours old) and you need to download the files before being able to do anything, but the biggest benefit is that you have the list of "all" domains locally, hence you can apply much more powerful tools to search in them.
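
As an illustration, here is a sketch of two searches over such a list, using the pattern from the question. It assumes a plain text file with one normalized name per line, and a grep that accepts backreferences in extended regular expressions (GNU and BSD grep do, but POSIX does not require it). All function names are hypothetical.

```shell
#!/usr/bin/env bash

# Names in the list that match the pattern: same letter x3, same digit x3.
registered_matching() {
  grep -E '^([a-z])\1\1([0-9])\2\2\.(com|org|io)$' "$1"
}

# Generate every candidate of that shape (26 x 10 x 3 = 780 names).
candidates() {
  local x y z
  for z in com org io; do
    for y in 0 1 2 3 4 5 6 7 8 9; do
      for x in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        printf '%s%s%s%s%s%s.%s\n' "$x" "$x" "$x" "$y" "$y" "$y" "$z"
      done
    done
  done
}

# Candidates that do NOT appear in the list: the ones worth checking further.
not_in_zone() {
  local sorted
  sorted=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$sorted"
  candidates | LC_ALL=C sort | LC_ALL=C comm -23 - "$sorted"
  rm -f "$sorted"
}

# Hypothetical usage, with com.names standing in for your extracted list:
#   not_in_zone com.names
```

Using comm on sorted input streams both sides instead of loading the whole list into memory, which helps with very large zonefiles. Names reported by not_in_zone are only absent from the zone, so they still need a whois or RDAP check before being treated as available.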

Zonefiles (non gTLDs)

Outside gTLDs, that is in ccTLDs, it is rare to have full zonefiles available, because many ccTLD operators believe it is proprietary or publicly identifiable data and that no one has valid business getting this, hence it is not available.

There are however counter examples:

  • while I do not have examples in mind right now, I am pretty sure some ccTLDs can still allow zonefile access (to be determined)
  • it also sometimes happens that some nameservers are not correctly configured and accept DNS AXFR queries, which basically means anyone can download the zonefile
  • some registries have an "open data" initiative, so you might get the list of all domain names, though it may be a few months stale. AFNIC (.fr) is one such case: https://www.afnic.fr/en/about-afnic/news/general-news/9522/show/opendata-data-from-the-fr-tld-to-serve-innovation.html
  • some registries publish lists such as "new domains in the last 24 hours". If you download the list regularly, at some point you get all the data. Again, AFNIC is one doing so: https://www.afnic.fr/en/products-and-services/services/daily-list-of-registered-domain-names/ (even if it is a picture, not a text list, but that does not stop anyone getting the real data out of it)

PS: creative use of search engines (see the site: modifier for example) can also help; of course they see only existing websites, and a domain name can very well be registered without having a website on it.



Source: https://stackoverflow.com/questions/61669793/check-domain-availability-avoiding-front-running-using-wildcards-or-regex
