A truly complete solution requires more work, but here's an approximation that may work well enough (note that the input string is expected to start with a @ prefix):
^@(([a-zA-Z](-?[a-zA-Z0-9])*)\.)+[a-zA-Z]{2,}$
You can use this with grep -E (or its deprecated alias, egrep), but also with [[ ... =~ ... ]], bash's regex-matching operator.
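As a minimal sketch of both usages, the regex can be stored in a variable and reused. The names `re` and `is_domain_handle` below are chosen purely for illustration:

```shell
#!/usr/bin/env bash

# The regex from above, single-quoted so the shell passes it through unmodified.
re='^@(([a-zA-Z](-?[a-zA-Z0-9])*)\.)+[a-zA-Z]{2,}$'

# Hypothetical helper: succeeds (exit status 0) if $1 matches the regex.
# Note that $re must stay unquoted on the right-hand side of =~,
# otherwise bash treats it as a literal string.
is_domain_handle() {
  [[ $1 =~ $re ]]
}

is_domain_handle '@example.org' && echo 'match via [[ ... =~ ... ]]'

# The same regex with grep -E; -q suppresses output so that only
# the exit status is used.
printf '%s\n' '@example.org' | grep -Eq "$re" && echo 'match via grep -E'
```

Keeping the regex in a variable avoids quoting surprises inside [[ ... ]] and lets both tools share one definition.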
The regex makes the following assumptions, some of which are more permissive than actual DNS name constraints:
Only ASCII letters are allowed - see below for Internationalized Domain Name (IDN) considerations; also, the Punycode (ASCII-compatible) forms of IDNs - e.g., xn--bcher-kva.ch for bücher.ch - are not matched - see below.
There's no limit on the number of nested subdomains.
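There's no limit on the length of any label (name component), and no limit on the overall length of the name (the actual limits, per RFC 1035, are 63 octets per label and 255 octets overall, which works out to at most 253 characters in the usual dotted notation).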
The TLD (last component) is composed of letters only and has a length of at least 2.
Both subdomain and domain names must start with a letter; subdomains are allowed to be single-letter.
Here's a quick test:
re='^@(([a-zA-Z](-?[a-zA-Z0-9])*)\.)+[a-zA-Z]{2,}$'
for d in @subdom..dom.ext @dom.ext @subdom.dom.ext @subsubdom.subdom.dom.ext @subsub-dom.sub-dom.ext @x.org; do
  [[ $d =~ $re ]] && echo YES || echo NO
done
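If the length limits matter, they could be layered on top of the regex with ordinary string-length checks. The following is only a sketch, assuming limits of 63 characters per label and 253 characters for the full name in dotted notation; the function name `is_valid_handle` is made up for illustration:

```shell
#!/usr/bin/env bash

re='^@(([a-zA-Z](-?[a-zA-Z0-9])*)\.)+[a-zA-Z]{2,}$'

# Hypothetical helper: regex match plus the assumed length limits.
is_valid_handle() {
  local name label labels
  [[ $1 =~ $re ]] || return 1
  name=${1#@}                          # strip the leading @
  (( ${#name} <= 253 )) || return 1    # overall length (assumed limit)
  IFS=. read -r -a labels <<<"$name"   # split the name into labels on '.'
  for label in "${labels[@]}"; do
    (( ${#label} <= 63 )) || return 1  # per-label length (assumed limit)
  done
  return 0
}

is_valid_handle '@sub.example.org' && echo YES || echo NO

# A label of 64 letters passes the regex but fails the length check.
long=$(printf 'a%.0s' {1..64})
is_valid_handle "@${long}.org" && echo YES || echo NO
```

Because the regex admits only ASCII characters, the character counts from ${#...} equal byte counts here; that would no longer hold for the IDN variant.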
Support for Internationalized Domain Names (IDN) with literal Unicode characters - again, a complete solution requires more work:
A simple improvement to also match IDNs is to replace [a-zA-Z] with [[:alpha:]] and [a-zA-Z0-9] with [[:alnum:]] in the above regex; i.e.:
^@(([[:alpha:]](-?[[:alnum:]])*)\.)+[[:alpha:]]{2,}$
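What [[:alpha:]] and [[:alnum:]] match depends on the active locale, so this variant can only recognize non-ASCII letters when a suitable (e.g., UTF-8) locale is in effect. A quick sketch for trying it out, with the variable name `re_idn` chosen for illustration:

```shell
#!/usr/bin/env bash

re_idn='^@(([[:alpha:]](-?[[:alnum:]])*)\.)+[[:alpha:]]{2,}$'

# Whether the non-ASCII name matches depends on the current locale
# (LC_ALL / LC_CTYPE / LANG); plain ASCII names match regardless.
# The Punycode form is rejected because of its double hyphen.
for d in @dom.ext @bücher.ch @xn--bcher-kva.ch; do
  [[ $d =~ $re_idn ]] && echo "$d: YES" || echo "$d: NO"
done
```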
Caveats:
No attempt is made to recognize Punycode-encoded versions of IDNs, which use an ASCII-based encoding with prefix xn--, and which would require decoding afterwards.
As Patrick Mevzek points out, the above can yield both false negatives and false positives (using his examples):
- False positive: an invalid Punycode-encoded name such as ab--whatever
- False positive: invalid cross-language names; e.g., cαfe.fr, which uses a Greek letter in a French domain name - a rule that is impossible to enforce via a regex alone.
- False negatives: emoji-based names such as