How do I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.
Instead of using a regex for this, you can use Python's urlparse (it lives in urllib.parse in Python 3):
URL=http://www.example.com
python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"
You could either use it like this or put it in a small script. However, this still expects a valid scheme identifier, and judging by your comment, your input doesn't necessarily provide one. You can specify a default scheme, but urlparse expects the netloc to start with '//':
url = urlparse('//www.example.com/index.html','http')
So you will have to prepend those manually, i.e.:
python3 -c "from urllib.parse import urlparse
if '://' not in '$URL':
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"
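If you go the small-script route, the same logic could look roughly like the sketch below. It assumes Python 3, and the function name extract_netloc and its default_scheme parameter are made up for illustration; they are not part of urllib.

```python
from urllib.parse import urlparse


def extract_netloc(url, default_scheme="http"):
    """Return the network location of url, tolerating a missing scheme."""
    if "://" not in url:
        # urlparse only recognises a netloc that follows '//',
        # so prepend it when the input has no scheme.
        url = "//" + url
    return urlparse(url, default_scheme).netloc


print(extract_netloc("http://www.example.com/index.html"))  # www.example.com
print(extract_netloc("www.example.com/index.html"))         # www.example.com
```

Note that netloc still includes any user info and port that were in the input.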
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
print $2;
}
Usage:
./test.pl 'https://example.com'
example.com
./test.pl 'https://www.example.com/'
www.example.com
./test.pl 'example.org/'
example.org
./test.pl 'example.org'
example.org
./test.pl 'example' -> no output
And if you just want the domain and not the full host + domain use this instead:
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
print $3;
}
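One thing to keep in mind with the second pattern is that it simply keeps the last two dot-separated labels, so multi-part suffixes get cut short. A rough Python sketch of the same regex shows this; the helper name last_two_labels is hypothetical and only there for illustration.

```python
import re

# Same pattern as the second Perl script above, written as a Python raw string.
PATTERN = re.compile(r"([^:]*://)?([^/]*\.)*([^/.]+\.[^/]+)")


def last_two_labels(url):
    """Return the third capture group, or None if the pattern does not match."""
    match = PATTERN.search(url)
    return match.group(3) if match else None


print(last_two_labels("https://www.example.com/"))   # example.com
print(last_two_labels("http://www.example.co.uk/"))  # co.uk
```

So for suffixes like .co.uk you would need a list of known public suffixes rather than a pure regex.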
basename "http://example.com"
Now of course, this won't work with a URI like this: http://www.example.com/index.html
but you could do the following:
basename "$(dirname "http://www.example.com/index.html")"
Or for more complex URIs:
echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3
-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.
$ URI="http://user:pw@example.com:80/"
$ echo $URI | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com
See http://en.wikipedia.org/wiki/URI_scheme
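For comparison, here is a minimal sketch of how the same user info and port stripping could be done with Python 3's urllib.parse instead of sed. The helper name host_only is an assumption for illustration.

```python
from urllib.parse import urlparse


def host_only(uri):
    """Return just the host part, without user info or port."""
    return urlparse(uri).hostname


uri = "http://user:pw@example.com:80/"
print(urlparse(uri).netloc)  # user:pw@example.com:80
print(host_only(uri))        # example.com
print(urlparse(uri).port)    # 80
```

Unlike netloc, the hostname attribute drops the credentials and the port, and it is returned in lower case.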
The following will output "example.com":
URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "p URI.parse('$URI').host"
For more info on what you can do with Ruby's URI class you'd have to consult the docs.
echo "$URL" | cut -d'/' -f3 | cut -d':' -f1
Works for URLs such as:
http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345