How do I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.
You can use a simple awk one-liner to extract the domain name:
echo http://example.com/index.php | awk -F'[/:]' '{print $4}'
OUTPUT: example.com
:-)
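To see why the host ends up in $4, it helps to print every field awk produces when it splits on "/" or ":". This is only an illustrative sketch; show_fields is a hypothetical helper name, not part of the answer above.

```shell
# Print each field awk sees when the separator is "/" or ":".
# show_fields is a hypothetical helper used only for illustration.
show_fields() {
  printf '%s\n' "$1" |
    awk -F'[/:]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields 'http://example.com/index.php'
# 1=[http]
# 2=[]
# 3=[]
# 4=[example.com]
# 5=[index.php]
```

The two empty fields come from the "://" after the scheme. The same split means a URL with credentials, such as http://user:pass@example.com/, would put "user" in $4 instead of the host.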
A solution that covers more cases can be built on sed regexps (note that the \| alternation is a GNU sed extension):
echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'
That would work for URLs like:
http://example.com/index.php, http://example.com:4040/index.php, https://example.com/index.php
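A possible variant of the same idea, assuming a sed that accepts -E (both GNU and BSD sed do), avoids the GNU-only \| by writing the scheme as https? in an extended regex. strip_url is a hypothetical helper name used only for this sketch.

```shell
# strip_url is a hypothetical helper name.
# -E selects extended regexes, so "https?" covers both schemes without \|.
strip_url() {
  printf '%s\n' "$1" | sed -E -e 's#^https?://##' -e 's#[:/].*##'
}

# Run it over the three URLs listed above.
for u in \
  http://example.com/index.php \
  http://example.com:4040/index.php \
  https://example.com/index.php
do
  strip_url "$u"
done
# example.com
# example.com
# example.com
```

Like the original command, this cuts at the first ":" after the scheme, so a URL containing user:pass@ would return "user" rather than the host.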
Here's the Node.js way. It works with or without ports and deep paths:
//get-hostname.js
'use strict';
const url = require('url');
const parts = url.parse(process.argv[2]);
console.log(parts.hostname);
Can be called like:
node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com
Docs: https://nodejs.org/api/url.html
sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'
e.g.
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com
There is very little information on how you get those URLs; please show more next time (for example, whether the URLs contain parameters). Meanwhile, here is simple string manipulation for your sample URL:
$ s="http://example.com/index.php"
$ echo "${s%/*}" # strip the last "/" and everything after it
http://example.com
$ s=${s%/*}
$ echo "${s#http://}" # strip the leading http://
example.com
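The same parameter-expansion approach can be stretched to cover other schemes, credentials, ports, and deeper paths. The following is a minimal sketch under those assumptions, not a full URL parser; host_from_url is a hypothetical helper name, and IPv6 literals such as [::1] are not handled.

```shell
# host_from_url is a hypothetical helper; it uses only POSIX parameter expansion.
# The body runs in a subshell ( ... ) so the temporary variable h does not leak.
host_from_url() (
  h=${1#*://}       # drop the scheme and "://", if present
  h=${h%%[/?#]*}    # drop the path, query string, and fragment
  h=${h##*@}        # drop "user:pass@", if present
  h=${h%%:*}        # drop ":port", if present
  printf '%s\n' "$h"
)

host_from_url 'http://example.com/index.php'
# example.com
host_from_url 'https://user:pass@example.com:1234/some/path#fragment'
# example.com
```

The path is removed before the credentials so that an "@" appearing later in the path or query cannot be mistaken for the end of the user:pass part.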
Another way, using sed (GNU):
$ echo "$s" | sed 's/http:\/\///;s|\/.*||'
example.com
Or use awk:
$ echo "$s" | awk '{gsub("http://|/.*","")}1'
example.com
With Ruby you can use the Domainatrix library / gem
http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html
require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"
great tool! :-)