I'm looping over a series of URLs and want to clean them up. I have the following code:
# Parse url to remove http, path and check format
o_url = URI.parse(
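The parsing step itself can be handled by Ruby's standard URI library. This is a minimal sketch, assuming the goal is only to pull the host out of each http(s) URL. The helper name extract_host and the sample URLs are hypothetical:

```ruby
require "uri"

# Hypothetical sample input; in the question these come from the loop.
urls = [
  "http://www.example.com/some/path?q=1",
  "https://www.company.co.uk/",
  "not a url",
]

# Hypothetical helper: returns the lowercased host, or nil when the string
# is not a parseable http(s) URL. Scheme, path and query are dropped.
def extract_host(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  return nil unless uri.is_a?(URI::HTTP)
  uri.host&.downcase
rescue URI::InvalidURIError
  nil # e.g. strings containing spaces
end

urls.each { |u| p extract_host(u) }
```

Note that a bare "www.example.com" without a scheme parses as a generic URI with no host, so this returns nil for it.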
Something like:
def remove_subdomain(host)
# Incomplete: add every root domain you need to the regexp
host.sub(/.*?([^.]+(\.com|\.co\.uk|\.uk|\.nl))$/, "\\1")
end
puts remove_subdomain("www.example.com") # -> example.com
puts remove_subdomain("www.company.co.uk") # -> company.co.uk
puts remove_subdomain("www.sub.domain.nl") # -> domain.nl
You still need to add every domain you consider a root domain. For example, '.uk' is technically the root domain, but you probably want to keep the label just before '.co.uk', not the one just before '.uk'.
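To keep that list manageable, the regexp can be generated from an array instead of edited by hand. A sketch, assuming the same lazy-prefix trick as the pattern above; the ROOT_SUFFIXES list is hypothetical and you would extend it yourself:

```ruby
# Hypothetical list of suffixes treated as "root" domains; extend as needed.
ROOT_SUFFIXES = %w[com nl uk co.uk].freeze

# Regexp.union escapes the dots in each suffix. The lazy ".*?" prefix makes
# the capture start as far left as possible, so "company.co.uk" wins over
# "co.uk" when both "uk" and "co.uk" are listed.
ROOT_DOMAIN_RE = /\A.*?([^.]+\.#{Regexp.union(ROOT_SUFFIXES)})\z/

def remove_subdomain(host)
  host = host.downcase
  match = ROOT_DOMAIN_RE.match(host)
  # Hosts with no listed suffix are returned unchanged.
  match ? match[1] : host
end

puts remove_subdomain("www.company.co.uk") # -> company.co.uk
```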
Detecting the subdomain of a URL is non-trivial in general. It's easy if you only consider the common top-level domains, but once you get into country-code and other international domains it becomes tricky.
Edit: Also consider hosts like http://mylocalschool.k12.oh.us and similar.
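The general-purpose answer to this is the Public Suffix List, which records suffixes such as co.uk and k12.oh.us, plus wildcard rules ('*.ck') and exception rules ('!www.ck'). Below is a minimal sketch of the list's matching rules: the rule with the most labels wins, an exception rule beats everything and drops its leftmost label, and the registrable domain is the matched suffix plus one more label. The RULES array and the helper names are hypothetical; the array is a tiny subset assumed for illustration, not the real list:

```ruby
# Tiny hypothetical subset of Public Suffix List-style rules.
RULES = %w[
  com nl uk co.uk us oh.us k12.oh.us
  *.ck !www.ck
].freeze

# True if the rule's labels match the rightmost labels of the domain,
# with "*" matching any single label.
def rule_matches?(rule_labels, domain_labels)
  return false if rule_labels.size > domain_labels.size
  rule_labels.reverse.each_with_index.all? do |label, i|
    label == "*" || label == domain_labels[-1 - i]
  end
end

# Returns the registrable domain, or nil when the host is itself a suffix.
def registrable_domain(host, rules = RULES)
  labels = host.downcase.split(".")
  matches = rules.select { |r| rule_matches?(r.delete_prefix("!").split("."), labels) }

  exception = matches.find { |r| r.start_with?("!") }
  suffix_size =
    if exception
      # An exception rule prevails; its leftmost label is removed.
      exception.delete_prefix("!").split(".").size - 1
    elsif matches.empty?
      1 # default rule "*": the last label alone is the suffix
    else
      matches.map { |r| r.split(".").size }.max
    end

  return nil if labels.size <= suffix_size
  labels.last(suffix_size + 1).join(".")
end

p registrable_domain("www.mylocalschool.k12.oh.us") # => "mylocalschool.k12.oh.us"
```

For real use you would load the full list, or use a gem that already does; the public_suffix gem, for example, provides PublicSuffix.domain.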