I need a way to extract a domain name without the subdomain from a URL using Python's urlparse.
For example, I would like to extract "google.com" from a f
This is not a standard decomposition of a URL.
You cannot rely on the www. prefix being present, nor on it being optional: in a lot of cases it is not.
If you do want to assume that only the last two components are relevant (which won't work for .uk domains, e.g. www.google.co.uk), you can take split('.')[-2:] of the hostname.
Or, which is actually less error-prone, strip a leading www. prefix.
Either way, you cannot assume that the www. is optional, because the result will NOT work every time!
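A minimal sketch of both naive approaches, to make their limits concrete. It assumes Python 3, where urlparse lives in urllib.parse (in Python 2 it is the urlparse module), and it assumes the URL includes a scheme, since otherwise urlparse does not fill in the hostname. The function names last_two_labels and strip_www are made up for illustration.

```python
from urllib.parse import urlparse


def last_two_labels(url):
    """Naive: keep only the last two dot-separated labels of the hostname."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def strip_www(url):
    """Strip a leading 'www.' if present; any other subdomain is left alone."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host
```

last_two_labels gives "google.com" for http://www.google.com/ but "co.uk" for http://www.google.co.uk/, while strip_www handles the .co.uk case and leaves a host such as mail.google.com unchanged.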
Here is a list of common suffixes for domains. You can try to keep the longest matching suffix plus one more component.
https://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
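A rough sketch of the "suffix + one component" idea, under some loud assumptions: SUFFIXES is a tiny hand-written subset standing in for the full list, the function name registered_domain is made up for illustration, and the wildcard and exception rules that the real list contains are ignored here.

```python
from urllib.parse import urlparse

# Assumption: a tiny illustrative subset. A real program would load the
# full suffix list from the file linked above.
SUFFIXES = {"com", "org", "uk", "co.uk"}


def registered_domain(url, suffixes=SUFFIXES):
    """Return the longest matching suffix plus one label, or None."""
    host = urlparse(url).hostname or ""
    if host in suffixes:
        # The host is itself a suffix, so there is no extra component to keep.
        return None
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest; starting at 1
    # guarantees at least one label remains in front of the suffix.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None
```

With this subset, www.google.co.uk yields "google.co.uk" and mail.google.com yields "google.com"; hosts with no known suffix yield None.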
But how do you plan to handle, for example, first.last.name domains? Would you assume that all users with the same last name are the same company? Initially, you could only register third-level domains there. By now, you apparently can get second-level domains, too. So for .name there is no general rule.