I want to extract the domain name(name of the site+TLD) from a list of URLs which may vary in their format. for instance: Current state---->what I want
mai
This is somewhat non-trivial, as there is no simple rule to determine what makes a for a valid public suffix (site name + TLD). Instead, what makes a public suffix is maintained as a list at PublicSuffix.org.
A python package exists that queries that list (stored locally); it's called publicsuffix:
>>> from publicsuffix import PublicSuffixList
>>> psl = PublicSuffixList()
>>> print psl.get_public_suffix('mail.yahoo.com')
yahoo.com
>>> print psl.get_public_suffix('account.hotmail.co.uk')
hotmail.co.uk