Need a way to extract a domain name without the subdomain from a url using Python urlparse.
For example, I would like to extract \"google.com\"
from a f
There are multiple Python modules which encapsulate the (once Mozilla) Public Suffix List in a library, several of which don't require the input to be a URL. Even though the question asks about URL normalization specifically, my requirement was to handle just domain names, and so I'm offering a tangential answer for that.
The relative merits of publicsuffix2 over publicsuffixlist or publicsuffix are unclear, but they all seem to offer the basic functionality.
publicsuffix2:
>>> import publicsuffix # sic
>>> publicsuffix.PublicSuffixList().get_public_suffix('www.google.co.uk')
u'google.co.uk'
publicsuffix
.publicsuffixlist:
>>> import publicsuffixlist
>>> publicsuffixlist.PublicSuffixList().privatesuffix('www.google.co.uk')
'google.co.uk'
idna
support, which I however have not tested.publicsuffix:
>>> import publicsuffix
>>> publicsuffix.PublicSuffixList(publicsuffix.fetch()).get_public_suffix('www.google.co.uk')
'google.co.uk'