Python urlparse — extract domain name without subdomain

南笙 2020-12-01 02:30

Need a way to extract a domain name without the subdomain from a url using Python urlparse.

For example, I would like to extract "google.com" from a f

7 answers
  •  -上瘾入骨i
    2020-12-01 03:25

    Several Python modules wrap the (formerly Mozilla) Public Suffix List, and some of them don't require the input to be a URL. The question asks specifically about URLs, but my requirement was to handle bare domain names, so I'm offering a tangential answer for that.
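
If the input is a full URL rather than a bare domain name, it presumably has to be reduced to its host first. A minimal sketch using only the standard library's urllib.parse; the helper name hostname_of is made up for illustration and is not part of any of the libraries mentioned here:

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lower-cased host part of a URL, or None if there is none.

    hostname_of is a hypothetical helper name used only for this sketch.
    """
    # urlsplit only fills in the network location when the input has a
    # scheme ("http://...") or starts with "//", so bare inputs such as
    # "www.google.com/path" get a "//" prefix first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # .hostname drops any port and user info and lower-cases the host.
    return urlsplit(url).hostname
```

The result still includes any subdomain (for example "www.google.co.uk"); removing it is the part that needs the Public Suffix List.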

    The relative merits of publicsuffix2 over publicsuffixlist or publicsuffix are unclear, but they all seem to offer the basic functionality.

    publicsuffix2:

    >>> import publicsuffix  # sic
    >>> publicsuffix.PublicSuffixList().get_public_suffix('www.google.co.uk')
    u'google.co.uk'
    
    • Supposedly more packaging-friendly fork of publicsuffix.

    publicsuffixlist:

    >>> import publicsuffixlist
    >>> publicsuffixlist.PublicSuffixList().privatesuffix('www.google.co.uk')
    'google.co.uk'
    
    • Advertises IDNA support, which I have not tested, however.

    publicsuffix:

    >>> import publicsuffix
    >>> publicsuffix.PublicSuffixList(publicsuffix.fetch()).get_public_suffix('www.google.co.uk')
    'google.co.uk'
    
    • Having to handle updates and cache the downloaded file yourself is a bit of a complication.
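
For intuition about what these libraries do, here is a rough sketch of suffix matching. It is not how any of the packages above is implemented: SAMPLE_SUFFIXES is a tiny hand-picked set standing in for the real list, the function name registrable_domain is made up, and the real list's wildcard and exception rules are ignored.

```python
# Illustrative subset only. The real Public Suffix List has thousands of
# entries plus wildcard and exception rules that this sketch ignores.
SAMPLE_SUFFIXES = {"com", "org", "uk", "co.uk"}


def registrable_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching suffix plus one more label, or None.

    registrable_domain is a hypothetical name used only for this sketch.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Walk candidates from longest to shortest so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            # Keep exactly one label in front of the matched suffix.
            return ".".join(labels[i - 1:])
    return None  # no suffix in the sample set matched


print(registrable_domain("www.google.co.uk"))  # google.co.uk
print(registrable_domain("mail.google.com"))   # google.com
```

A naive "keep the last two labels" rule would return "co.uk" for the first input, which is why a suffix list is needed at all. For real use, one of the packages above with an up-to-date list is the safer choice.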
