Python urlparse — extract domain name without subdomain

南笙 2020-12-01 02:30

I need a way to extract a domain name without the subdomain from a URL using Python's urlparse.

For example, I would like to extract "google.com" from a f

7 Answers
  •  长情又很酷
    2020-12-01 03:16

    This is an update, prompted by the bounty request for an updated answer.

    Start by using the tld package. A description of the package:

    Extracts the top level domain (TLD) from the URL given. List of TLD names is taken from Mozilla http://mxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat?raw=1

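    from tld import get_tld
    from tld.utils import update_tld_names

    # Refresh the local copy of the TLD list.
    update_tld_names()

    # Note: in newer tld releases get_tld() returns only the suffix
    # (e.g. "co.uk"); there, get_fld() returns the domain shown below.
    print(get_tld("http://www.google.co.uk"))
    print(get_tld("http://zap.co.it"))
    print(get_tld("http://google.com"))
    print(get_tld("http://mail.google.com"))
    print(get_tld("http://mail.google.co.uk"))
    print(get_tld("http://google.co.uk"))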
    

    This outputs

    google.co.uk
    zap.co.it
    google.com
    google.com
    google.co.uk
    google.co.uk
    

    Notice that it correctly handles country-level TLDs by keeping co.uk and co.it as part of the result, while still removing the www and mail subdomains for both .com and .co.uk.

    The update_tld_names() call at the beginning of the script is used to update/sync the tld names with the most recent version from Mozilla.
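
To illustrate why a suffix list is needed at all, here is a minimal sketch of the matching idea using only the standard library. It is not how the tld package is implemented. The names `KNOWN_SUFFIXES` and `registered_domain` are hypothetical, and the suffix set is a tiny hand-picked subset, assumed for illustration only. The real Mozilla list is far larger and also contains wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlparse

# Illustrative subset only (an assumption for this sketch);
# the real suffix list has thousands of entries.
KNOWN_SUFFIXES = {"com", "co.uk", "co.it", "uk", "it"}


def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the matched suffix plus one label to its left, or None."""
    host = urlparse(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Try candidate suffixes from longest to shortest,
    # so that "co.uk" is matched before "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None


for url in ("http://www.google.co.uk", "http://zap.co.it", "http://mail.google.com"):
    print(registered_domain(url))
```

Simply taking the last two labels of the hostname would turn www.google.co.uk into co.uk, which is why the longest known suffix has to be matched first. For real use, a maintained suffix list such as the one the tld package syncs is the safer choice.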
