Python urlparse — extract domain name without subdomain

南笙 2020-12-01 02:30

Need a way to extract a domain name without the subdomain from a url using Python urlparse.

For example, I would like to extract "google.com" from a f

7 answers

长情又很酷 (original poster)

2020-12-01 03:16
This is an updated answer, written in response to the bounty request.

Start with the tld package. Its description reads:

Extracts the top level domain (TLD) from the URL given. List of TLD names is taken from Mozilla http://mxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat?raw=1
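```
# Python 3. In tld 0.9 and later, get_fld returns the first-level
# (registered) domain, while get_tld returns only the suffix (e.g. "co.uk").
from tld import get_fld
from tld.utils import update_tld_names

# Sync the local TLD list with the latest published version.
update_tld_names()

print(get_fld("http://www.google.co.uk"))
print(get_fld("http://zap.co.it"))
print(get_fld("http://google.com"))
print(get_fld("http://mail.google.com"))
print(get_fld("http://mail.google.co.uk"))
print(get_fld("http://google.co.uk"))
```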
This outputs
```
google.co.uk
zap.co.it
google.com
google.com
google.co.uk
google.co.uk
```
Notice that it handles country-level suffixes correctly: co.uk and co.it are kept as part of the result, while the www and mail subdomains are removed for both .com and .co.uk.

The update_tld_names() call at the beginning of the script is used to update/sync the tld names with the most recent version from Mozilla.
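
To show why a suffix list is needed at all, here is a minimal standard-library sketch of the idea, using `urlparse` as the question asks. It is not how the tld package is implemented: the function name `registered_domain` and the tiny `KNOWN_SUFFIXES` set are hypothetical stand-ins for the full Mozilla list, and wildcard and exception rules are ignored.

```python
from urllib.parse import urlparse

# Hypothetical, hand-picked subset for illustration only.
# The real suffix list contains thousands of entries.
KNOWN_SUFFIXES = {"com", "uk", "co.uk", "it", "co.it"}


def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the matched suffix plus one label to its left, or None."""
    # hostname is None when the URL has no scheme (e.g. "google.com").
    host = urlparse(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so "co.uk" is matched before "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                # The host is itself a suffix; nothing is registered under it.
                return None
            return ".".join(labels[i - 1:])
    return None


if __name__ == "__main__":
    for u in ("http://www.google.co.uk", "http://mail.google.com"):
        print(u, "->", registered_domain(u))
```

Simply taking the last two labels of the hostname would turn www.google.co.uk into co.uk, which is why the longest matching suffix has to be found first. For real use, a maintained package with the complete list is the safer choice.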