Python urlparse — extract domain name without subdomain

南笙 2020-12-01 02:30

Need a way to extract a domain name without the subdomain from a url using Python urlparse.

For example, I would like to extract "google.com" from a f

7 answers

-上瘾入骨i (original poster)

2020-12-01 03:25
Several Python modules wrap the Public Suffix List (originally Mozilla's) in a library, and a number of them don't require the input to be a URL. Although the question asks about URL normalization specifically, my requirement was to handle bare domain names, so I'm offering a tangential answer for that.

The relative merits of publicsuffix2 over publicsuffixlist or publicsuffix are unclear, but they all seem to offer the basic functionality.
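
To make that "basic functionality" concrete, here is a rough sketch of the idea behind all three: find the longest public suffix the hostname ends with, then keep one more label to its left. The `TOY_SUFFIXES` set and the `registrable_domain` name are made up for illustration. The real Public Suffix List is far larger and also has wildcard and exception rules, which this sketch ignores.

```python
# Toy illustration only: a real lookup needs the full Public Suffix List,
# including its wildcard and exception rules, which this sketch ignores.
TOY_SUFFIXES = {"com", "uk", "co.uk"}  # hypothetical, hand-picked subset


def registrable_domain(hostname, suffixes=TOY_SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched
```

With this toy set, `registrable_domain('www.google.co.uk')` gives `'google.co.uk'` because `co.uk` is tried before `uk`. For anything real, use one of the libraries below rather than a hand-maintained set.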

publicsuffix2:
```
>>> import publicsuffix  # sic
>>> publicsuffix.PublicSuffixList().get_public_suffix('www.google.co.uk')
u'google.co.uk'
```
- Supposedly a more packaging-friendly fork of publicsuffix.

publicsuffixlist:
```
>>> import publicsuffixlist
>>> publicsuffixlist.PublicSuffixList().privatesuffix('www.google.co.uk')
'google.co.uk'
```
- Advertises IDNA support, which I have not tested, however.

publicsuffix:
```
>>> import publicsuffix
>>> publicsuffix.PublicSuffixList(publicsuffix.fetch()).get_public_suffix('www.google.co.uk')
'google.co.uk'
```
- Having to handle updates and cache the downloaded file yourself is a bit of a complication.
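
Since the question starts from a URL rather than a bare domain, the hostname has to be pulled out first. A minimal sketch using only the standard library follows. `hostname_from_url` is a hypothetical helper name, and the "//" check is a rough heuristic for inputs without a scheme, not a complete URL validator.

```python
from urllib.parse import urlsplit


def hostname_from_url(url):
    """Return the lowercased hostname of a URL, or None if there is none."""
    # urlsplit only recognises a network location after "//", so a bare
    # "www.example.com/path" is given one before parsing.
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname
```

The result (port and credentials stripped, lowercased) can then be passed to whichever of the library calls above you choose.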