Python urlparse — extract domain name without subdomain

南笙 2020-12-01 02:30

Need a way to extract a domain name without the subdomain from a url using Python urlparse.

For example, I would like to extract "google.com" from a f

7 answers

慢半拍i (OP)

2020-12-01 03:17

This is not a standard decomposition of a URL.

You cannot rely on the www. being present, nor on it being optional. In a lot of cases it is neither.

So if you are willing to assume that only the last two components are relevant (which won't work for UK domains, e.g. www.google.co.uk, where it gives you co.uk), you can take split('.')[-2:] of the hostname and join the result with '.'.

Or, which is actually less error-prone, strip a leading www. prefix and keep the rest of the hostname.

But either way, you cannot assume that the www. is optional, because the name without it will NOT work for every site!
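A minimal sketch of the two simple options above, assuming Python 3, where the old urlparse module lives in urllib.parse. The helper names last_two_labels and strip_www are made up for illustration.

```python
from urllib.parse import urlparse


def last_two_labels(url):
    """Keep only the last two dot-separated labels of the hostname."""
    # .hostname is None when the URL has no "//" part, hence the fallback.
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def strip_www(url):
    """Drop a single leading 'www.' label and keep everything else."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


print(last_two_labels("http://www.google.com/search"))  # google.com
print(last_two_labels("http://www.google.co.uk/"))      # co.uk (wrong)
print(strip_www("http://www.google.co.uk/"))            # google.co.uk
print(strip_www("http://mail.google.com/"))             # mail.google.com
```

Neither helper knows where the registrable part of the name really starts: the first breaks on multi-part suffixes, the second leaves every subdomain other than www. in place.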

Here is a list of common suffixes for domains. You can try to keep the suffix + one component.

https://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
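The "suffix + one component" idea could look roughly like this. It is only a sketch: SAMPLE_SUFFIXES is a tiny hand-picked set standing in for the real list, registered_domain is a hypothetical helper name, and the wildcard and exception rules that the real list contains are ignored.

```python
from urllib.parse import urlparse

# Hypothetical sample only; the real suffix list has thousands of entries
# plus wildcard and exception rules that this sketch does not handle.
SAMPLE_SUFFIXES = {"com", "org", "uk", "co.uk"}


def registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching suffix plus one label, or None."""
    host = (urlparse(url).hostname or "").rstrip(".")
    if host in suffixes:
        # The host is itself a bare suffix, so there is nothing to keep.
        return None
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest, always
    # leaving at least one label in front of the suffix.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None


print(registered_domain("http://www.google.co.uk/maps"))  # google.co.uk
print(registered_domain("https://mail.google.com/"))      # google.com
```

Checking the longest candidate first is what makes co.uk win over uk. In practice you would load the suffixes from the list linked above instead of hard-coding them.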

But how do you plan to handle, for example, first.last.name domains? Assume that all users with the same last name belong to the same company? Initially, you could only get third-level domains under .name. By now, you can apparently get second-level ones, too. So for .name there is no general rule.
