I am trying to extract the domain names from a list of URLs, just like in
https://stackoverflow.com/questions/18331948/extract-domain-name-from-the-url
My problem is th
With regex, you could use something like this:
(?<=\.)([^.]+)(?:\.(?:co\.uk|ac\.us|[^.]+(?:$|\n)))
https://regex101.com/r/WQXFy6/5
Note that you'll have to watch out for special cases such as co.uk.
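A minimal sketch of applying that pattern in Python (the helper name domain_from is mine; the group-1 capture is the piece you want):

    import re

    # The pattern from above: capture the label just before the suffix,
    # special-casing multi-part suffixes like co.uk and ac.us.
    DOMAIN_RE = re.compile(r'(?<=\.)([^.]+)(?:\.(?:co\.uk|ac\.us|[^.]+(?:$|\n)))')

    def domain_from(url):
        m = DOMAIN_RE.search(url)
        return m.group(1) if m else None

    print(domain_from('http://forums.news.cnn.com/'))  # cnn
    print(domain_from('https://www.example.co.uk'))    # example

Any multi-part suffix not listed in the alternation will still be mishandled, so extend the co.uk|ac.us list as needed.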
Simple solution via string splitting:

    def domain_name(url):
        return url.split("www.")[-1].split("//")[-1].split(".")[0]
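Restating the one-liner so the calls below run standalone — note it assumes the only subdomain is www, so deeper subdomains defeat it:

    def domain_name(url):
        return url.split("www.")[-1].split("//")[-1].split(".")[0]

    print(domain_name("http://www.github.com/user"))   # github
    print(domain_name("http://forums.news.cnn.com/"))  # forums, not cnn

If your list contains URLs with nested subdomains, prefer one of the parsing approaches below.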
Use tldextract. In contrast to urlparse, tldextract accurately separates the gTLD or ccTLD (generic or country-code top-level domain) from the registered domain and subdomains of a URL.
>>> import tldextract
>>> ext = tldextract.extract('http://forums.news.cnn.com/')
>>> ext
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
>>> ext.domain
'cnn'
You can use urlparse (https://docs.python.org/3/library/urllib.parse.html) to parse the URL and extract the netloc, and from the netloc you can easily pull out the domain name with split.
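A rough sketch of that idea with only the standard library (the name get_domain and the second-to-last-label heuristic are mine, and the heuristic misfires on multi-part suffixes like co.uk):

    from urllib.parse import urlparse

    def get_domain(url):
        # urlparse only fills netloc when the URL carries a scheme
        # ("http://..."); strip a port if one is present.
        netloc = urlparse(url).netloc.split(":")[0]
        parts = netloc.split(".")
        # Heuristic: the label just before the TLD is the domain.
        return parts[-2] if len(parts) >= 2 else netloc

    print(get_domain("http://forums.news.cnn.com/"))   # cnn
    print(get_domain("https://www.example.com:8080"))  # example

For schemeless inputs like "www.example.com", urlparse puts everything in path and netloc comes back empty, so normalize your URLs first.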