Extract domain name from url Python

忘掉有多难 2020-12-11 01:16

I am trying to extract the domain names from a list of URLs, just like in https://stackoverflow.com/questions/18331948/extract-domain-name-from-the-url
My problem is th

4 Answers
  • 2020-12-11 01:33

    With regex, you could use something like this:

    (?<=\.)([^.]+)(?:\.(?:co\.uk|ac\.us|[^.]+(?:$|\n)))

    https://regex101.com/r/WQXFy6/5

    Note that you'll have to watch out for special cases such as co.uk.
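A minimal sketch of applying that pattern with Python's `re` module. It assumes each host has at least one label before the domain (such as `www.`) and that the path contains no dots; the function name `domain_name_regex` is hypothetical.

```python
import re

# The pattern from the answer: a label preceded by a dot, followed either by a
# special-cased suffix (co.uk, ac.us) or by a final label at the end of the string.
DOMAIN_RE = re.compile(r"(?<=\.)([^.]+)(?:\.(?:co\.uk|ac\.us|[^.]+(?:$|\n)))")


def domain_name_regex(url):
    """Return the captured domain label, or None if the pattern does not match.

    Assumes the host has a label before the domain (e.g. "www.") and that
    the URL path contains no dots.
    """
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None


urls = [
    "http://www.google.com",
    "https://www.bbc.co.uk",
    "http://forums.news.cnn.com/",
]
print([domain_name_regex(u) for u in urls])
# ['google', 'bbc', 'cnn']
```

Because the lookbehind requires a dot before the domain, a URL with no subdomain, such as `http://google.com`, returns `None`.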

  • 2020-12-11 01:55

    Simple solution without regex, using only str.split:

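    def domain_name(url):
        # Keep what follows the last "www." and the last "//", then take the
        # first dot-separated label: "http://www.google.com" -> "google".
        # Caveat: other subdomains are returned instead of the domain,
        # e.g. "http://forums.news.cnn.com" -> "forums".
        return url.split("www.")[-1].split("//")[-1].split(".")[0]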
    
  • 2020-12-11 01:56

    Use tldextract. Unlike urlparse, which only splits a URL into components such as the netloc, tldextract uses the Public Suffix List to accurately separate the gTLD or ccTLD (generic or country-code top-level domain) from the registered domain and subdomains of a URL.

    >>> import tldextract
    >>> ext = tldextract.extract('http://forums.news.cnn.com/')
    >>> ext
    ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
    >>> ext.domain
    'cnn'
    
  • 2020-12-11 01:57

    You can use urlparse (https://docs.python.org/3/library/urllib.parse.html) to parse the URL and then extract its netloc.

    From the netloc, you can then extract the domain name with split.
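A minimal sketch of the urlparse-and-split approach described above. It reads the `hostname` attribute of the parse result rather than the raw `netloc` (both are standard `urllib.parse` attributes; `hostname` already drops any port and user info and lowercases the host). As a simplification, it assumes the domain is the second-to-last label of the host name; the function name `domain_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def domain_from_url(url):
    """Naive domain extraction via urlparse and split.

    Assumes the URL includes a scheme (so urlparse fills in the netloc) and
    that the domain is the second-to-last label of the host name.
    """
    # .hostname is derived from .netloc but drops user info and port and
    # lowercases the host, e.g. "www.google.com:443" -> "www.google.com".
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Second-to-last label: "forums.news.cnn.com" -> "cnn".
    return labels[-2] if len(labels) >= 2 else host


urls = [
    "http://forums.news.cnn.com/",
    "https://www.google.com:443/search?q=python",
    "https://www.bbc.co.uk/news",
]
print([domain_from_url(u) for u in urls])
# ['cnn', 'google', 'co']
```

The `'co'` result for bbc.co.uk shows why multi-part suffixes need special handling (as the regex answer notes) or a Public Suffix List lookup such as tldextract. A URL without a scheme, such as `www.google.com`, is parsed as a path, so `hostname` is `None` and this sketch returns an empty string.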
