MySQL query to extract domains from URLs

小鲜肉 2020-12-08 08:21

Sorry for my English.

I have this query to extract the domain from URLs:

SELECT SUBSTRING(LEFT(url, LOCATE('/', url, 8) - 1), 8) AS domain...

12 answers
  •  离开以前
    2020-12-08 08:50

    I had to combine some of the previous answers, plus a little more hackery for my data set. This is what works for me; it returns the domain and any sub-domains:

    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS domain
    

    Explanation (because non-trivial SQL rarely makes sense):

    SUBSTRING_INDEX(target_url, '/', 3) - strips any path if the URL has a protocol
    SUBSTRING_INDEX(THAT, '://', -1) - strips any protocol from THAT
    SUBSTRING_INDEX(THAT, '/', 1) - strips any path from THAT (if there was no protocol)
    SUBSTRING_INDEX(THAT, '?', 1) - strips the query string from THAT (if there was no path or trailing /)
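
    To see what each of the four steps contributes, the nested call can be unrolled into one column per step. This is only a sketch: the sample URLs and the step1..step4 column aliases are made up for illustration and are not part of the original query.

    ```sql
    -- Unroll the nested SUBSTRING_INDEX calls, one column per step.
    -- The sample URLs and column aliases are illustrative assumptions.
    SELECT
        target_url,
        -- Step 1: keep everything before the third '/'
        SUBSTRING_INDEX(target_url, '/', 3) AS step1,
        -- Step 2: keep everything after the last '://'
        SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1) AS step2,
        -- Step 3: keep everything before the first remaining '/'
        SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1) AS step3,
        -- Step 4: keep everything before the first '?'
        SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS step4
    FROM (
        SELECT 'http://test.com/one?x=1' AS target_url
        UNION ALL SELECT 'test.com/one'
        UNION ALL SELECT 'test.com?u=1'
    ) AS sample;
    ```

    For 'http://test.com/one?x=1', step 1 already cuts the string down to 'http://test.com' and step 2 removes the protocol. For 'test.com/one', the first two steps change nothing and step 3 removes the path. For 'test.com?u=1', only step 4 has any effect. In all three cases step4 is 'test.com'.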

    Test Cases:

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS domain
    FROM ( 
        SELECT       'http://test.com' AS target_url
        UNION SELECT 'https://test.com' 
        UNION SELECT 'http://test.com/one' 
        UNION SELECT 'http://test.com/?huh' 
        UNION SELECT 'http://test.com?http://ouch.foo' 
        UNION SELECT 'test.com' 
        UNION SELECT 'test.com/one'
        UNION SELECT 'test.com/one/two'
        UNION SELECT 'test.com/one/two/three'
        UNION SELECT 'test.com/one/two/three?u=http://maaaaannn'
        UNION SELECT 'http://one.test.com'
        UNION SELECT 'one.test.com/one'
        UNION SELECT 'two.one.test.com/one' ) AS Test; 
    

    Results:

    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'test.com'
    'one.test.com'
    'one.test.com'
    'two.one.test.com'
    
