Count a specific pattern in URLs with BigQuery SQL

无人共我 2021-01-26 10:45

I have a table that contains URLs and some other columns, for example dates. The URLs contain IDs, separated by different values. What the IDs have in common is that they contai

3 answers
  •  没有蜡笔的小新
    2021-01-26 11:45

    Below is for BigQuery Standard SQL

    I'd like to construct a query that counts the amount of ID's in the URL

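    #standardSQL
    SELECT date,
      (
        SELECT COUNT(1)
        -- split the URL into runs of non-punctuation characters
        FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
        -- keep only the parts that consist entirely of digits
        WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
      ) AS IDs
    FROM `project.dataset.table`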
    

    Applied to the sample data from your question, the output is:

    Row date        IDs 
    1   01-01-1999  3        
    2   01-02-1999  4        
    3   01-02-1999  3        
    4   01-01-1999  5        
    5   01-01-1999  1        
    6   01-01-1999  1        
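
    To see why a given row gets its count, it can help to look at the individual parts the regular expression produces. The following is only a sketch: the `sample` CTE and its URL are made up for illustration and are not the data from the question.

    ```sql
    #standardSQL
    -- Hypothetical one-row input; the URL is invented for this example.
    WITH sample AS (
      SELECT 'https://example.com/id_123/456-789' AS url
    )
    SELECT
      part,
      -- TRUE when the part contains digits only, i.e. it would be counted as an ID
      NOT REGEXP_CONTAINS(part, r'[^\d]') AS is_id
    FROM sample, UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) AS part
    ```

    For this URL the parts should be https, example, com, id, 123, 456 and 789, of which only the last three are purely numeric, so the counting query above would report 3 for it.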
    

    Secondly, I'd like to group the "amounts" by date

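    #standardSQL
    SELECT date, IDs, COUNT(1) AS combinations FROM (
      -- inner query: number of all-digit parts (IDs) per URL
      SELECT date,
        (
          SELECT COUNT(1)
          FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
          WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
        ) AS IDs
      FROM `project.dataset.table`
    )
    -- outer query: how many URLs share the same date and ID count
    GROUP BY date, IDs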
    

    Applied to the sample data from your question, the output is:

    Row date        IDs combinations     
    1   01-01-1999  3   1    
    2   01-02-1999  4   1    
    3   01-02-1999  3   1    
    4   01-01-1999  5   1    
    5   01-01-1999  1   2    
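
    If you want to try the grouping without access to a real table, the same logic can be run against inline rows. This is a sketch under assumptions: the `sample` and `counted` CTE names, the dates and the URLs are all hypothetical, chosen only to make the query self-contained.

    ```sql
    #standardSQL
    WITH sample AS (
      -- Hypothetical rows, not the data from the question.
      SELECT DATE '1999-01-01' AS date, 'https://example.com/1/2' AS url UNION ALL
      SELECT DATE '1999-01-01', 'https://example.com/3-4' UNION ALL
      SELECT DATE '1999-01-02', 'https://example.com/5'
    ),
    counted AS (
      -- number of all-digit parts (IDs) per URL
      SELECT date,
        (
          SELECT COUNT(1)
          FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
          WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
        ) AS IDs
      FROM sample
    )
    SELECT date, IDs, COUNT(1) AS combinations
    FROM counted
    GROUP BY date, IDs
    ORDER BY date, IDs
    ```

    With these three rows, the two URLs dated 1999-01-01 each contain two numeric parts and fall into one group (IDs = 2, combinations = 2), while the 1999-01-02 URL forms its own group (IDs = 1, combinations = 1).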
    
