Count a specific pattern in URLs with BigQuery SQL

无人共我 2021-01-26 10:45

I got a table which contains URLs and some other columns, for example dates. The URLs contain IDs, separated by different values. What the IDs have in common is that they contai

3 answers

没有蜡笔的小新 (original poster)

2021-01-26 11:45

The queries below are for BigQuery Standard SQL.

For the first part of your question ("I'd like to construct a query that counts the amount of ID's in the URL"):

#standardSQL
SELECT date,
  (
    -- Split the URL on punctuation and count the parts that contain only digits.
    SELECT COUNT(1)
    FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
    WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
  ) IDs
FROM `project.dataset.table`

Applied to the sample data from your question, the output is:

Row date        IDs 
1   01-01-1999  3        
2   01-02-1999  4        
3   01-02-1999  3        
4   01-01-1999  5        
5   01-01-1999  1        
6   01-01-1999  1
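
If you want to try the counting logic without a real table, here is a minimal sketch with inline rows. The `sample` CTE and its URLs are made up for illustration (they are not the data from the question), and the extra `parts` column is only there to show what the first regex extracts.

```sql
#standardSQL
-- Hypothetical rows; the URL shapes are assumptions, not the question's data.
WITH sample AS (
  SELECT '01-01-1999' AS date, 'example.com/a/123/b/456' AS url UNION ALL
  SELECT '01-02-1999', 'example.com/item-789?ref=12-34' UNION ALL
  SELECT '01-02-1999', 'example.com/v2/42'
)
SELECT date, url,
  -- Every maximal run of non-punctuation characters is one candidate part.
  REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+') AS parts,
  (
    -- Keep only the parts with no non-digit character, then count them.
    SELECT COUNT(1)
    FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
    WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
  ) AS IDs
FROM sample
```

With these assumed rows the IDs column should come out as 2, 3 and 1. The part `v2` is not counted because it contains a letter, so a mixed token like that is never treated as an ID.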

For the second part ("Secondly, I'd like to group the "amounts" by date"):

#standardSQL
-- Outer query: how many rows share each (date, IDs) pair.
SELECT date, IDs, COUNT(1) combinations FROM (
  -- Inner query: number of all-digit parts (IDs) per URL.
  SELECT date,
    (
      SELECT COUNT(1)
      FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
      WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
    ) IDs
  FROM `project.dataset.table`
)
GROUP BY date, IDs

Applied to the sample data from your question, the output is:

Row date        IDs combinations     
1   01-01-1999  3   1    
2   01-02-1999  4   1    
3   01-02-1999  3   1    
4   01-01-1999  5   1    
5   01-01-1999  1   2
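
The grouping step can also be written with CTEs instead of a nested subquery, which some people find easier to read. This is only a sketch: the `sample` rows are hypothetical, `counted` is a name I picked, and the `ORDER BY` is added just to make the result order predictable.

```sql
#standardSQL
-- Hypothetical rows; replace `sample` with your own table.
WITH sample AS (
  SELECT '01-01-1999' AS date, 'example.com/a/123/b/456' AS url UNION ALL
  SELECT '01-01-1999', 'example.com/c/7/d/8' UNION ALL
  SELECT '01-02-1999', 'example.com/v2/42'
),
counted AS (
  -- Step 1: number of all-digit parts (IDs) per URL.
  SELECT date,
    (
      SELECT COUNT(1)
      FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[^[:punct:]]+')) part
      WHERE NOT REGEXP_CONTAINS(part, r'[^\d]')
    ) AS IDs
  FROM sample
)
-- Step 2: how many URLs have each ID count on each date.
SELECT date, IDs, COUNT(1) AS combinations
FROM counted
GROUP BY date, IDs
ORDER BY date, IDs
```

For these assumed rows, both URLs dated 01-01-1999 contain two IDs, so that date should show IDs = 2 with combinations = 2, and 01-02-1999 should show IDs = 1 with combinations = 1.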
