How to search for rows containing specific words and then return a count of each word?

Submitted by 拈花ヽ惹草 on 2019-12-12 23:23:09

Question


I have 150,000 rows of data which I'm attempting to query in Google BigQuery.

The column Text contains text of varying length, which I want to search for particular keywords.

I've gotten as far as the query below, which returns all rows containing a particular keyword (e.g. facebook):

SELECT Text From Data.Set_1 
WHERE Text CONTAINS 'facebook'

Questions:

1) How do I improve the query so that it returns a total count of all occurrences of the keyword 'facebook' across 'Text' in a new column?

2) How do I scale this up to multiple keywords (facebook, cnn, bbc, twitter) and return a total count for each keyword present in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?


Answer 1:


For BigQuery Legacy SQL (here keywords is a table with a single column, keyword, listing the words to search for):

SELECT 
  keyword, 
  COUNT(1) AS rows, 
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences 
FROM YourTable 
CROSS JOIN keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
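
The occurrences column relies on a length trick: deleting every copy of keyword with REPLACE shortens Text by LENGTH(keyword) characters per occurrence, so the length difference divided by LENGTH(keyword) is the number of occurrences in that row. A minimal Python sketch of the same arithmetic, assuming Python's str.replace behaves like BigQuery's REPLACE (the function name count_occurrences is hypothetical, not part of BigQuery):

```python
def count_occurrences(text: str, keyword: str) -> int:
    """Mirror of (LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword).

    Removing every (non-overlapping) copy of `keyword` shortens `text` by
    len(keyword) characters per copy, so the length difference divided by
    len(keyword) is the occurrence count.
    """
    if not keyword:
        # the SQL expression would divide by zero for an empty keyword
        raise ValueError("keyword must be non-empty")
    removed = len(text) - len(text.replace(keyword, ""))
    return removed // len(keyword)


print(count_occurrences("facebookfacebookcnnbbccnn", "facebook"))  # 2
print(count_occurrences("facebookfacebookcnnbbccnn", "cnn"))       # 2
print(count_occurrences("aaaa", "aa"))  # 2: overlapping matches are not counted
```

Because matches are removed left to right without overlap, the count agrees with str.count, and a keyword embedded inside a longer word is counted as well.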

Example to play with

SELECT 
  keyword, 
  COUNT(1) AS rows, 
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences 
FROM (
  SELECT Text FROM
    (SELECT 'facebookfacebookcnnbbccnn' AS Text),
    (SELECT 'facebook' AS Text), 
    (SELECT 'cnn' AS Text)
) AS words 
CROSS JOIN (
  SELECT keyword FROM 
    (SELECT 'facebook' AS keyword),
    (SELECT 'cnn' AS keyword), 
    (SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
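
As a sanity check, the following Python sketch (keyword_stats is a hypothetical helper) replays the example above row by row: the nested loops stand in for the CROSS JOIN, the `in` test for CONTAINS, and the per-keyword counters for GROUP BY keyword. Under those assumptions the example should return facebook with 2 rows and 3 occurrences, cnn with 2 and 3, and bbc with 1 and 1.

```python
from collections import defaultdict


def keyword_stats(texts: list[str], keywords: list[str]) -> dict[str, tuple[int, int]]:
    """Return {keyword: (rows, occurrences)} the way the example query does.

    Keywords that match no row are left out, just as the WHERE clause
    drops them before GROUP BY.
    """
    rows: defaultdict[str, int] = defaultdict(int)
    occurrences: defaultdict[str, int] = defaultdict(int)
    for text in texts:                # CROSS JOIN: every text ...
        for keyword in keywords:      # ... paired with every keyword
            if keyword in text:       # WHERE Text CONTAINS keyword
                rows[keyword] += 1    # COUNT(1)
                removed = len(text) - len(text.replace(keyword, ""))
                occurrences[keyword] += removed // len(keyword)  # SUM(INTEGER(...))
    return {k: (rows[k], occurrences[k]) for k in keywords if rows[k] > 0}


texts = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]
keywords = ["facebook", "cnn", "bbc"]
print(keyword_stats(texts, keywords))
# {'facebook': (2, 3), 'cnn': (2, 3), 'bbc': (1, 1)}
```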

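For BigQuery Standard SQL (see Enabling Standard SQL in the BigQuery documentation). Here rows is quoted with backticks because ROWS is a reserved keyword, and STRPOS(Text, keyword) > 0 takes the place of the Legacy CONTAINS operator: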

SELECT 
  keyword, 
  COUNT(1) AS `rows`, 
  -- DIV does integer division, so the count stays INT64 (plain / returns FLOAT64)
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences  
FROM YourTable 
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword

Example to play with

WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' AS keyword UNION ALL
  SELECT 'bbc' AS keyword 
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' AS Text UNION ALL
  SELECT 'cnn' AS Text 
)
SELECT 
  keyword, 
  COUNT(1) AS `rows`, 
  -- DIV does integer division, so the count stays INT64 (plain / returns FLOAT64)
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences  
FROM words 
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword



Answer 2:


You can use a derived table to list all the words you are looking for, and then use aggregation to count the matches (this counts matching rows per keyword, not total occurrences):

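-- Standard SQL: the derived table w lists the keywords; every keyword is paired
-- with every row, and COUNTIF counts the rows that contain it (0 if none)
SELECT w.keyword, COUNTIF(STRPOS(s.Text, w.keyword) > 0) AS matching_rows
FROM (SELECT 'facebook' AS keyword UNION ALL
      SELECT 'cnn'
     ) w CROSS JOIN
     Data.Set_1 s
GROUP BY w.keyword;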
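
Unlike Answer 1, this approach counts rows that contain each keyword rather than total occurrences, and because every keyword in the derived table is kept, a keyword with no matches should still appear with a count of 0. A small Python sketch of that behaviour (matching_rows and the sample strings are hypothetical; 'twitter' is borrowed from the question to show a zero count):

```python
def matching_rows(texts: list[str], keywords: list[str]) -> dict[str, int]:
    # Start every keyword at 0 so unmatched keywords still appear in the result
    counts = {keyword: 0 for keyword in keywords}
    for keyword in keywords:
        for text in texts:
            if keyword in text:
                counts[keyword] += 1  # one per matching row, however many times it occurs
    return counts


print(matching_rows(["facebookfacebookcnnbbccnn", "facebook", "cnn"],
                    ["facebook", "cnn", "twitter"]))
# {'facebook': 2, 'cnn': 2, 'twitter': 0}
```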

Note: this is not particularly efficient. The cost grows roughly linearly with the number of keywords, because every row is checked against every keyword.



Source: https://stackoverflow.com/questions/39913724/how-to-search-for-rows-containing-specific-words-then-return-count-of-each-word
