Question
I have 150,000 rows of data which I'm attempting to query in Google BigQuery.
The column Text contains text of varying length, in which I want to search for particular keywords.
I've gotten as far as the query below, which returns all rows containing a particular keyword (e.g. facebook):
SELECT Text FROM Data.Set_1
WHERE Text CONTAINS 'facebook'
Questions:
1) How do I improve the query so that it returns, in a new column, the total number of occurrences of the keyword 'facebook' across the Text column?
2) How do I scale this up to multiple keywords (facebook, cnn, bbc, twitter) and return the total count of each keyword present in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?
Answer 1:
For BigQuery Legacy SQL:
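SELECT
  keyword,
  COUNT(1) AS rows,  -- number of rows that contain the keyword
  -- each match shortens Text by LENGTH(keyword), so this gives the number of matches per row
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM YourTable
CROSS JOIN keywords  -- keywords: a table with one keyword per row
WHERE Text CONTAINS keyword
GROUP BY keyword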
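The occurrence count rests on a length-difference trick: removing every match shortens Text by LENGTH(keyword) characters per match, so the difference divided by the keyword length is the number of matches. A minimal local Python sketch of that per-row arithmetic (not BigQuery code; `count_occurrences` is a hypothetical helper name), assuming, as REPLACE does, that all non-overlapping matches are removed:

```python
def count_occurrences(text: str, keyword: str) -> int:
    """Count non-overlapping matches of keyword in text using the same
    arithmetic as the query:
    (LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    # Characters removed when every non-overlapping match is deleted.
    removed = len(text) - len(text.replace(keyword, ""))
    # Each deleted match accounts for exactly len(keyword) of those characters.
    return removed // len(keyword)


text = "facebookfacebookcnnbbccnn"
print(count_occurrences(text, "facebook"))  # 2
print(count_occurrences(text, "cnn"))       # 2
print(count_occurrences(text, "bbc"))       # 1
```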
Example to play with:
SELECT
  keyword,
  COUNT(1) AS rows,
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM (
  -- in Legacy SQL, a comma between subqueries acts as UNION ALL
  SELECT Text FROM
    (SELECT 'facebookfacebookcnnbbccnn' AS Text),
    (SELECT 'facebook' AS Text),
    (SELECT 'cnn' AS Text)
) AS words
CROSS JOIN (
  SELECT keyword FROM
    (SELECT 'facebook' AS keyword),
    (SELECT 'cnn' AS keyword),
    (SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
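For BigQuery Standard SQL (see Enabling Standard SQL):
SELECT
  keyword,
  COUNT(1) AS `rows`,  -- ROWS is a reserved keyword in Standard SQL, hence the backticks
  -- DIV keeps the per-row count an INT64; the / operator would return FLOAT64
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences
FROM YourTable
JOIN keywords
  ON STRPOS(Text, keyword) > 0
GROUP BY keyword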
Example to play with:
WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' AS keyword UNION ALL
  SELECT 'bbc' AS keyword
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' AS Text UNION ALL
  SELECT 'cnn' AS Text
)
SELECT
  keyword,
  COUNT(1) AS `rows`,
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences
FROM words
JOIN keywords
  ON STRPOS(Text, keyword) > 0
GROUP BY keyword
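To check what the example queries should return, here is a plain-Python sketch (an illustration, not BigQuery code) that mirrors the join condition and the two aggregates on the same three sample strings. The function name `keyword_stats` is hypothetical.

```python
from collections import defaultdict

# Sample data copied from the example queries.
words = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]
keywords = ["facebook", "cnn", "bbc"]


def keyword_stats(texts, keywords):
    """Mimic: JOIN keywords ON STRPOS(Text, keyword) > 0 GROUP BY keyword.

    Returns {keyword: (rows, occurrences)}. As with an inner join, a keyword
    that matches no row is absent from the result.
    """
    stats = defaultdict(lambda: [0, 0])
    for text in texts:
        for keyword in keywords:
            if keyword in text:  # STRPOS(Text, keyword) > 0
                removed = len(text) - len(text.replace(keyword, ""))
                stats[keyword][0] += 1                        # COUNT(1)
                stats[keyword][1] += removed // len(keyword)  # SUM(...) of per-row matches
    return {k: tuple(v) for k, v in stats.items()}


for keyword, (rows, occurrences) in keyword_stats(words, keywords).items():
    print(keyword, rows, occurrences)
# facebook 2 3
# cnn 2 3
# bbc 1 1
```

Because both versions of the query use an inner join, a keyword that matches no row (for example twitter) does not appear in the output at all.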
Answer 2:
You can use a derived table to list all the keywords you are looking for, and then use aggregation to count the rows that match each one:
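SELECT w.keyword, COUNTIF(STRPOS(s.Text, w.keyword) > 0) AS matching_rows
FROM (SELECT 'facebook' AS keyword UNION ALL
      SELECT 'cnn'
     ) w CROSS JOIN
     Data.Set_1 s
-- Standard SQL has no CONTAINS operator, so STRPOS tests for the keyword;
-- COUNTIF keeps keywords with no matching rows in the result with a count of 0
GROUP BY w.keyword;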
Note: this is not particularly efficient. Every row is compared against every keyword, so the cost grows roughly linearly with the number of keywords.
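Answer 2 counts matching rows rather than occurrences, and it reports every keyword in the derived table, with 0 when nothing matches. The Python sketch below (an illustration, not BigQuery code; `matching_row_counts` is a hypothetical name) shows that behavior on the sample strings from Answer 1, with twitter from the question added as a keyword that matches nothing.

```python
def matching_row_counts(texts, keywords):
    """Count, for each keyword, the rows whose text contains it.

    Every keyword appears in the result, with 0 when no row matches.
    A row counts once no matter how often the keyword occurs in it.
    """
    # Nested comparison: the work grows with len(texts) * len(keywords).
    return {kw: sum(1 for text in texts if kw in text) for kw in keywords}


texts = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]
print(matching_row_counts(texts, ["facebook", "cnn", "twitter"]))
# {'facebook': 2, 'cnn': 2, 'twitter': 0}
```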
Source: https://stackoverflow.com/questions/39913724/how-to-search-for-rows-containing-specific-words-then-return-count-of-each-word