How to find the most common bi-grams with BigQuery?
I want to find the most common bi-grams (pairs of words) in my table. How can I do this with BigQuery?

BigQuery now supports SPLIT():

    SELECT word, nextword, COUNT(*) c
    FROM (
      SELECT pos, title, word,
             LEAD(word) OVER(PARTITION BY created_utc, title ORDER BY pos) nextword
      FROM (
        SELECT created_utc, title, word, pos
        FROM FLATTEN(
          (SELECT created_utc, title, word, POSITION(word) pos
           FROM (SELECT created_utc, title, SPLIT(title, ' ') word
                 FROM [bigquery-samples:reddit.full])
          ), word)
      ))
    WHERE nextword IS NOT NULL
    GROUP EACH BY 1, 2
    ORDER BY c DESC
    LIMIT 100

Source: https://stackoverflow.com/questions
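
To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea run on a small in-memory list instead of a BigQuery table. It is only an illustration, not BigQuery code: the function name `top_bigrams` and the sample titles are made up for this example. Splitting on a single space corresponds to SPLIT(title, ' '), pairing each word with its successor plays the role of LEAD(word) ordered by position, and the counter stands in for the GROUP BY and ORDER BY steps.

```python
from collections import Counter


def top_bigrams(titles, n=100):
    """Return the n most frequent (word, nextword) pairs across all titles."""
    counts = Counter()
    for title in titles:
        # Same idea as SPLIT(title, ' '): break the title on single spaces.
        words = title.split(" ")
        # zip(words, words[1:]) pairs each word with the one that follows it,
        # like LEAD(word) OVER (... ORDER BY pos). The last word has no
        # successor and is dropped, matching WHERE nextword IS NOT NULL.
        counts.update(zip(words, words[1:]))
    # Counting and sorting by frequency stands in for
    # GROUP BY word, nextword ORDER BY c DESC LIMIT n.
    return counts.most_common(n)


# Hypothetical sample data, standing in for the title column.
titles = [
    "the quick brown fox",
    "the quick red fox",
    "a quick brown dog",
]

for (word, nextword), c in top_bigrams(titles, n=3):
    print(word, nextword, c)
```

Unlike the query, which partitions by created_utc and title, this sketch treats every list element as its own row, so identical titles are still counted separately.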