How to count the frequency of elements in a BigQuery array field

Submitted by 丶灬走出姿态 on 2020-06-28 02:43:25

Question


I have a table that looks like this:

I am looking for a table that gives a frequency count of the elements in the fields l_0, l_1, l_2, l_3.

For example the output should look like this:

| author_id  | year | l_0.name         | l_0.count | l_1.name   | l_1.count | l_2.name            | l_2.count | l_3.name           | l_3.count |
| 2164089123 | 1987 | biology          | 3         | botany     | 3         |                     |           |                    |           |
| 2595831531 | 1987 | computer science | 2         | simulation | 2         | computer simulation | 2         | mathematical model | 2         |

Edit:

In some cases an array field might contain more than one distinct element. For example, l_0 could be ['biology', 'biology', 'geometry', 'geometry']. In that case, each of the output fields l_0, l_1, l_2, l_3 would be a nested repeated field: for l_0, all the distinct elements would go in l_0.name and their corresponding counts in l_0.count.


Answer 1:


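This should work, assuming you want to count on a per-array basis and each array contains only one distinct value (ANY_VALUE returns an arbitrary element, while COUNT(*) counts every element in the array):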

SELECT
  author_id,
  year,
  (SELECT AS STRUCT ANY_VALUE(l_0) AS name, COUNT(*) AS count
   FROM UNNEST(l_0) AS l_0) AS l_0,
  (SELECT AS STRUCT ANY_VALUE(l_1) AS name, COUNT(*) AS count
   FROM UNNEST(l_1) AS l_1) AS l_1,
  (SELECT AS STRUCT ANY_VALUE(l_2) AS name, COUNT(*) AS count
   FROM UNNEST(l_2) AS l_2) AS l_2,
  (SELECT AS STRUCT ANY_VALUE(l_3) AS name, COUNT(*) AS count
   FROM UNNEST(l_3) AS l_3) AS l_3
FROM YourTable;

To avoid so much repetition, you can make use of a SQL UDF:

CREATE TEMP FUNCTION GetNameAndCount(elements ARRAY<STRING>) AS (
  (SELECT AS STRUCT ANY_VALUE(elem) AS name, COUNT(*) AS count
   FROM UNNEST(elements) AS elem)
);

SELECT
  author_id,
  year,
  GetNameAndCount(l_0) AS l_0,
  GetNameAndCount(l_1) AS l_1,
  GetNameAndCount(l_2) AS l_2,
  GetNameAndCount(l_3) AS l_3
FROM YourTable;

If you potentially need to account for multiple different names within an array, you can have the UDF return an array of them with associated counts instead:

CREATE TEMP FUNCTION GetNamesAndCounts(elements ARRAY<STRING>) AS (
  ARRAY(
    SELECT AS STRUCT elem AS name, COUNT(*) AS count
    FROM UNNEST(elements) AS elem
    GROUP BY elem
    ORDER BY count
  )
);

SELECT
  author_id,
  year,
  GetNamesAndCounts(l_0) AS l_0,
  GetNamesAndCounts(l_1) AS l_1,
  GetNamesAndCounts(l_2) AS l_2,
  GetNamesAndCounts(l_3) AS l_3
FROM YourTable;
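
To try the array-returning UDF without a real table, here is a minimal sketch that feeds it the example array from the question's edit. The inline YourTable row is hypothetical: the author_id and year are borrowed from the sample output purely for illustration, and only l_0 is populated.

```sql
CREATE TEMP FUNCTION GetNamesAndCounts(elements ARRAY<STRING>) AS (
  ARRAY(
    SELECT AS STRUCT elem AS name, COUNT(*) AS count
    FROM UNNEST(elements) AS elem
    GROUP BY elem
    ORDER BY count
  )
);

-- Hypothetical one-row stand-in for the real table (assumption, not the asker's data).
WITH YourTable AS (
  SELECT
    2164089123 AS author_id,
    1987 AS year,
    ['biology', 'biology', 'geometry', 'geometry'] AS l_0
)
SELECT
  author_id,
  year,
  GetNamesAndCounts(l_0) AS l_0  -- ARRAY<STRUCT<name STRING, count INT64>>
FROM YourTable;
```

For this row, l_0 should hold two structs, one for 'biology' and one for 'geometry', each with a count of 2. Because the counts tie, ORDER BY count does not determine which of the two comes first.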

Note that if you instead want to count across rows, you'll need to unnest the arrays and perform the GROUP BY at the outer level. Based on the question, though, that doesn't appear to be your intention.
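
For reference, the cross-row variant could look roughly like the sketch below. It assumes you want one output row per (author_id, year, element) for the l_0 field only; that grouping is an assumption for illustration, and the same pattern would be repeated for l_1 through l_3.

```sql
-- Sketch: count l_0 elements across all rows that share an author_id and year.
SELECT
  author_id,
  year,
  name,
  COUNT(*) AS count
FROM YourTable
CROSS JOIN UNNEST(l_0) AS name  -- one row per array element
GROUP BY author_id, year, name
ORDER BY author_id, year, count DESC;
```

Unlike the per-array queries above, this flattens the arrays first, so rows whose l_0 array is empty or NULL drop out of the result entirely.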



Source: https://stackoverflow.com/questions/48411331/how-to-count-frequency-of-elements-in-a-bigquery-array-field
