Athena Query for Array Column

Submitted by 痴心易碎 on 2020-06-29 03:33:13

Question


I need help querying an array column in Athena. I currently have a table like the one below:

1   2020-05-06 01:13:48 dv1 [{addedtitle=apple, addedvalue=null, keytitle=Increase apple, key=p9, recvalue=0.899999999, unit=lbs, isbalanced=null}, {addedtitle=Orange (12%), addedvalue=15.0, keytitle=Increase Orange, key=p8, recvalue=18.218999999999998, unit=fl oz, isbalanced=null}, {addedtitle=Lemon, addedvalue=32.0, keytitle=Increase Lemon, key=p10, recvalue=33.6, unit=oz, isbalanced=null}, {addedtitle=Calcium (100%), addedvalue=86.0, keytitle=Increase Calcium , key=p6, recvalue=88.72002, unit=oz, isbalanced=null}, {addedtitle=Mango, addedvalue=10.0, keytitle=Increase Mango, key=p11, recvalue=11.7, unit=oz, isbalanced=null}]
2   2020-05-07 04:30:45 dev2    [{addedtitle=apple (12%), addedvalue=0.0, keytitle=Increase apple, key=p8, recvalue=0.88034375, unit=fl oz, isbalanced=null}, {addedtitle=Orange(31.4%), addedvalue=0.0, keytitle=Decrease Orange, key=p10, recvalue=1.83733225, unit=fl oz, isbalanced=null}, {addedtitle=Tree, addedvalue=0.0, keytitle=Increase Tree, key=p11, recvalue=1.69, unit=oz, isbalanced=null}]
5   2020-05-06 12:55:12 dev5    [{addedtitle=salt, addedvalue=0.0, keytitle=Increase salt, key=p9, recvalue=0.052500000000000005, unit=lbs, isbalanced=null}]
6   2020-05-08 07:03:59 dev6    [{addedtitle=Sugar, addedvalue=6.0, keytitle=Decrease sugar, key=p9, recvalue=2.4000000000000004, unit=fl oz, isbalanced=null}]
7   2020-05-06 12:52:39 dev7    []
8   2020-05-06 04:15:05 dev8    []
9   2020-05-07 05:02:38 dev9    []

I need to break down the third (array) column into further columns so that I can import the data into QuickSight. At the moment QuickSight does not recognize the third column and reports an unsupported data type.

Can somebody please help with breaking this array into columns/rows for analysis?


Answer 1:


The JSON-like data in your example is unfortunately not in a format that Athena can parse.

For anyone else finding this question, I can explain how it can be done if the data is JSON formatted (e.g. {"addedtitle": "apple",… and not {addedtitle=apple,…). I'm also going to assume that the columns are separated by tabs and not spaces (if they are separated by spaces you have to use the Grok serde instead).

First you create a table that reads tab-separated values:

CREATE EXTERNAL TABLE my_table (
  line_number int,
  date_stamp timestamp,
  id string,
  data string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://my-bucket/path/to/data/'

Note how the data column is typed as string and not a complex type. Had each row been only JSON we could have used the JSON serde and declared a complex type, but as far as I know the serde for TSV supports neither complex types nor embedded JSON.

To extract properties from the JSON data we can use Athena's JSON functions, and we can use UNNEST to create a row from each array element. You are probably after a combination of the two, for example:

SELECT
  id,
  JSON_EXTRACT_SCALAR(element, '$.addedtitle') AS addedtitle,
  JSON_EXTRACT_SCALAR(element, '$.recvalue') AS recvalue
FROM my_table
CROSS JOIN UNNEST(CAST(JSON_PARSE(data) AS ARRAY(JSON))) AS t(element)

Given the data in your question this would return:

id   | addedtitle     | recvalue
-----+----------------+----------------------
dv1  | apple          | 0.899999999
dv1  | Orange (12%)   | 18.218999999999998
dv1  | Lemon          | 33.6
dv1  | Calcium (100%) | 88.72002
dv1  | Mango          | 11.7
dev2 | apple (12%)    | 0.88034375
dev2 | Orange(31.4%)  | 1.83733225
dev2 | Tree           | 1.69
dev5 | salt           | 0.052500000000000005
dev6 | Sugar          | 2.4000000000000004

Please note that the above assumes that the data column contains valid JSON; from your question it does not look like that is the case, and the data does not appear to be in a format that Athena supports.
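If you control how the files land in S3, one option is to preprocess them so the fourth column becomes valid JSON before Athena reads it. Below is a minimal sketch of such a converter; the `to_json` function name is my own, and it assumes the Hive-style `{key=value, ...}` notation seen in the question, with word-character keys and values that contain no commas or braces:

```python
import json
import re


def to_json(raw: str) -> str:
    """Convert Hive-style '[{key=value, ...}, ...]' text to valid JSON.

    A sketch only: assumes keys match \\w+ and values contain no
    commas or braces (true of the sample data in the question).
    """
    def quote_value(match: re.Match) -> str:
        key, value = match.group(1), match.group(2)
        if value == "null":
            return f'"{key}": null'
        try:
            float(value)                      # numeric values stay unquoted
            return f'"{key}": {value}'
        except ValueError:
            return f'"{key}": {json.dumps(value)}'

    # Rewrite each key=value pair; brackets, braces, and commas pass through.
    return re.sub(r'(\w+)=([^,}]*)', quote_value, raw)
```

After this conversion the column parses with `JSON_PARSE` as shown above. The same idea could run in a Lambda or Glue job between the producer and the Athena table location.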



Source: https://stackoverflow.com/questions/62549815/athena-query-for-array-column
