AWS Athena - Create table from an array of JSON objects

Posted by 会有一股神秘感。 on 2020-12-25 04:48:32

Question


I need help creating a table in AWS Athena. Here is a sample of the data:

[{"lts": 150}]

AWS Glue generates the schema as:

 array (array<struct<lts:int>>)

When I try to preview the table that AWS Glue created, I get this error:

HIVE_BAD_DATA: Error parsing field value for field 0: org.openx.data.jsonserde.json.JSONObject cannot be cast to org.openx.data.jsonserde.json.JSONArray

The error message is clear, but I can't find the source of the problem!


Answer 1:


Athena reads this table through Hive-JSON-Serde (the org.openx.data.jsonserde classes named in the error) to serialize and deserialize JSON. That SerDe does not accept every valid JSON layout: it expects one record per line, not wrapped in an array. In the SerDe documentation's words:

The following example will work.

{ "key" : 10 }
{ "key" : 20 }

But this won't:

{
  "key" : 20,
}

Nor this:

[{"key" : 20}]
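
The rule quoted above can be checked locally before uploading a file. The following is a minimal sketch, assuming the requirement amounts to "every non-empty line is a complete JSON object"; the function name is_one_record_per_line is made up for illustration, and the check uses Python's json module rather than the SerDe's own parser, so it only approximates what Athena will accept.

```python
import json


def is_one_record_per_line(text: str) -> bool:
    """Return True if every non-empty line is a complete JSON object.

    Assumption: this mirrors the "one record per line, no array" rule
    quoted above; it is not the SerDe's actual validation logic.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            # A record split across lines fails here, line by line.
            return False
        if not isinstance(value, dict):
            # A top-level array (or scalar) is not a record.
            return False
    return True
```

Applied to the three examples above, the first passes and the other two fail: the multi-line object because no single line parses on its own, the array because its top-level value is not an object.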



Answer 2:


You should create a JSON classifier so that the array is read as a list of objects instead of a single array object. Use the JSON path $[*] in your classifier, and then set up the crawler to use it:

  • Edit the crawler
  • Expand 'Description and classifiers'
  • Click 'Add' in the left pane to associate your classifier with the crawler

After that, remove the previously created table and re-run the crawler. It will create a table with the proper schema, but I think Athena will still complain when you try to query it. However, you can now read from that table in a Glue ETL job and process single record objects instead of arrays of objects.
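
The same steps can also be scripted instead of clicked through in the console. This is a rough sketch, assuming a boto3 Glue client is passed in as glue (for example boto3.client("glue")); the function names and every crawler, classifier, database, and table name are placeholders, not values from the question. Note that update_crawler is given the crawler's classifier list, so any other custom classifiers the crawler needs should be included there as well.

```python
def attach_array_classifier(glue, crawler_name: str, classifier_name: str) -> None:
    """Create a JSON classifier with path $[*] and attach it to a crawler.

    `glue` is assumed to be a boto3 Glue client.
    """
    # $[*] makes each element of the top-level array its own record.
    glue.create_classifier(
        JsonClassifier={"Name": classifier_name, "JsonPath": "$[*]"}
    )
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])


def recreate_table(glue, database: str, table: str, crawler_name: str) -> None:
    """Drop the previously crawled table and re-run the crawler."""
    glue.delete_table(DatabaseName=database, Name=table)
    glue.start_crawler(Name=crawler_name)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   attach_array_classifier(glue, "my-crawler", "json-array-elements")
#   recreate_table(glue, "my_database", "my_table", "my-crawler")
```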




Answer 3:


This JSON, [{"lts": 150}], works fine with the query below:

select n.lts from table_name
cross join UNNEST(table_name.array) as t (n) 

However, I ran into a problem with JSON such as [{"lts": 150},{"lts": 250},{"lts": 350}]. Although the array has three elements, the query returns only the first one. This may be because of the limitation described by @artikas. The JSON can certainly be rewritten as below to make it work:

{"lts": 150}
{"lts": 250}
{"lts": 350}
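
One way to produce that layout is a small preprocessing step run before the data is uploaded. The sketch below assumes the whole array fits in memory and uses only Python's standard json module; array_to_json_lines is an illustrative name, not part of any AWS tooling.

```python
import json


def array_to_json_lines(array_text: str) -> str:
    """Rewrite a top-level JSON array as one JSON record per line."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps emits each record on a single line by default.
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    print(array_to_json_lines('[{"lts": 150},{"lts": 250},{"lts": 350}]'), end="")
```

For very large files, a streaming parser would be a better fit than loading the array in one call.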

Please post if anyone has a better solution.



Source: https://stackoverflow.com/questions/50401653/aws-athena-create-table-by-an-array-of-json-object
