AWS Athena JSON Multidimentional Array Structure

情到浓时终转凉″ 提交于 2019-12-11 12:26:04

问题


The JSON file has a structure like this:

"otherstuff" : "stuff",
"ArrayofArrays" : {
   "Array-1" : {
      "type" : "sometype",
      "is_enabled" : false,
      "is_active" : false,
      "version" : "version 1.1"
   },
   "Array-2" : {
      "type" : "sometype",
      "is_enabled" : false,
      "is_active" : false,
      "version" : "version 1.2"
   }
   ...
}

The query runs when with the following

CREATE EXTERNAL TABLE IF NOT EXISTS test2.table14 (
`otherstuff` string,
`ArrayofArrays` array<array<struct<version:string>>>
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
     'serialization.format' = '1' ) LOCATION 's3://bucket/folder/' TBLPROPERTIES ('has_encrypted_data'='false')

However when I run a basic preview table query I get the following error.

HIVE_BAD_DATA: Error parsing field value for field #: org.openx.data.jsonserde.json.JSONObject cannot be cast to org.openx.data.jsonserde.json.JSONArray

There is clearly an issue with the array of array. I am unable to figure out what the structure should be. Do you know how to declair the multidimentional array for this structure?


回答1:


Your JSON does not contain any array elements. So instead of an array, you have to use the map column type, to access this structure.

CREATE EXTERNAL TABLE test14 (
  otherstuff string,
  ArrayofArrays map<string,struct<
     is_enabled:boolean,
     is_active:boolean,
     type:string,
     version:string
  >>
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1' ) 
LOCATION 's3://bucket/'

You can access the map through the following query syntax:

select t.ArrayofArrays['array-1'] from test14 t


来源:https://stackoverflow.com/questions/48553340/aws-athena-json-multidimentional-array-structure

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!