Create athena table with column as unstructured JSON from S3

萝らか妹 提交于 2021-01-28 07:41:21

问题


I am currently creating an Athena table as follows:

 CREATE EXTERNAL TABLE `foo_streaming`(
  `type` string, 
  `message` struct<a:string,b:string,c:string>)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.apache.hive.hcatalog.data.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://foo/data'

However, instead of treating the message struct as structured data, I would like to read it as a JSON blob, because the data could change at any point. How do I do this with Athena?

I tried the following, but it gives me an error. I tried googling, but found nothing.

CREATE EXTERNAL TABLE `foo_streaming`(
      `type` string, 
      `message` JSON)
    PARTITIONED BY ( 
      `dt` string)
    ROW FORMAT SERDE 
      'org.apache.hive.hcatalog.data.JsonSerDe' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      's3://foo/data'

Sample data from S3 would like like:

{ "type": "GTF", "message": { "a": 1, "b": 2 } }
{ "type": "GTB", "message": { "c": 1, "d": 2, "x": { "testid": "abc" } } }
{ "type": "GTE", "message": { "error_code": 1 } }

回答1:


Use string as type, and then Athena/Presto's JSON functions to extract values from the blobs.

You can see this solution in action in the documentation for how to query CloudTrail logs with Athena. The requestparameters, and responseelements properties are JSON, but are service-dependent, and therefore can't be described with a struct.

json doesn't work as a type because it's not a type recognised by Hive, it's a Presto thing, IIRC. I find it pretty confusing in general when types are supported or not, or called different things depending on whether it's DDL or DML (e.g. string and varchar).



来源:https://stackoverflow.com/questions/57299638/create-athena-table-with-column-as-unstructured-json-from-s3

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!