How do I import an array of data into separate rows in a hive table?

江枫思渺然 提交于 2019-12-09 19:29:42

问题


I am trying to import data in the following format into a hive table

[
    {
      "identifier" : "id#1",
      "dataA" : "dataA#1"
    },
    {
      "identifier" : "id#2",
      "dataA" : "dataA#2"
    }
]

I have multiple files like this and I want each {} to form one row in the table. This is what I have tried:

CREATE EXTERNAL TABLE final_table(
    identifier STRING,
    dataA STRING
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

This is not creating a single row for each {} though. I have also tried

CREATE EXTERNAL TABLE final_table(
    rows ARRAY< STRUCT<
    identifier: STRING,
    dataA: STRING
    >>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

but this is not work either. Is there some way of specifying that the input as an array with each record being an item in the array to the hive query? Any suggestions on what to do?


回答1:


Here is what you need

Method 1: Adding name to the array

Data

{"data":[{"identifier" : "id#1","dataA" : "dataA#1"},{"identifier" : "id#2","dataA" : "dataA#2"}]}

SQL

SET hive.support.sql11.reserved.keywords=false;

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_test (
  data array<
    struct<
      identifier:STRING, 
      dataA:STRING
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'my_location';

SELECT rows.identifier,
       rows.dataA
  FROM ramesh_test d
LATERAL VIEW EXPLODE(d.data) d1 AS rows  ;

Output

Method 2 - No Changes to the data

Data

[{"identifier":"id#1","dataA":"dataA#1"},{"identifier":"id#2","dataA":"dataA#2"}]

SQL

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_raw_json (
  json STRING
)
LOCATION 'my_location';

SELECT get_json_object (exp.json_object, '$.identifier') AS Identifier,
       get_json_object (exp.json_object, '$.dataA') AS Identifier
  FROM ( SELECT json_object
           FROM ramesh_raw_json a
           LATERAL VIEW EXPLODE (split(regexp_replace(regexp_replace(a.json,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''), '\\;')) json_exploded AS json_object ) exp;

Output




回答2:


JSON records in data files must appear one per line, an empty line would produce a NULL record.

This json should work

{ "identifier" : "id#1", "dataA" : "dataA#1" }, { "identifier" : "id#2", "dataA" : "dataA#2" }



来源:https://stackoverflow.com/questions/47522270/how-do-i-import-an-array-of-data-into-separate-rows-in-a-hive-table

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!