Question
I am trying to build a table in Hive for the following JSON:
{
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA",
"hours": {
"Tuesday": {
"close": "17:00",
"open": "08:00"
},
"Friday": {
"close": "17:00",
"open": "08:00"
}
},
"open": true,
"categories": [
"Doctors",
"Health & Medical"
],
"review_count": 9,
"name": "Eric Goldberg, MD",
"neighborhoods": [],
"attributes": {
"By Appointment Only": true,
"Accepts Credit Cards": true,
"Good For Groups": 1
},
"type": "business"
}
I can create a table using the following DDL; however, I get an exception when querying that table.
CREATE TABLE IF NOT EXISTS business (
business_id string,
hours map<string,string>,
open boolean,
categories array<string>,
review_count int,
name string,
neighborhoods array<string>,
attributes map<string,string>,
type string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
The exception while retrieving data is "ClassCast:Cant cast jsoanarray to json object". What is the correct schema for this JSON? Is there any tool that can help me generate the correct schema for a given JSON document, to be used with a JSON SerDe?
Answer 1:
It looks to me like the problem is hours, which you defined as hours map<string,string> but which should be a map<string,map<string,string>> instead, because each weekday maps to a nested object rather than to a plain string.
There's a tool you can use to generate the Hive table definition automatically from your JSON data: https://github.com/quux00/hive-json-schema
You may want to adjust its output, though, because when it encounters a JSON object (anything between {}), the tool can't know whether to translate it to a Hive map or to a struct.
On your data, the tool gives me this:
CREATE TABLE x (
attributes struct<accepts credit cards:boolean,
by appointment only:boolean, good for groups:int>,
business_id string,
categories array<string>,
hours map<string:struct<close:string, open:string>
name string,
neighborhoods array<string>,
open boolean,
review_count int,
type string
)
but it looks like you want something like this:
CREATE TABLE x (
attributes map<string,string>,
business_id string,
categories array<string>,
hours map<string,struct<close:string, open:string>>,
name string,
neighborhoods array<string>,
open boolean,
review_count int,
type string
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
hive> load data local inpath 'json.data' overwrite into table x;
hive> Table default.x stats: [numFiles=1, numRows=0, totalSize=416,rawDataSize=0]
OK
hive> select * from x;
OK
{"accepts credit cards":"true","by appointment only":"true",
"good for groups":"1"}
vcNAWiLM4dR7D2nwwJ7nCA
["Doctors","Health & Medical"]
{"tuesday":{"close":"17:00","open":"08:00"},
"friday":{"close":"17:00","open":"08:00"}}
Eric Goldberg, MD ["HELLO"] true 9 business
Time taken: 0.335 seconds, Fetched: 1 row(s)
hive>
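Once the table loads, the nested columns can be read with map subscripts and struct field access. The queries below are a minimal sketch, assuming the table x defined above and the lowercased keys shown in the query output; the column aliases and the lateral view alias h are made up for illustration.

```sql
-- Read one weekday: subscript the map by key, then pick struct fields.
SELECT business_id,
       hours['tuesday'].open              AS tuesday_open,
       hours['tuesday'].close             AS tuesday_close,
       attributes['accepts credit cards'] AS accepts_cards
FROM x;

-- Flatten the hours map into one row per weekday.
-- explode() on a map yields a key column and a value column.
SELECT business_id, day_name, day_hours.open, day_hours.close
FROM x
LATERAL VIEW explode(hours) h AS day_name, day_hours;
```

Keeping hours as a map means that weekdays missing from a record simply return NULL on lookup, which fits data where not every business lists every day.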
A few notes though:
- Notice that I used a different JSON SerDe (org.openx.data.jsonserde.JsonSerDe) because I don't have the one you used on my system. I like it better because, well, I wrote it. But the CREATE statement should work just as well with the other SerDe.
- You may want to convert some of those maps to structs, as they may be more convenient to query. For instance, attributes could be a struct, but you'd need to map the names with a space in them, like accepts credit cards. My SerDe allows you to map a JSON attribute to a different Hive column name. That is also needed when the JSON uses an attribute that is a Hive keyword, like 'timestamp' or 'create'.
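The attribute-to-column mapping mentioned in the last note could look roughly like this. It is a sketch that assumes the "mapping.<hive column>" = "<json attribute>" SerDe property of org.openx.data.jsonserde.JsonSerDe; the events table and its columns are hypothetical and not part of the question's data.

```sql
-- Hypothetical table: the JSON attribute "timestamp" is a Hive keyword,
-- so it is exposed under the column name ts instead.
CREATE TABLE events (
  ts      string,
  message string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.ts" = "timestamp")
STORED AS TEXTFILE;
```

A key containing spaces, such as accepts credit cards, would presumably be mapped with the same property form, but that case is not verified here, so check it against the SerDe's documentation before relying on it.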
Source: https://stackoverflow.com/questions/33927116/load-complex-json-in-hive-using-jsonserde