Loading JSON file with serde in Cloudera

我的梦境 提交于 2019-12-31 07:18:12

问题


I am trying to work with a JSON file with this bag structure :

{
   "user_id": "kim95",
   "type": "Book",
   "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.",
   "year": "1995",
   "publisher": "ACM Press and Addison-Wesley",
   "authors": [
      {
         "name": "null"
      }
   ],
   "source": "DBLP"
}
{
   "user_id": "marshallo79",
   "type": "Book",
   "title": "Inequalities: Theory of Majorization and Its Application.",
   "year": "1979",
   "publisher": "Academic Press",
   "authors": [
      {
         "name": "Albert W. Marshall" 
      },
      {
         "name": "Ingram Olkin"
      }
   ],
   "source": "DBLP"
}

I tried to use serde to load JSON data for Hive. I followed both ways that I saw here : http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

With this code :

CREATE EXTERNAL TABLE IF NOT EXISTS serd (
           user_id:string, 
           type:string, 
           title:string,
           year:string,
           publisher:string,
           authors:array<struct<name:string>>,
           source:string)       
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION '/user/hdfs/data/book-seded_workings-reduced.json';

I got this error:

error while compiling statement: failed: parseexception line 2:17 cannot recognize input near ':' 'string' ',' in column type

I alson tried this version : https://github.com/rcongiu/Hive-JSON-Serde

which gave a different error :

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.openx.data.jsonserde.JsonSerde

Any idea?

I also want to know what are alternatives to work with a JSON like this to make queries on 'name' field in 'authors'. Whether it's Pig or Hive?

I have already converted it in to a "tsv" file. But, since my authors column is a tuple, I don't know how make requests on 'name' with Hive, If I build a table from this file. Should I change my script for "tsv" conversion or keep it? Or are there any alternatives with Hive or Pig?


回答1:


Hive does not have built in support for JSON. So for using JSON with Hive we need to use third part jars like: https://github.com/rcongiu/Hive-JSON-Serde

You have couple of issues with the create table statement. It should look like this:

CREATE EXTERNAL TABLE IF NOT EXISTS serd ( 
user_id string,type string,title string,year string,publisher string,authors array<string>,source:string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION...

The JSON records your are using keep each record in a single line like this:

{"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":"null"}], "source": "DBLP"} 
{"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press","authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}

After downloading the project from GIT you need to compile the the project which will create a jar you need to add this jar in the Hive session before running the create table statement.

Hope it helps...!!!




回答2:


add jar only add to session which won't be available and finally it is getting error. Get the JAR loaded on all the nodes at Hive and Map Reduce path like the below location so that HIVE and Map Reduce component will pick this whenever it’s been called.

  1. /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0- 1.cdh5.2.0.p0.36/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar

  2. /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce/lib/json-serde-1.3.6-jar-with-dependencies.jar

Note: This path varies to cluster.



来源:https://stackoverflow.com/questions/25149700/loading-json-file-with-serde-in-cloudera

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!