Pig : result of json loader empty

你离开我真会死。 提交于 2019-11-29 16:23:53
moubert

Finally , only this schema worked : If I add or remove a space different from this configuration then i gonna have an error( i also added "name" for tuples and specified "null" when it was empty, and changed the order between authors and source, but even without this congiguration it will still be wrong)

{"user_id": "kim95", "type": "Book","title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":null"}], "source": "DBLP"}
{"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}

And the working script is this one :

books = load 'data/book-seded-workings-reduced.json'
        using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

STORE books INTO 'book-table.csv';  //whether .tsv or .csv

try STORE books INTO 'book-no-seded.tsv' using USING org.apache.pig.piggybank.storage.JsonStorage();

You need to bu sure that the LOAD schema is good. You can try to do a DUMP books to quick check.

We had to be careful with the input data and the schema when we used the Pig JsonLoader for this tutorial http://gethue.com/hadoop-tutorials-ii-1-prepare-the-data-for-analysis/.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!