How do I split in Pig a tuple of many maps into different rows

喜夏-厌秋 提交于 2019-12-10 23:41:40

问题


I have a relation in Pig that looks like this:

([account_id#100,
 timestamp#1434,
 id#900],

[account_id#100,
 timestamp#1434,
 id#901],

[account_id#100,
 timestamp#1434,
 id#902])

As you can see, I have three map objects within a tuple. All of the data above is within the $0'th field in the relation. So the data above in a relation with a single bytearray column.

The data is loaded as follows:

data = load 's3://data/data' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

DESCRIBE data;

data: {bytearray}

How do I split this data structure into three rows so that the output is as follows?

data: {account_id:chararray, timestamp:chararray, id:int}
(100, 1434,900)
(100, 1434,901)
(100, 1434,902)

回答1:


It is very difficult to guess your problem without having a sample input data. If this is an intermediate result, then write it out using a STORE and put the output file as something that we can input to try out. I was able to solve this using STRSPLIT but am not sure if you meant that the input is a single column and a single row or are these three different rows with the same column.

In either case, Flattening out the data using the FLATTEN operator and using STRSPLIT later should help. If I get more information and input data for the problem, I can give a working example.

Data -> FLATTEN to get out of bag -> STRSPLIT over "," in a FOREACH,GENERATE


来源:https://stackoverflow.com/questions/26043445/how-do-i-split-in-pig-a-tuple-of-many-maps-into-different-rows

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!