Pig - RANK Operation on Groups

Posted by 风流意气都作罢 on 2019-12-12 01:53:57

Question


I'm new to Pig and I'm trying to perform a RANK operation within each group. My data looks like this:


    Name    address Date
    A   addr1   20150101
    A   addr2   20150130
    B   addr1   20140325
    B   addr2   20140821
    B   addr3   20150102

I want my output to look like this:


    Name    address Date     Rank
    A   addr1   20150101  1
    A   addr2   20150130  2
    B   addr1   20140325  1
    B   addr2   20140821  2
    B   addr3   20150102  3

I'm using Pig 0.12.1. Is there any way to get the output in the required format with Pig's built-in functions?


Answer 1:


This is a little difficult to solve with standard Pig alone, because the built-in RANK operator ranks rows across the whole relation rather than within each group. With the help of the DataFu library, however, it is easy.

Download the jar file (datafu-1.2.0.jar) from http://mvnrepository.com/artifact/com.linkedin.datafu/datafu/1.2.0, put it somewhere your script can register it from (the example below uses /tmp), and try the approach below.

Input:

A       addr1   20150101
A       addr2   20150130
B       addr1   20140325
B       addr2   20140821
B       addr3   20150102

Pig script:

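-- Make the DataFu UDFs available to the script.
REGISTER /tmp/datafu-1.2.0.jar;
-- Enumerate appends an index to each tuple in a bag; '1' makes the index start at 1.
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

A = LOAD 'input' USING PigStorage() AS (Name:chararray, Address:chararray, Date:chararray);
-- Collect all rows that share a Name into one bag per group.
B = GROUP A BY Name;
-- $1 is the bag of grouped rows; number its tuples and flatten back to one row each.
C = FOREACH B GENERATE FLATTEN(Enumerate($1));
DUMP C;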

Output:

(A,addr1,20150101,1)
(A,addr2,20150130,2)
(B,addr1,20140325,1)
(B,addr2,20140821,2)
(B,addr3,20150102,3)
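
Note that a bag produced by GROUP carries no guaranteed order, so the numbering above follows the dates only because the sample input already happens to be stored in date order. The following is a minimal sketch of a variant that sorts each group by Date inside a nested FOREACH before enumerating. It assumes the same jar path and input as above; the alias `sorted` is introduced here purely for illustration, and the script has not been run against Pig 0.12.1.

```pig
REGISTER /tmp/datafu-1.2.0.jar;
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

A = LOAD 'input' USING PigStorage() AS (Name:chararray, Address:chararray, Date:chararray);
B = GROUP A BY Name;
-- Sort each group's bag by Date first, then number the sorted tuples.
C = FOREACH B {
    sorted = ORDER A BY Date ASC;
    GENERATE FLATTEN(Enumerate(sorted));
};
DUMP C;
```

Because Date is stored as a yyyymmdd string, ordering it as a chararray should match chronological order; with a different date format it would need to be converted before sorting.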


Source: https://stackoverflow.com/questions/28239971/pig-rank-operation-on-groups
