Question
I'm new to Pig and I'm trying to perform a RANK operation within each group. My data looks like this:
Name  address  Date
A     addr1    20150101
A     addr2    20150130
B     addr1    20140325
B     addr2    20140821
B     addr3    20150102
I want my output to look like this:
Name  address  Date      Rank
A     addr1    20150101  1
A     addr2    20150130  2
B     addr1    20140325  1
B     addr2    20140821  2
B     addr3    20150102  3
I'm using Pig 0.12.1. Is there any way to get the output in the required format with Pig built-in functions?
Answer 1:
This is somewhat difficult to solve with standard Pig alone, but the DataFu library makes it easy.
Download the jar file (datafu-1.2.0.jar) from
http://mvnrepository.com/artifact/com.linkedin.datafu/datafu/1.2.0, add it to your classpath, and try the approach below.
Input:
A addr1 20150101
A addr2 20150130
B addr1 20140325
B addr2 20140821
B addr3 20150102
PigScript:
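-- Make the DataFu UDFs available to the script
REGISTER /tmp/datafu-1.2.0.jar;
-- Enumerate appends an index to every tuple in a bag; '1' makes it start at 1
DEFINE Enumerate datafu.pig.bags.Enumerate('1');
-- PigStorage() with no argument expects tab-separated fields
A = LOAD 'input' USING PigStorage() AS (Name:chararray,Address:chararray,Date:chararray);
B = GROUP A BY Name;
-- $1 is the bag of tuples for each Name; number them, then flatten back to rows
C = FOREACH B GENERATE FLATTEN(Enumerate($1));
DUMP C;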
Output:
(A,addr1,20150101,1)
(A,addr2,20150130,2)
(B,addr1,20140325,1)
(B,addr2,20140821,2)
(B,addr3,20150102,3)
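A note on ordering: the script above numbers the tuples in whatever order they sit in each group's bag, and Pig treats bags as unordered collections. If the rank has to follow Date, one option is to sort inside a nested FOREACH before calling Enumerate. The following is a minimal sketch, assuming the same jar path and input file as above; the alias sorted_rows is a name introduced here for illustration, not part of the original answer.

```pig
REGISTER /tmp/datafu-1.2.0.jar;
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

A = LOAD 'input' USING PigStorage() AS (Name:chararray, Address:chararray, Date:chararray);
B = GROUP A BY Name;

-- Sort each group's bag by Date before numbering it, so the index
-- reflects date order rather than the order the tuples arrived in.
-- (sorted_rows is a hypothetical alias chosen for this sketch.)
C = FOREACH B {
    sorted_rows = ORDER A BY Date ASC;
    GENERATE FLATTEN(Enumerate(sorted_rows));
};

DUMP C;
```

Because Date is stored as a chararray in yyyyMMdd form, sorting it as a string matches chronological order. For the sample input this should yield the same five rows shown in the output above.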
Source: https://stackoverflow.com/questions/28239971/pig-rank-operation-on-groups