Question
I have data in the form id,val1,val2. For example:
1,0.2,0.1
1,0.1,0.7
1,0.2,0.3
2,0.7,0.9
2,0.2,0.3
2,0.4,0.5
First, I want to sort the rows for each id by val1 in decreasing order, giving something like:
1,0.2,0.1
1,0.2,0.3
1,0.1,0.7
2,0.7,0.9
2,0.4,0.5
2,0.2,0.3
Then I want to select the id,val2 combination of the second row for each id. For example:
1,0.3
2,0.5
How do I approach this?
Thanks
Answer 1:
Pig is a scripting language rather than a relational one like SQL, so it is well suited to working with groups, using operators nested inside a FOREACH. Here is a solution:
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id; -- isolate all rows for the same id
C = FOREACH B { -- here comes the scripting bit
    elems = ORDER A BY v1 DESC; -- sort rows belonging to the id
    two = LIMIT elems 2; -- select top 2
    two_invers = ORDER two BY v1 ASC; -- sort in opposite order to bubble second value to the top
    second = LIMIT two_invers 1;
    GENERATE FLATTEN(group) as id, FLATTEN(second.v2);
};
DUMP C;
In your example, id 1 has two rows with v1 == 0.2 but different v2 values, so the second value for id 1 can be either 0.1 or 0.3.
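If a deterministic result is needed when v1 values tie, one option is to add a secondary sort key. The sketch below is a variation on the script above, not part of the original answer: it assumes the same comma-separated 'input' file and arbitrarily picks v2 ascending as the tie-break, which you would replace with whatever ordering makes sense for your data.

```pig
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id;
C = FOREACH B {
    -- v2 ASC is an assumed tie-break so that equal v1 values sort predictably
    elems = ORDER A BY v1 DESC, v2 ASC;
    two = LIMIT elems 2;
    -- reverse both keys so the second-ranked row bubbles to the top
    two_invers = ORDER two BY v1 ASC, v2 DESC;
    second = LIMIT two_invers 1;
    GENERATE group AS id, FLATTEN(second.v2) AS v2;
};
DUMP C;
```

With this tie-break, the rows for id 1 rank as (0.2,0.1), (0.2,0.3), (0.1,0.7), so the second row is the one with v2 = 0.3, which matches the result asked for in the question. Note that an id with only one row would return that single row.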
Answer 2:
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float); -- val1 and val2 are decimals, so load them as float
B = ORDER A BY id ASC, v1 DESC;
C = FOREACH B GENERATE id, v2;
DUMP C;
Source: https://stackoverflow.com/questions/13253863/accessing-an-element-like-array-in-pig