latin pig bag to tuple after group by

与世无争的帅哥 submitted on 2019-12-21 05:36:18

Question


I have the following data with schema (t0: chararray,t1: int,t2: int)

(B,4,2)
(A,2,3)
(A,3,2)
(B,2,2)
(A,1,2)
(B,1,2)

I'd like to generate the following results (group by t0, and ordered by t1)

(A, ((1,2),(2,3),(3,2)))
(B, ((1,2),(2,2),(4,2)))

Please note I want only tuples in the second component, not bags. Please help.


Answer 1:


You should be able to do it like this.

-- A: (t0: chararray,t1: int,t2: int)

B = GROUP A BY t0 ;
C = FOREACH B {
    -- Project out the first column of A.
    projected = FOREACH A GENERATE t1, t2 ;
    -- Now you can order the projection.
    ordered = ORDER projected BY t1 ;
    GENERATE group AS t0, ordered AS vals ;
}

You can read more about nested FOREACHs here.
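To make the semantics of the nested FOREACH concrete, here is a rough plain-Python model of the same three steps (group by t0, project t1 and t2, order by t1), run on the sample rows from the question. This is only an illustration, not Pig code: the function name group_project_order is made up for this sketch, and Python lists stand in for Pig bags.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question, as (t0, t1, t2).
rows = [("B", 4, 2), ("A", 2, 3), ("A", 3, 2), ("B", 2, 2), ("A", 1, 2), ("B", 1, 2)]


def group_project_order(rows):
    """Model of: GROUP BY t0, then project (t1, t2) and ORDER BY t1 per group."""
    result = []
    # itertools.groupby only merges adjacent items, so sort on the key first.
    by_key = sorted(rows, key=itemgetter(0))
    for key, members in groupby(by_key, key=itemgetter(0)):
        # Drop t0, keeping (t1, t2); the list plays the role of a Pig bag.
        projected = [(t1, t2) for _, t1, t2 in members]
        # Order the projected rows by t1.
        ordered = sorted(projected, key=itemgetter(0))
        result.append((key, ordered))
    return result


for record in group_project_order(rows):
    print(record)
```

Each group's second component is still a list here, which mirrors the fact that the Pig script above yields a bag rather than a tuple.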

NOTE/UPDATE: When I originally answered this question, I missed that the asker wanted the output in tuple form. Tuples should only be used when you know the exact number and position of the fields they contain. Otherwise the schema will be undefined and the fields will be very difficult to access, because the entire tuple is treated as a bytearray and you have to find and cast every field manually.

If you must do it this way, you cannot do it in pure Pig; you will have to use some sort of UDF. I would recommend Python.
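A minimal sketch of what such a Python UDF might look like, assuming it is registered through Pig's Jython engine, where a bag arrives as a list of tuples and a returned Python tuple becomes a Pig tuple. The function name bag_to_sorted_tuple is hypothetical, and no output schema is declared, so Pig would treat the result as a bytearray, as cautioned above.

```python
def bag_to_sorted_tuple(bag):
    """Turn a bag of tuples into one tuple of tuples, ordered by first field.

    Assumption: Pig's Jython engine passes the bag in as a list of tuples.
    """
    if bag is None:
        # Propagate nulls instead of raising inside the Pig job.
        return None
    # Copy each row to a plain tuple, then sort the rows on their first field.
    return tuple(sorted((tuple(row) for row in bag), key=lambda row: row[0]))
```

Called on [(2, 3), (3, 2), (1, 2)], this returns ((1, 2), (2, 3), (3, 2)). If the file were saved as, say, bag_udfs.py (an assumed name), it could be registered with REGISTER 'bag_udfs.py' USING jython AS bag_udfs; and then applied to the grouped relation with something like bag_udfs.bag_to_sorted_tuple(A.(t1, t2)).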




Answer 2:


Use FOREACH. See the "Nested Projection" section on the PigLatin page: http://wiki.apache.org/pig/PigLatin




Answer 3:


You may try this:

grunt> a_input = Load '/home/training/pig/Join/order_temp.csv' Using PigStorage(',') as (t0:chararray,t1:int,t2:int);

grunt> b= Group (Order a_input by t1) By t0;



Source: https://stackoverflow.com/questions/19948614/latin-pig-bag-to-tuple-after-group-by
