Pig: is it possible to write a loop over variables in a list?

陌清茗 2020-12-20 07:38

I have to loop over 30 variables in a list

[var1,var2, ... , var30]

and for each variable I use some PIG group by statement such as

1 answer
  •  北海茫月
    2020-12-20 08:03

    I think what you're looking for is a Pig macro.

    Create a relation for your 30 variables, iterate over it with FOREACH, and call a macro that takes two parameters: your data relation and the variable you want to group by. Check the example in the link; the macro there is very similar to what you'd like to do.

    UPDATE & code

    So here's the macro you can use (saved as cnt.macro, the file imported in the usage example):

    -- Group $data by $group_field and count the rows in each group.
    DEFINE my_cnt(data, group_field) RETURNS C {
            $C = FOREACH (GROUP $data BY $group_field) GENERATE
                    group AS mygroup,
                    COUNT($data) AS count;
    };
    

    Use the macro:

    IMPORT 'cnt.macro';
    
    data = LOAD 'data.txt' USING PigStorage(',') AS (field:chararray, value:chararray);
    DESCRIBE data;
    
    e = my_cnt(data,'the_field_you_group_by');
    DESCRIBE e;
    DUMP e;
    

    I'm still thinking about how you can iterate over the fields you'd like to group by. My original suggestion, to FOREACH through a relation that contains the field names, is not correct. (Writing a UDF for this would always work.) Let me think about it. In the meantime, this macro works as-is if you call it once for each field name you want to group by.
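
    One possible way to avoid typing the macro call 30 times is to generate the Pig script from a small program outside Pig. The sketch below is only an illustration in Python and is not part of the original answer. The function name build_pig_script, the cnt_ alias prefix, and the all-chararray schema are assumptions made for the example; my_cnt, cnt.macro, and data.txt are taken from the code above.

```python
# Hypothetical generator: emits one my_cnt call per field name.
# Assumes cnt.macro defines my_cnt as shown in the answer and that
# data.txt is comma-separated with one chararray column per listed field.


def build_pig_script(fields, macro_file="cnt.macro", data_file="data.txt"):
    """Return Pig Latin text that calls my_cnt once for each field."""
    # Build the AS (...) schema clause from the field names.
    schema = ", ".join(f"{name}:chararray" for name in fields)
    lines = [
        f"IMPORT '{macro_file}';",
        "",
        f"data = LOAD '{data_file}' USING PigStorage(',') AS ({schema});",
        "",
    ]
    for name in fields:
        alias = f"cnt_{name}"  # one output relation per grouped field
        lines.append(f"{alias} = my_cnt(data, '{name}');")
        lines.append(f"DUMP {alias};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # var1 .. var30, as in the question
    fields = [f"var{i}" for i in range(1, 31)]
    print(build_pig_script(fields), end="")
```

    Running the script prints the Pig statements. You could redirect them to a file and run that file with Pig as usual.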
