Pig: how to count the number of rows in an alias

盖世英雄少女心 2020-12-07 11:03

I did something like this to count the number of rows in an alias in Pig:

logs = LOAD 'log';
logs_w_one = foreach logs generate 1 as one;
logs_group = group         

7 answers
  •  抹茶落季
    2020-12-07 11:40

    Here is an optimized version. All the solutions above require Pig to read and write full tuples when counting; the script below writes only 1s.

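    -- Macro: returns one tuple ('name', row count) for the alias passed as inBag.
    -- Each row is projected to the constant 1 first, so the grouped bag stays small.
    DEFINE row_count(inBag, name) RETURNS result {
        X = FOREACH $inBag GENERATE 1;
        $result = FOREACH (GROUP X ALL PARALLEL 1) GENERATE '$name', COUNT(X);
    };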
    

    Then use it like this:

    xxx = row_count(rows, 'rows_count');
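    For comparison, a direct count of the kind this answer contrasts with might look like the sketch below. It groups the alias itself, so the bag handed to the counting function holds complete tuples instead of constant 1s. This is a hypothetical illustration: `rows` is assumed to be the same input alias as in the usage line, and `rows_group` and `rows_count_direct` are illustrative names. `COUNT_STAR` is Pig's built-in that counts every tuple in a bag, including tuples containing nulls.

    ```pig
    -- Hypothetical comparison: 'rows' is the same alias passed to row_count above.
    -- GROUP ... ALL collects every complete tuple of 'rows' into a single bag.
    rows_group = GROUP rows ALL;
    -- COUNT_STAR counts all tuples in that bag, including ones containing nulls.
    rows_count_direct = FOREACH rows_group GENERATE COUNT_STAR(rows);
    ```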
    
