Pig: how to count the number of rows in an alias

盖世英雄少女心 2020-12-07 11:03

I did something like this to count the number of rows in an alias in PIG:

logs = LOAD 'log';
logs_w_one = foreach logs generate 1 as one;
logs_group = group         


        
7 Answers
  •  孤街浪徒
    2020-12-07 11:38

    What you want is to count all the rows in a relation (a dataset in Pig Latin).

    This is easy with the following steps:

    logs = LOAD 'log'; -- relation called logs, using PigStorage with tab as the field delimiter
    logs_grouped = GROUP logs ALL; -- gives a relation with one row, with logs as a bag
    number = FOREACH logs_grouped GENERATE COUNT_STAR(logs); -- counts every row in the bag
    

    I have to say that Kevin's point is important: if we used COUNT instead of COUNT_STAR, we would get only the number of rows whose first field is not null.
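
    As a minimal sketch of that difference, the script below computes both counts over the same grouped relation. The schema (`user`, `url`) and the alias `counts` are hypothetical and only there for illustration; the sketch assumes 'log' is a tab-delimited file in which an empty first column is loaded as null.

    ```pig
    -- Hypothetical two-column schema; the real 'log' file may differ.
    logs = LOAD 'log' AS (user:chararray, url:chararray);

    -- Put every row into a single bag so the aggregates see the whole relation.
    logs_grouped = GROUP logs ALL;

    -- COUNT skips tuples whose first field (user) is null;
    -- COUNT_STAR counts every tuple in the bag.
    counts = FOREACH logs_grouped GENERATE COUNT(logs) AS non_null_rows, COUNT_STAR(logs) AS all_rows;

    DUMP counts;
    ```

    If some rows have an empty first column, the two numbers should differ by exactly that many rows; if none do, they should be equal.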

    I also like Jerome's one-line syntax, which is more concise, but to be didactic I prefer to split it in two and add some comments.

    In general I prefer:

    numerito = FOREACH (GROUP CARGADOS3 ALL) GENERATE COUNT_STAR(CARGADOS3);
    

    over

    name = GROUP CARGADOS3 ALL;
    number = FOREACH name GENERATE COUNT_STAR(CARGADOS3);
    
