I did something like this to count the number of rows in an alias in Pig:
logs = LOAD 'log';
logs_w_one = foreach logs generate 1 as one;
logs_group = group
Here is an optimized version. All the solutions above require Pig to read and write the full tuple when counting; the script below writes only a constant 1 per row.
DEFINE row_count(inBag, name) RETURNS result {
X = FOREACH $inBag generate 1;
$result = FOREACH (GROUP X ALL PARALLEL 1) GENERATE '$name', COUNT(X);
};
Then use it like this:
xxx = row_count(rows, 'rows_count');
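For context, here is a rough end-to-end sketch of how the macro might be called. It assumes the macro definition sits earlier in the same script, and the input path 'log' is only a placeholder borrowed from the first snippet, not a real dataset.

```pig
-- Hypothetical input path; replace with your own data.
rows = LOAD 'log';

-- row_count is the macro defined above; the second argument is just a label.
xxx = row_count(rows, 'rows_count');

-- For non-empty input this should emit one tuple: the label and the row count.
DUMP xxx;
```

Because the macro projects every row down to the constant 1 before grouping, only that single field travels through the GROUP ALL step, which is where the saving over counting full tuples comes from.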