Question
I'm trying to find the Pig equivalent of the SQL functions GREATEST and LEAST. These functions are the scalar equivalent of the aggregate SQL functions MAX and MIN, respectively.
Essentially, I want to be able to say something like this:
x = LOAD 'file:///a/b/c.csv' USING PigStorage() AS (a: int, b: int, c: int);
y = FOREACH x GENERATE a AS a: int, b AS b: int, c AS c: int, GREATEST(a, b, c) AS g: int;
I know I could use bags and MAX to get this done, but I'm translating from another language into Pig and that implementation would be difficult to integrate.
Is there an "inline" approach I could use here? Some builtin function I'm overlooking, or maybe a UDF in Piggybank or DataFu, for example, would be ideal! If there's a completely "inline" version that uses bags and I'm just not thinking of it, that's fine too!
Thank you!
Answer 1:
It turns out that there are "inline" bag-based approaches that work:
x = LOAD 'file:///a/b/c.csv' USING PigStorage() AS (a: int, b: int, c: int);
y = FOREACH x GENERATE a AS a: int, b AS b: int, c AS c: int, MAX(TOBAG(a, b, c)) AS g: int;
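The same pattern should cover LEAST by swapping MAX for the builtin MIN. Here is a minimal sketch, assuming the same three-int schema as in the question; the output aliases g and l are just illustrative names:

```
x = LOAD 'file:///a/b/c.csv' USING PigStorage() AS (a: int, b: int, c: int);
-- TOBAG wraps the scalar fields of each row into a bag;
-- MAX and MIN then reduce that per-row bag to a single value.
y = FOREACH x GENERATE a, b, c,
    MAX(TOBAG(a, b, c)) AS g: int,  -- stands in for GREATEST(a, b, c)
    MIN(TOBAG(a, b, c)) AS l: int;  -- stands in for LEAST(a, b, c)
```

One caveat when translating from SQL: Pig's MAX and MIN ignore nulls, so a null among a, b, c is skipped instead of making the whole result null. Some SQL dialects return NULL from GREATEST or LEAST when any argument is NULL, so the results can differ on rows that contain nulls.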
Source: https://stackoverflow.com/questions/27262945/pig-equivalent-of-sql-greatest-least