Since there are no else or default statements in Pig's SPLIT operation, what would be the most elegant way to do the following? I'm not a big fan of having to copy-paste code.
It is. According to the docs for SPLIT, you want to use OTHERWISE. For example:
SPLIT data INTO
    good_data IF (value > 0),
    good_data_big_values IF (value > 100),
    bad_data OTHERWISE;
So you almost got it. :)
NOTE: SPLIT evaluates each condition independently rather than as an if/else chain, so a single row can end up in both good_data and good_data_big_values if, for example, value is 150. I don't know if this is what you want, but you should be aware of it regardless. This also means that bad_data will only contain rows that match none of the conditions, i.e. where value is 0 or less.
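If you would rather have each row land in exactly one relation, one option is to make the conditions mutually exclusive. This is only a sketch that reuses the relation and field names from the example above; the upper bound of 100 on good_data is my assumption about where you would want the cutoff:

```pig
-- Assumed cutoff: values from 1 to 100 go to good_data only,
-- values above 100 go to good_data_big_values only.
SPLIT data INTO
    good_data IF (value > 0 AND value <= 100),
    good_data_big_values IF (value > 100),
    bad_data OTHERWISE;
```

With the conditions written this way, a row with value 150 would go only to good_data_big_values, and bad_data would still collect the rows where value is 0 or less.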