问题
I'm trying to count how many values in a data set match a filter condition, but I'm running into issues when the filter matches no entries.
There are a lot of columns in my data
structure, but there's only three of use for this example: key
- data key for the set (not unique), value
- float value as recorded, nominal_value
- float representing the nominal value.
Our use case right now is to find the number of values that are 10% or more below the nominal value.
I'm doing something like this:
filtered_data = FILTER data BY value <= (0.9 * nominal_value);
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT(filtered_data.value);
DUMP filtered_count;
In most cases, there are no values that fall outside of the nominal range, so filtered_data
is empty (or null. Not sure how to tell which.). This results in filtered_count
also being empty/null, which is not desirable.
How can I construct a statement that will return a value of 0 when filtered_data
is empty/null? I've tried a couple of options that I've found online:
-- Extra parens in COUNT required to avoid syntax error
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT((filtered_data.value is null ? {} : filtered_data.value));
which results in:
Two inputs of BinCond must have compatible schemas. left hand side: #1259:bag{} right hand side: #1261:bag{#1260:tuple(cf#1038:float)}
And:
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE (filtered_data.value is null ? 0 : COUNT(filtered_data.value));
which results in an empty/null result.
回答1:
The way you have it set up right now, you will lose information about any keys for which the count of bad values is 0. Instead, I'd recommend preserving all keys, so that you can see positive confirmation that the count was 0, instead of inferring it by absence. To do that, just use an indicator and then SUM
that:
data2 =
FOREACH data
GENERATE
key,
((value <= 0.9*nominal_value) ? 1 : 0) AS bad;
bad_count = FOREACH (GROUP data2 BY key) GENERATE group, SUM(data2.bad);
来源:https://stackoverflow.com/questions/18086768/pig-filter-returns-empty-bag-that-i-cant-count