Does Pig support an IN clause?
filtered = FILTER bba BY reason not in ('a','b','c','d');
Or should I split it up into multiple ORs?
I didn't find it in any of the samples in the documentation.
You can get by using AND/OR/NOT.
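For the filter in the question, a minimal sketch of that expansion could look like this (it assumes `bba` is already loaded and `reason` is a chararray field):

```pig
-- Equivalent of "reason NOT IN ('a','b','c','d')":
-- a NOT IN list becomes a chain of != comparisons joined with AND
filtered = FILTER bba BY (reason != 'a') AND (reason != 'b')
                     AND (reason != 'c') AND (reason != 'd');

-- The positive form, "reason IN (...)", becomes == comparisons joined with OR
kept = FILTER bba BY (reason == 'a') OR (reason == 'b')
                  OR (reason == 'c') OR (reason == 'd');
```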
You can use the UDF below from Apache DataFu instead. It lets you avoid writing a lot of ORs.
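The UDF meant here is presumably DataFu's `datafu.pig.util.InUDF`. A rough sketch of how it might be used for the question's filter, assuming the DataFu jar is available (the jar path is a placeholder, and `bba` and `reason` come from the question):

```pig
-- Placeholder path: point this at the DataFu jar you actually have
REGISTER /path/to/datafu.jar;

-- Short alias for the UDF
DEFINE In datafu.pig.util.InUDF();

-- First argument is the value to test, the rest are the candidates
kept     = FILTER bba BY In(reason, 'a', 'b', 'c', 'd');

-- Negate it to get the NOT IN behaviour asked about
filtered = FILTER bba BY NOT In(reason, 'a', 'b', 'c', 'd');
```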
Pig 0.12 added an IN operator; see the release notes at the bottom of this page: http://www.edureka.co/blog/operators-in-apache-pig-diagnostic-operators/ I haven't located it in the official docs, apart from a bare mention in the release notes.
No, older versions of Pig (before 0.12) don't support an IN clause. I had a similar situation. As a workaround, you can combine conditions with OR in a FILTER statement (or use != with AND for the NOT IN case), like this:
A = LOAD 'source.txt' AS (user:chararray, age:chararray);
B = FILTER A BY ($0 matches 'tapan') OR ($0 matches 'superman');
However, if the number of values to filter on is large, you can create a relation that contains all the keywords and join against it, keeping only the rows that match. Hope this helps.
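A sketch of the join approach, assuming a hypothetical file `keywords.txt` with one distinct value per line (duplicate keywords would duplicate the matching rows):

```pig
A = LOAD 'source.txt' AS (user:chararray, age:chararray);
-- Hypothetical keyword list, one value per line
K = LOAD 'keywords.txt' AS (keyword:chararray);

-- IN: an inner join keeps only the rows whose user appears in K
J = JOIN A BY user, K BY keyword;
matched = FOREACH J GENERATE A::user AS user, A::age AS age;

-- NOT IN: a left outer join, then keep the rows that found no match
L = JOIN A BY user LEFT OUTER, K BY keyword;
no_match = FILTER L BY K::keyword IS NULL;
unmatched = FOREACH no_match GENERATE A::user AS user, A::age AS age;
```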
We can use the IN clause as follows:
A = FILTER alias_name BY col_name IN (val1, val2,...,valn); DUMP A;
You can do it like this:
X = FILTER bba BY NOT (reason IN ('a','b','c','d'));