Equivalent of Linux 'diff' in Apache Pig
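I want to be able to do a standard diff on two large files. I've got something that works, but it's not nearly as quick as `diff` on the command line:

```
-- TextLoader keeps each whole line; the default PigStorage() splits on tabs.
A = load 'A' using TextLoader() as (line:chararray);
B = load 'B' using TextLoader() as (line:chararray);
-- Full outer join: a line found in only one file gets a null on the other side.
JOINED = join A by line full outer, B by line;
DIFF = FILTER JOINED by A::line is null or B::line is null;
-- A line missing from A exists only in B, so it was ADDED; otherwise REMOVED.
DIFF2 = FOREACH DIFF GENERATE (A::line is null ? B::line : A::line),
                              (A::line is null ? 'ADDED' : 'REMOVED');
STORE DIFF2 into 'diff';
```

Anyone got any better ways to do this?

I use the following approaches. (My JOIN approach is very similar, but this method does not replicate the behavior of diff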
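One way to get closer to `diff`, sketched here under assumptions rather than taken from the post, is to replace the JOIN with a COGROUP and compare line counts. A full outer join cross-multiplies duplicates: a line that appears twice in A and once in B matches on both sides, so it is never reported, whereas `diff` would show one removed copy. The paths 'A', 'B' and 'diff', and the `line`/`status`/`copies` output layout, are illustrative placeholders. Like the join, this still ignores line order, so it reports which lines changed and by how many copies, not positional hunks as `diff` does.

```pig
-- Sketch: multiset diff via COGROUP. Paths 'A', 'B', 'diff' are placeholders.
-- TextLoader keeps each whole line (PigStorage would split on tabs).
A = LOAD 'A' USING TextLoader() AS (line:chararray);
B = LOAD 'B' USING TextLoader() AS (line:chararray);

-- One row per distinct line, with a bag of its copies from each file.
G = COGROUP A BY line, B BY line;

-- COUNT_STAR of an empty bag is 0, so lines missing from one file are counted as 0.
C = FOREACH G GENERATE group AS line, COUNT_STAR(A) AS n_a, COUNT_STAR(B) AS n_b;

-- Lines occurring the same number of times in both files are unchanged.
D = FILTER C BY n_a != n_b;

-- Extra copies in B were added; extra copies in A were removed.
RESULT = FOREACH D GENERATE line,
             (n_b > n_a ? 'ADDED' : 'REMOVED') AS status,
             (n_b > n_a ? n_b - n_a : n_a - n_b) AS copies;

STORE RESULT INTO 'diff';
```

It runs as an ordinary Pig script (e.g. `pig -f diff_cogroup.pig`, where the script name is hypothetical). The COGROUP needs one shuffle, like the join, but avoids the join's per-key cross product when a line is heavily duplicated.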