Pig skewed join with a big table causes “Split metadata size exceeded 10000000”
Question:

We have a Pig join between a small, distinct table (16M rows) and a big, skewed table (6B rows). A regular join finishes in 2 hours (after some tweaking). We tried a skewed join and were able to bring the run time down to 20 minutes.

HOWEVER, when we try a bigger skewed table (19B rows), we get this message from the SAMPLER job:

Split metadata size exceeded 10000000. Aborting job job_201305151351_21573 [ScriptRunner]
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo
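
For context, the two join variants described above would look roughly like the sketch below. The original script is not shown, so the paths, relation names, and schema here are hypothetical. It assumes the skewed table is the left input, since that is the input Pig samples for a skewed join.

```pig
-- Hypothetical paths, relation names, and schema (not from the original script).
big   = LOAD '/data/big_skewed'     AS (key:chararray, val:chararray);
small = LOAD '/data/small_distinct' AS (key:chararray, attr:chararray);

-- Regular join: every record with the same key goes to the same reducer.
regular_joined = JOIN big BY key, small BY key;

-- Skewed join: Pig first runs a sampler job over the left input (big)
-- to estimate the key distribution, then spreads hot keys across reducers.
skewed_joined = JOIN big BY key, small BY key USING 'skewed';

STORE skewed_joined INTO '/out/skewed_joined';
```

The only difference between the two statements is the `USING 'skewed'` clause; the extra sampler job it triggers is the one that reports the error above.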