Hadoop: reducing across multiple input formats
I have two files with different data formats in HDFS. What would the job setup look like if I needed to reduce across both data files? For example, take the common word count problem, where one file uses a space as the word delimiter and the other uses an underscore. In my approach I need a different mapper for each file format, and those mappers then feed into a common reducer. How do I do that? Or is there a better solution than mine?

Donald Miner

Check out the MultipleInputs class, which solves this exact problem. It's pretty neat: you pass in the InputFormat and, optionally, the Mapper class. If
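A minimal sketch of the word count case from the question, assuming Hadoop 2.x or later with the newer `org.apache.hadoop.mapreduce` API and assuming both files are plain text readable by `TextInputFormat`. `MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass)` binds each input path to its own mapper, and a single reducer then receives the (word, 1) pairs from both. The class names `MultiFormatWordCount`, `SpaceMapper`, `UnderscoreMapper`, `SumReducer` and the helper `tokenize` are hypothetical names chosen for illustration, and the three paths are taken from the command line.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiFormatWordCount {

    // Hypothetical helper: split a line on a delimiter regex and drop empty tokens.
    static String[] tokenize(String line, String delimiterRegex) {
        return Arrays.stream(line.split(delimiterRegex))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // Shared emit logic; the concrete mappers differ only in their delimiter.
    abstract static class DelimitedWordMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        protected abstract String delimiterRegex();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : tokenize(line.toString(), delimiterRegex())) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Mapper for the file whose words are separated by spaces.
    public static class SpaceMapper extends DelimitedWordMapper {
        @Override
        protected String delimiterRegex() { return " +"; }
    }

    // Mapper for the file whose words are separated by underscores.
    public static class UnderscoreMapper extends DelimitedWordMapper {
        @Override
        protected String delimiterRegex() { return "_+"; }
    }

    // The common reducer: sums the counts emitted by both mappers for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }

    // Usage: MultiFormatWordCount <space-delimited input> <underscore-delimited input> <output>
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-format word count");
        job.setJarByClass(MultiFormatWordCount.class);

        // Each path gets its own InputFormat and Mapper; no job.setMapperClass() call is needed.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, SpaceMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, UnderscoreMapper.class);

        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Both mappers emit the same key and value types (`Text`, `IntWritable`), which is what lets one reducer consume their output. If the two files needed different readers, you would pass a different `InputFormat` class to each `addInputPath` call instead of `TextInputFormat` for both.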