Is there a combine input format for Hadoop streaming?
Question

I have many small input files, and I want to combine them with an input format such as CombineFileInputFormat so that fewer mapper tasks are launched. I know I can do this with the Java API, but I don't know whether the streaming jar supports it when I'm using Hadoop streaming.

Answer 1:

Hadoop streaming uses TextInputFormat by default, but any other input format can be used, including CombineFileInputFormat. You can change the input format from the command line, using the option
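As a rough illustration of the answer above, here is a minimal Python sketch that assembles a streaming command line with a combining input format. It assumes the streaming flag in question is `-inputformat`, and that the concrete old-API class `org.apache.hadoop.mapred.lib.CombineTextInputFormat` is available in your Hadoop version (CombineFileInputFormat itself is abstract, so a concrete subclass is needed). The helper name `build_streaming_command`, the jar path, and the HDFS paths are hypothetical placeholders, and the split-size property is the one I would expect Hadoop 2.x and later to honor; check it against your cluster.

```python
import shlex

# Assumption: concrete old-API subclass; CombineFileInputFormat is abstract,
# and streaming works with the org.apache.hadoop.mapred API.
COMBINE_FORMAT = "org.apache.hadoop.mapred.lib.CombineTextInputFormat"


def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer,
                            max_split_bytes=128 * 1024 * 1024):
    """Return the argv list for a streaming job that packs small files
    into larger combined splits (hypothetical helper, not part of Hadoop)."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic -D options have to precede the streaming-specific options.
        # Assumption: this property caps the size of each combined split.
        "-D", f"mapreduce.input.fileinputformat.split.maxsize={max_split_bytes}",
        "-inputformat", COMBINE_FORMAT,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    # Placeholder jar and paths; substitute the ones from your installation.
    cmd = build_streaming_command(
        "hadoop-streaming.jar", "/data/small_files", "/data/out",
        "cat", "wc -l")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running the script only prints the command; nothing is submitted to the cluster. One thing worth checking if you try it: with an input format other than TextInputFormat, streaming may pass the record key (the byte offset) to the mapper in front of each line, so the mapper might need to strip the first tab-separated field.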