How to create output files with a fixed number of lines in Hadoop/MapReduce?
Question

Let's say we have N input files with different numbers of lines. We need to generate output files such that each output file has exactly K lines (except the last one, which can have fewer than K records). Is it possible to do this using a single MR job? We should open the files for writing explicitly in the reducer. The records in the output should be shuffled.

Thanks,
Paramesh

Answer 1:

Assuming that the input file has 990 records which have to be split into 9 files of 100 records each and the last file of