I have taken below Quoting from Hadoop - The Definitive Guide:
Note, however, that small files do not take up any more disk space than is required to store the raw contents
You could look into SequenceFiles or HAR Files depending on your use case. HAR files are analogous to the Tar command. MapReduce can act upon each HAR files with a little overhead. As for SequenceFiles, they are in a way a container of Key/Value pairs. The benefit of this is a Map task can act upon each of these pairs.