I have a toy setup sending log4j messages to HDFS using Flume. I'm not able to configure the HDFS sink to avoid many small files. I thought I could configure the hdfs sink to
The HDFS Sink has a property, hdfs.batchSize (default 100), which the documentation describes as the "number of events written to file before it is flushed to HDFS". I think that's your problem here.
Also consider checking the other properties of the HDFS Sink.
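
As a rough sketch of what such a configuration might look like, the fragment below raises hdfs.batchSize and also sets the roll properties (hdfs.rollInterval, hdfs.rollSize, hdfs.rollCount), which control when the sink closes the current file and starts a new one. The agent name (agent1), sink name (hdfsSink), path, and all the values are assumptions chosen for illustration, not settings taken from the question, so tune them to your own setup.

```properties
# Hypothetical agent and sink names; replace with the ones in your config.
agent1.sinks.hdfsSink.type = hdfs
# Hypothetical target directory.
agent1.sinks.hdfsSink.hdfs.path = /flume/logs

# Events written to the file before it is flushed to HDFS (default 100).
agent1.sinks.hdfsSink.hdfs.batchSize = 1000

# Roll on time only: start a new file every 10 minutes (default 30 seconds).
agent1.sinks.hdfsSink.hdfs.rollInterval = 600
# 0 disables rolling by file size (default 1024 bytes).
agent1.sinks.hdfsSink.hdfs.rollSize = 0
# 0 disables rolling by event count (default 10 events).
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

With the default roll settings, a new file is started after 10 events, 1024 bytes, or 30 seconds, whichever comes first, so those three may be worth checking alongside hdfs.batchSize if small files persist.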