On Hive 2.2.0, I am populating an ORC table from a 1.34 GB source table, using the query
INSERT INTO TABLE TableOrc SELECT * FROM Table; ---- (1)
Your initial average file size is smaller than hive.merge.smallfiles.avgsize, which is why the merge task started merging the files.
The first two files were merged: 65.01 MB + 67.48 MB = 132.49 MB. This is bigger than hive.merge.size.per.task,
so the merge task stops adding more files to the resulting file. The result will not be split to be exactly 128 MB. The way it works is quite simple.
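
As a rough illustration, the behavior described above can be modeled with a few lines of Python. This is only a simplified sketch of the logic as explained in this answer, not Hive's actual implementation. The function name simulate_merge, the third file size (40.0 MB), and both threshold values passed in are hypothetical; only 65.01 MB and 67.48 MB come from the example above.

```python
def simulate_merge(file_sizes_mb, avg_size_threshold_mb, size_per_task_mb):
    """Simplified model of the merge step (an illustration, not Hive's code)."""
    if not file_sizes_mb:
        return []

    # The merge task only starts when the average output file size is
    # below the threshold (the role of hive.merge.smallfiles.avgsize).
    average = sum(file_sizes_mb) / len(file_sizes_mb)
    if average >= avg_size_threshold_mb:
        return list(file_sizes_mb)

    merged = []
    current = 0.0
    for size in file_sizes_mb:
        current += size
        # Once the accumulated file exceeds the per-task size (the role of
        # hive.merge.size.per.task), it is closed as-is. It is never split
        # back down to the exact limit.
        if current > size_per_task_mb:
            merged.append(current)
            current = 0.0
    if current > 0:
        merged.append(current)
    return merged


# 65.01 and 67.48 are from the example; 40.0 and both 128.0 limits are assumed.
result = simulate_merge([65.01, 67.48, 40.0],
                        avg_size_threshold_mb=128.0,
                        size_per_task_mb=128.0)
print(result)
```

With these inputs the first two files end up in one merged file of about 132.49 MB, slightly over the assumed 128 MB limit, and the remaining file is left on its own.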