Question
When I run the following query, I get only one output file, even though the job uses 8 mappers and 0 reducers.
create table table_2 as select * from table_1;
8 mappers are invoked and there is no reduce phase, yet there is only one file in the location of table_2. Shouldn't there be 8 files, since we have 8 mappers and 0 reducers?
Answer 1:
From Hive documentation, Configuration Properties...
hive.merge.mapfiles
Default Value: true
Merge small files at the end of a map-only job.
hive.merge.tezfiles
Default Value: false
Merge small files at the end of a Tez DAG.
hive.merge.smallfiles.avgsize
Default Value: 16000000
When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files...
So, if (a) your test dataset is very small and (b) you are running on plain old MapReduce rather than Tez, then by default Hive runs an extra step after the map phase just to merge the (intermediate) results. That is why you see a single file instead of 8.
This would not happen after a reduce step, unless you force hive.merge.mapredfiles to true.
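To check this on your side, you could switch the merge off for one session and rerun the query. This is only a sketch: it assumes the job runs on MapReduce (not Tez), and table_2_unmerged is a made-up table name used so the existing table_2 is left untouched.

```sql
-- Assumption: execution engine is MapReduce, not Tez.
-- SET with a property name and no value prints the current setting.
SET hive.merge.mapfiles;
SET hive.merge.smallfiles.avgsize;

-- Disable the merge step for map-only jobs, for this session only.
SET hive.merge.mapfiles=false;

-- table_2_unmerged is a hypothetical name chosen for this test.
CREATE TABLE table_2_unmerged AS SELECT * FROM table_1;
```

With the merge step off, listing the directory of the new table should show roughly one file per mapper rather than a single merged file.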
Source: https://stackoverflow.com/questions/47272492/why-does-a-map-only-job-in-hive-results-in-a-single-output-file