How does Hive decide when to use map reduce and when not to?

谎友^ 2020-12-09 20:57

As a simple example,

select * from tablename;

DOES NOT kick in MapReduce, while

select count(*) from tablename;

does.

  •  一个人的身影
    2020-12-09 21:51

    Whenever we fire a query like select * from tablename, Hive reads the data file and fetches all of the data without doing any aggregation (min/max/count etc.). It runs a FetchTask rather than a MapReduce task.

    This is also an optimization technique in Hive: the hive.fetch.task.conversion property lets simple queries run as a single FETCH task, which avoids the latency overhead of launching a MapReduce job.

    This is similar to reading the file directly with hadoop fs -cat filename.

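    But if we use select colNames from tablename with hive.fetch.task.conversion set to minimal (the default before Hive 0.14), it requires a MapReduce job, since the selected columns have to be extracted from each row by parsing the file being loaded. With the value more (the default since Hive 0.14), a simple column projection like this one can also run as a FETCH task.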
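    The eligibility rules behind hive.fetch.task.conversion can be summarized as a small decision function. The sketch below is a simplified, hypothetical model based on the Hive configuration documentation, not Hive's actual planner code: with minimal, only SELECT *, filters on partition columns, and LIMIT qualify; with more, any SELECT expressions, filters, and LIMIT qualify; aggregations, DISTINCT, joins, subqueries, and lateral views always need a job. The QueryShape class and uses_fetch_task function are illustrative names, and the input-size limit set by hive.fetch.task.conversion.threshold is ignored.

```python
from dataclasses import dataclass


# Hypothetical description of a query's shape; these fields are
# illustrative and are not part of any Hive API.
@dataclass
class QueryShape:
    select_star: bool = False       # plain SELECT * (no projection or expressions)
    has_aggregation: bool = False   # count/min/max/... or GROUP BY
    has_distinct: bool = False
    has_join: bool = False
    has_subquery: bool = False
    has_lateral_view: bool = False
    filters_only_on_partition_cols: bool = True  # no WHERE predicate on ordinary columns
    has_limit: bool = False         # LIMIT is allowed at both levels


def uses_fetch_task(q: QueryShape, conversion: str = "more") -> bool:
    """Simplified model of the documented hive.fetch.task.conversion rules.

    Returns True if the query would be eligible to run as a single FETCH
    task, False if it would need a full job (MapReduce in the question).
    Only the 'minimal' and 'more' levels are modeled.
    """
    # Anything needing a shuffle (ReduceSink) or more than one source never qualifies.
    if (q.has_aggregation or q.has_distinct or q.has_join
            or q.has_subquery or q.has_lateral_view):
        return False
    if conversion == "minimal":
        # Only SELECT *, filters on partition columns, and LIMIT.
        return q.select_star and q.filters_only_on_partition_cols
    if conversion == "more":
        # Any SELECT expressions, any filter, and LIMIT.
        return True
    raise ValueError(f"unmodeled conversion level: {conversion!r}")


if __name__ == "__main__":
    star = QueryShape(select_star=True)        # select * from tablename
    count = QueryShape(has_aggregation=True)   # select count(*) from tablename
    cols = QueryShape()                        # select colNames from tablename
    print(uses_fetch_task(star, "minimal"))    # True
    print(uses_fetch_task(count, "more"))      # False
    print(uses_fetch_task(cols, "minimal"))    # False
    print(uses_fetch_task(cols, "more"))       # True
```

    The model makes the answer's point explicit: count(*) always needs a job because it aggregates, while whether a column projection does depends on the conversion level configured.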
