How does Hive decide when to use map reduce and when not to?


谎友^ 2020-12-09 20:57

As a simple example,

select * from tablename;

DOES NOT kick in map reduce, while

select count(*) from tablename;

DOES.

4 Answers

一个人的身影 (original poster)

2020-12-09 21:51

Whenever we run a query like select * from tablename, Hive simply reads the data files and returns every row without doing any aggregation (min/max/count, etc.). It runs a FetchTask rather than a MapReduce job.

This is also an optimization technique in Hive. The hive.fetch.task.conversion property controls which queries are converted to a fetch task, which avoids the startup latency of a map-reduce job.

This is similar to reading a file directly from HDFS: hadoop fs -cat filename

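But if we use select colNames from tablename, older Hive versions (or any version with hive.fetch.task.conversion set to minimal) need a map-reduce job, because the requested columns have to be extracted from each row by parsing the files. When the property is set to more, simple column selections, filters, and LIMIT queries can also run as a fetch task.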
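
One way to check which path Hive takes for a given query is to look at its plan with EXPLAIN. The snippet below is only a sketch: tablename is the placeholder table from the question, and the exact plan text depends on the Hive version and execution engine. A count(*) may also be answered from table statistics without launching a job, if that feature is enabled.

```sql
-- Assumption: `tablename` is the placeholder table from the question.
-- hive.fetch.task.conversion accepts the values none, minimal, and more.
SET hive.fetch.task.conversion=more;

-- A plain scan should typically plan as a single fetch stage with no map-reduce stage.
EXPLAIN SELECT * FROM tablename;

-- An aggregation normally needs a distributed job, so the plan should include a map-reduce stage.
EXPLAIN SELECT count(*) FROM tablename;

-- With conversion disabled, even the plain scan should be planned as a job.
SET hive.fetch.task.conversion=none;
EXPLAIN SELECT * FROM tablename;
```

Comparing the three plans shows whether the fetch conversion applies on your installation.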
