Why is Select Count(*) slower than Select * in Hive?

Happy的楠姐 2020-12-28 16:12

When I run queries in Hive on a VirtualBox sandbox, Select count(*) feels much slower than Select *.

Can an

3 answers
  •  春和景丽
    2020-12-28 16:57

    There are three types of operations that a Hive query can perform.

    In order from cheapest and fastest to most expensive and slowest, they are as follows.

    A Hive query can be a metadata-only request.

    Show tables and describe table are examples. For these queries the Hive process performs a lookup in the metadata server (the metastore). The metastore is backed by a SQL database, probably MySQL, but the actual database is configurable.

    A Hive query can be an HDFS get request. Select * from table would be an example. In this case Hive can return the results by performing an HDFS operation, more or less a hadoop fs -get.
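
    Whether a plain Select is served as a direct fetch like this is, in Hive releases that support it, controlled by the hive.fetch.task.conversion setting. The following is a minimal sketch rather than a tuning recommendation; the table name events is hypothetical, and the exact default and behavior depend on the Hive version.

    ```sql
    -- `events` is a hypothetical table name used only for illustration.

    -- Let simple SELECTs be answered by a fetch task, without launching a job.
    SET hive.fetch.task.conversion=more;
    SELECT * FROM events LIMIT 10;

    -- Turn the conversion off so the same query is submitted as a job instead.
    SET hive.fetch.task.conversion=none;
    SELECT * FROM events LIMIT 10;
    ```

    Running the same statement under both settings and comparing the wall-clock time gives a rough feel for the job start-up overhead on a sandbox.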

    A Hive query can be a Map Reduce job.

    Hive has to ship the jar to HDFS, the JobTracker queues the tasks, the TaskTrackers execute the tasks, and the final data is put into HDFS or shipped to the client.

    The Map Reduce job has different possibilities as well.

    It can be a Map-only job. Take Select * from table where id > 100, for example: all of that logic can be applied in the mapper.

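    It can be a Map and Reduce job, for example Select min(id) from table or Select * from table order by id. An aggregate such as Select count(*) from table normally falls into this category too, which is why it is slower than a plain Select *.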

    It can also lead to multiple Map Reduce passes, but I think the above summarizes the main behaviors.
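
    One way to check which category a given query falls into is to ask Hive for its plan with EXPLAIN and look at the stages it lists. This is only a sketch: the table events and its id column are hypothetical, and the plan you see depends on the Hive version, the execution engine and settings such as statistics-based answers.

    ```sql
    -- `events` and its `id` column are hypothetical, used only for illustration.

    -- Plain scan: look for a plan that consists of a fetch stage only.
    EXPLAIN SELECT * FROM events;

    -- Aggregate: look for a map/reduce stage ahead of the final fetch stage.
    EXPLAIN SELECT COUNT(*) FROM events;

    -- Global sort: also needs a reduce phase to order the rows.
    EXPLAIN SELECT * FROM events ORDER BY id;
    ```

    If the plan for the count query contains a Map Reduce stage while the plain select does not, the time difference is mostly job submission and scheduling overhead, which is especially noticeable on a small sandbox VM.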
