How does computing table stats in hive or impala speed up queries in Spark SQL?

心在旅途 2020-12-28 22:48

To improve performance (e.g. for joins), it is recommended to compute table statistics first.

In Hive I can do:

    analyze table  c
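
The snippet above is cut off; the full Hive syntax (assuming `c` is the table name from the snippet) would be along these lines:

```sql
-- Compute table-level statistics (row count, number of files, raw data size).
ANALYZE TABLE c COMPUTE STATISTICS;

-- Optionally compute column-level statistics as well (min/max, NDV, null counts),
-- which the optimizer can use when choosing join strategies.
ANALYZE TABLE c COMPUTE STATISTICS FOR COLUMNS;
```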
3 Answers
  •  醉话见心
    2020-12-28 23:24

    I am assuming you are using Hive on Spark (or) Spark SQL with a Hive context. If that is the case, you should run ANALYZE in Hive.

    Analyze table<...> typically needs to run after the table is created, or whenever there are significant inserts or changes. You can do this at the end of your load step itself, if this is an MR or Spark job.
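
    For example, the end of a Hive load script might refresh statistics like this (the table and source names here are hypothetical, for illustration only):

    ```sql
    -- Hypothetical load step: repopulate the target table.
    INSERT OVERWRITE TABLE sales_fact
    SELECT * FROM sales_staging;

    -- Refresh statistics right after the load so that subsequent
    -- queries and joins see up-to-date row counts and sizes.
    ANALYZE TABLE sales_fact COMPUTE STATISTICS;
    ANALYZE TABLE sales_fact COMPUTE STATISTICS FOR COLUMNS;
    ```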

    At the time of analysis, if you are using Hive on Spark, please also use the configurations in the link below. You can set these at the session level for each query. I have used the parameters from https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started in production and it works fine.
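
    As a sketch, session-level settings along the lines of that Getting Started page might look like this (the values are illustrative placeholders, not tuned recommendations):

    ```sql
    -- Run Hive queries on the Spark execution engine.
    set hive.execution.engine=spark;

    -- Illustrative Spark resource settings; tune for your cluster.
    set spark.executor.memory=4g;
    set spark.executor.cores=2;
    set spark.serializer=org.apache.spark.serializer.KryoSerializer;
    ```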
