How does computing table stats in Hive or Impala speed up queries in Spark SQL?

心在旅途 2020-12-28 22:48

To improve performance (e.g. for joins), it is recommended to compute table statistics first.

In Hive I can do:

analyze table  c         
3 answers
  •  星月不相逢
    2020-12-28 23:29

    From what I understand, COMPUTE STATS in Impala is the more recent implementation and frees you from tuning Hive settings.

    From the official documentation:

    If you use the Hive-based methods of gathering statistics, see the Hive wiki for information about the required configuration on the Hive side. Cloudera recommends using the Impala COMPUTE STATS statement to avoid potential configuration and scalability issues with the statistics-gathering process.

    If you run the Hive statement ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS, Impala can only use the resulting column statistics if the table is unpartitioned. Impala cannot use Hive-generated column statistics for a partitioned table.
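
    To make the statements named in the quoted passage concrete, here is a minimal sketch in Python that only assembles the SQL text for each engine and prints it; nothing is sent to Hive or Impala. The helper name `stats_statements` and the table name `sales` are hypothetical placeholders, not part of the original question.

```python
# Sketch only: builds the statistics statements as strings, nothing is executed.
# "stats_statements" and the table name "sales" are hypothetical placeholders.

def stats_statements(table):
    """Return the statistics-gathering statement for each engine, keyed by a label."""
    return {
        # Hive: table-level statistics (row count, size, ...).
        "hive_table": f"ANALYZE TABLE {table} COMPUTE STATISTICS",
        # Hive: column-level statistics. Per the quoted passage, Impala can
        # use these only when the table is unpartitioned.
        "hive_columns": f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS",
        # Impala: a single statement gathers table and column statistics.
        "impala": f"COMPUTE STATS {table}",
    }


if __name__ == "__main__":
    for label, sql in stats_statements("sales").items():
        print(f"{label}: {sql};")
```

    Each string would be run in its own shell (the Hive statements in Hive, `COMPUTE STATS` in Impala); for a partitioned table that Impala also queries, the passage above suggests preferring the Impala statement.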

    Useful link: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_perf_stats.html
