HIVE select count(*) non null returns higher value than select count(*)

Posted by 女生的网名这么多〃 on 2019-11-26 05:56:44

Question


I am currently doing some data exploration with Hive and cannot explain the following behavior. Say I have a table (named mytable) with a field master_id.

When I count the number of rows, I get

select count(*) as c from mytable 
c
1129563

If I count only the rows with a non-null master_id, I get a higher number

select count(*) as c from mytable where master_id is not null
c
1134041

Additionally, master_id never seems to be null.

select count(*) as c from mytable where master_id is null
c
0

I cannot explain how adding a where clause can increase the number of rows returned. Does anyone have a hint that would explain this behavior?

Thanks


Answer 1:


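Most probably your query without the where clause is answered from table statistics stored in the metastore rather than by scanning the data, because this parameter is set. If those statistics are stale, the unfiltered count can be lower than the filtered count, which does read the data: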

set hive.compute.query.using.stats=true;

Try setting it to false and running the query again.
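
As a rough sketch of that check (using the table name mytable from the question), the two counts can be compared in one session with the statistics shortcut switched off. If stale statistics were the cause, both queries should then read the data and the unfiltered count should no longer be the smaller one:

```sql
-- Force Hive to scan the data instead of answering from metastore statistics
set hive.compute.query.using.stats=false;

-- Unfiltered count, now computed from the actual rows
select count(*) as c from mytable;

-- Filtered count, for comparison
select count(*) as c from mytable where master_id is not null;
```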

Alternatively, you can recompute the statistics on the table so that they match the data. See the ANALYZE TABLE syntax in the Hive documentation.
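
A minimal sketch of refreshing the statistics, assuming mytable is not partitioned (a partitioned table may need a PARTITION clause, depending on the Hive version):

```sql
-- Recompute basic table statistics (such as the row count) in the metastore
ANALYZE TABLE mytable COMPUTE STATISTICS;

-- Inspect the stored statistics; numRows is listed under Table Parameters
DESCRIBE FORMATTED mytable;
```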

It is also possible to gather statistics automatically during INSERT OVERWRITE:

set hive.stats.autogather=true;
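
For illustration only, a reload with automatic gathering enabled might look like the following. Note that mytable_staging is a hypothetical source table that does not appear in the question:

```sql
-- Collect statistics automatically as part of the write
set hive.stats.autogather=true;

-- mytable_staging is a hypothetical source table used only for this sketch
INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable_staging;
```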


Source: https://stackoverflow.com/questions/39912784/hive-select-count-non-null-returns-higher-value-than-select-count
