Why do I get an incorrect COUNT DISTINCT result with GROUP BY?

Submitted by 限于喜欢 on 2019-12-11 23:11:45

Question


I have a table INTERACTIONS

CustomerID | Channel | Response
-----------+---------+----------
 245       | SMS     | Accept   
 245       | PUSH    | Ignore   
 247       | SMS     | Accept   
 249       | PUSH    | Ignore   

When I run this query:

SELECT COUNT(DISTINCT CUSTOMERID) AS Customers 
FROM INTERACTIONS;

I get the result 7440.

When I group by Channel and then sum the counts over all groups:

    SELECT SUM(CUSTOMERS) 
    FROM 
        (SELECT 
             CHANNEL,
             COUNT(DISTINCT CUSTOMERID) AS Customers 
         FROM 
             INTERACTIONS
         GROUP BY 
             CHANNEL);

I get the result 9993.

Why? What is wrong? I expected the total number of customers to be the same in both cases.


Answer 1:


It is right there in your sample data. The distinct customers are:

245, 247, 249

When you group by channel, customer 245 appears separately under both PUSH and SMS:

SMS  | 245, 247
PUSH | 245, 249

Thus the sum of COUNT(DISTINCT x) over the groups of GROUP BY y can be greater than COUNT(DISTINCT x) with no GROUP BY.
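To make the difference concrete, here is a minimal sketch that rebuilds only the four sample rows from the question and runs both queries against them. The column types and the `per_channel` alias are assumptions, and the multi-row INSERT assumes a dialect that accepts it (for example PostgreSQL, MySQL, SQLite or SQL Server).

```sql
-- Reproduction using only the four sample rows shown in the question.
-- Column types are assumed; the real table definition is not given.
CREATE TABLE INTERACTIONS (
    CustomerID INTEGER,
    Channel    VARCHAR(10),
    Response   VARCHAR(10)
);

INSERT INTO INTERACTIONS (CustomerID, Channel, Response) VALUES
    (245, 'SMS',  'Accept'),
    (245, 'PUSH', 'Ignore'),
    (247, 'SMS',  'Accept'),
    (249, 'PUSH', 'Ignore');

-- Distinct customers overall: 245, 247, 249
SELECT COUNT(DISTINCT CustomerID) AS Customers
FROM INTERACTIONS;                      -- returns 3

-- Distinct customers per channel, then summed.
-- Customer 245 is counted once under SMS and once under PUSH.
SELECT SUM(Customers) AS SummedCustomers
FROM (SELECT Channel,
             COUNT(DISTINCT CustomerID) AS Customers
      FROM INTERACTIONS
      GROUP BY Channel) per_channel;    -- returns 2 + 2 = 4
```

The gap between the two results (4 versus 3) is exactly the one extra count for customer 245, who used two channels.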




Answer 2:


SELECT CHANNEL,
COUNT(DISTINCT CUSTOMERID) AS Customers 
FROM INTERACTIONS
GROUP BY CHANNEL

That query gives you the number of distinct CUSTOMERID values per Channel. The same CUSTOMERID can exist in several Channels, in which case it is counted once per Channel in the final sum (9993).

You can check this by changing the query to the following one, which returns the number of Channels for every CUSTOMERID that appears in more than one Channel:

SELECT CUSTOMERID,
COUNT(DISTINCT CHANNEL) AS Channels
FROM INTERACTIONS
GROUP BY CUSTOMERID
HAVING COUNT(DISTINCT CHANNEL) > 1
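Another way to see where the larger number comes from: summing the per-Channel distinct counts is the same as counting distinct (CUSTOMERID, CHANNEL) pairs. The sketch below counts those pairs directly. The `pairs` alias is an assumption, and the NULL filter is there because COUNT(DISTINCT CUSTOMERID) ignores NULLs.

```sql
-- Count distinct (CUSTOMERID, CHANNEL) pairs.
-- This should equal the sum of the per-channel COUNT(DISTINCT CUSTOMERID) values,
-- because each customer contributes one pair per channel they appear in.
SELECT COUNT(*) AS CustomerChannelPairs
FROM (SELECT DISTINCT CUSTOMERID, CHANNEL
      FROM INTERACTIONS
      WHERE CUSTOMERID IS NOT NULL) pairs;
```

On the question's data this should reproduce the summed figure rather than the overall distinct count.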



Answer 3:


You get different results because the same ID, 245, appears under two different CHANNEL values, PUSH and SMS. In the first query, COUNT(DISTINCT CUSTOMERID) counts 245 once. With GROUP BY CHANNEL, 245 is counted once in each group (PUSH = 1 and SMS = 1), so the outer SUM() counts it twice, which is why the two totals differ.
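If the goal is to show the per-channel counts together with a correct overall total, one possible approach (a sketch, not the only option) is to compute the total separately instead of summing the groups. The 'ALL' label is a hypothetical placeholder for the total row.

```sql
-- Per-channel distinct customers
SELECT CHANNEL,
       COUNT(DISTINCT CUSTOMERID) AS Customers
FROM INTERACTIONS
GROUP BY CHANNEL
UNION ALL
-- Overall distinct customers, computed directly rather than by summing groups
SELECT 'ALL',
       COUNT(DISTINCT CUSTOMERID)
FROM INTERACTIONS;
```

Here the total row counts each customer once no matter how many channels they used, so it matches the result of the first query in the question.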



Source: https://stackoverflow.com/questions/53556396/why-i-got-incorrect-calculation-of-count-distinct-with-group-by
