Why is the result of COUNT doubled when I do two joins? [duplicate]

Posted by 孤街浪徒 on 2021-01-27 06:07:13

Question


I have these tables:

device

 id      name         groupId     serviceId
791   Mamie Ortega      205         1832

group

 id   serviceId
205     1832

record

 id          date                      deviceId
792   2017-07-13 13:30:19.740360         784
793   2017-07-13 13:30:19.742799         784

alarm

 id    status    deviceId
241      new        784
242      new        784 

I'm running this query:

SELECT device.id, device.name, COUNT(records.id) AS "last24HMessagesCount", COUNT(alarms.id) AS "activeAlarmsCount"
FROM device
  INNER JOIN "group" AS "group" ON "device"."groupId" = "group"."id" AND "group"."id" = '205'
  LEFT OUTER JOIN "record" AS "records" ON "device"."id" = "records"."deviceId" AND "records"."date" > '2017-07-12 11:43:02.838 +00:00'
  LEFT OUTER JOIN "alarm" AS "alarms" ON "device"."id" = "alarms"."deviceId" AND "alarms"."status" = 'new'
WHERE "device"."serviceId" = 1832
GROUP BY device.id;

which gives me this result:

 id      name       last24HMessagesCount      activeAlarmsCount   
791   Mamie Ortega         4                          4

This result is wrong: I expect 2 for both last24HMessagesCount and activeAlarmsCount.

If I remove one of the counts, last24HMessagesCount for example, and execute

SELECT device.id, device.name, COUNT(alarms.id) AS "activeAlarmsCount"
FROM device
  INNER JOIN "group" AS "group" ON "device"."groupId" = "group"."id" AND "group"."id" = '205'
  LEFT OUTER JOIN "alarm" AS "alarms" ON "device"."id" = "alarms"."deviceId" AND "alarms"."status" = 'new'
WHERE "device"."serviceId" = 1832
GROUP BY device.id;

the result is correct:

 id      name       activeAlarmsCount   
791   Mamie Ortega         2

I do not understand why the counts are doubled.


Answer 1:


This is simple to answer. The device has two records and two alarms. Joining both tables to the device pairs every record with every alarm, so you get 2 × 2 = 4 rows, and both COUNTs count those four rows.
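The fan-out is easy to see by listing the joined rows before any aggregation. The following is a minimal sketch using Python's built-in sqlite3 module with simplified tables; it assumes the records and alarms belong to device 791 (the sample data in the question shows deviceId 784, which would not match the device row), and it leaves out the date and status filters.

```python
import sqlite3

# In-memory database with simplified versions of the question's tables.
# Assumption: records and alarms reference device 791 so the joins match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record (id INTEGER PRIMARY KEY, deviceId INTEGER);
    CREATE TABLE alarm  (id INTEGER PRIMARY KEY, deviceId INTEGER);
    INSERT INTO device VALUES (791, 'Mamie Ortega');
    INSERT INTO record VALUES (792, 791), (793, 791);
    INSERT INTO alarm  VALUES (241, 791), (242, 791);
""")

# Same two LEFT JOINs as in the question, but without GROUP BY,
# so every intermediate row is visible.
joined_rows = conn.execute("""
    SELECT device.id, records.id, alarms.id
    FROM device
    LEFT JOIN record AS records ON device.id = records.deviceId
    LEFT JOIN alarm  AS alarms  ON device.id = alarms.deviceId
    ORDER BY records.id, alarms.id
""").fetchall()

for row in joined_rows:
    print(row)
# (791, 792, 241)
# (791, 792, 242)
# (791, 793, 241)
# (791, 793, 242)
```

Each record id and each alarm id appears twice, which is why both COUNTs return 4.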

You can work around this problem by counting distinct IDs:

COUNT(DISTINCT records.id) AS "last24HMessagesCount",
COUNT(DISTINCT alarms.id) AS "activeAlarmsCount"

However, I would not recommend this. Why join record and alarm at all? They are not directly related. What you actually want to join to each device is the number of records and the number of alarms, so aggregate before joining:

SELECT 
  device.id, 
  device.name, 
  -- COALESCE turns the NULL from an unmatched LEFT JOIN into 0
  COALESCE(records.cnt, 0) AS "last24HMessagesCount", 
  COALESCE(alarms.cnt, 0) AS "activeAlarmsCount"
FROM device
LEFT OUTER JOIN 
(
  SELECT deviceId, count(*) AS cnt
  FROM record
  WHERE "date" > '2017-07-12 11:43:02.838 +00:00'
  GROUP BY deviceId
) AS records ON device.id = records.deviceId
LEFT OUTER JOIN 
(
  SELECT deviceId, count(*) AS cnt
  FROM alarm
  WHERE status = 'new'
  GROUP BY deviceId
) AS alarms ON device.id = alarms.deviceId
WHERE device.serviceId = 1832
  AND device.groupId = 205;

(I've removed the unnecessary join to the "group" table.)
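To compare the two approaches side by side, here is a rough sketch that runs both on the same data with Python's built-in sqlite3 module. The tables are simplified (no date, status, or group columns), and the records and alarms are assumed to belong to device 791 so that the joins match.

```python
import sqlite3

# Simplified in-memory copy of the question's data.
# Assumption: records and alarms reference device 791.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record (id INTEGER PRIMARY KEY, deviceId INTEGER);
    CREATE TABLE alarm  (id INTEGER PRIMARY KEY, deviceId INTEGER);
    INSERT INTO device VALUES (791, 'Mamie Ortega');
    INSERT INTO record VALUES (792, 791), (793, 791);
    INSERT INTO alarm  VALUES (241, 791), (242, 791);
""")

# Join first, then count: every record is paired with every alarm.
naive = conn.execute("""
    SELECT device.id, COUNT(records.id), COUNT(alarms.id)
    FROM device
    LEFT JOIN record AS records ON device.id = records.deviceId
    LEFT JOIN alarm  AS alarms  ON device.id = alarms.deviceId
    GROUP BY device.id
""").fetchone()

# Count first, then join: each derived table has one row per device.
pre_aggregated = conn.execute("""
    SELECT device.id, records.cnt, alarms.cnt
    FROM device
    LEFT JOIN (SELECT deviceId, COUNT(*) AS cnt FROM record GROUP BY deviceId) AS records
           ON device.id = records.deviceId
    LEFT JOIN (SELECT deviceId, COUNT(*) AS cnt FROM alarm GROUP BY deviceId) AS alarms
           ON device.id = alarms.deviceId
""").fetchone()

print(naive)           # (791, 4, 4)
print(pre_aggregated)  # (791, 2, 2)
```

Because each derived table contributes at most one row per device, the joins can no longer multiply rows.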




Answer 2:


Your joins are producing a Cartesian product along two dimensions. The simplest solution is to use COUNT(DISTINCT):

SELECT device.id, device.name,
       COUNT(DISTINCT records.id) AS "last24HMessagesCount",
       COUNT(DISTINCT alarms.id) AS "activeAlarmsCount"

This works as long as the counts are not very large, because the join still produces (records × alarms) rows per device before they are de-duplicated. A more scalable alternative is to do the aggregation before the LEFT JOINs, or to use correlated subqueries (or lateral joins).
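The correlated-subquery variant could look roughly like the sketch below, again run through Python's built-in sqlite3 module. The tables are simplified, the records and alarms are assumed to belong to device 791, and 'Quiet Device' (id 800) is a hypothetical second device added only to show the no-rows case. SQLite stores the timestamps as text, so the date filter is a plain string comparison here.

```python
import sqlite3

# Simplified in-memory copy of the question's data.
# Assumptions: records/alarms reference device 791; device 800 is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record (id INTEGER PRIMARY KEY, "date" TEXT, deviceId INTEGER);
    CREATE TABLE alarm  (id INTEGER PRIMARY KEY, status TEXT, deviceId INTEGER);
    INSERT INTO device VALUES (791, 'Mamie Ortega'), (800, 'Quiet Device');
    INSERT INTO record VALUES
        (792, '2017-07-13 13:30:19.740360', 791),
        (793, '2017-07-13 13:30:19.742799', 791);
    INSERT INTO alarm VALUES (241, 'new', 791), (242, 'new', 791);
""")

# Each count is computed by its own scalar subquery, so the two child
# tables are never joined to each other and cannot multiply rows.
counts = conn.execute("""
    SELECT device.id,
           device.name,
           (SELECT COUNT(*) FROM record
             WHERE record.deviceId = device.id
               AND record."date" > '2017-07-12 11:43:02.838') AS last24HMessagesCount,
           (SELECT COUNT(*) FROM alarm
             WHERE alarm.deviceId = device.id
               AND alarm.status = 'new') AS activeAlarmsCount
    FROM device
    ORDER BY device.id
""").fetchall()

print(counts)
# [(791, 'Mamie Ortega', 2, 2), (800, 'Quiet Device', 0, 0)]
```

A scalar COUNT(*) subquery returns 0 when nothing matches, so devices without records or alarms get 0 without any extra handling.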



Source: https://stackoverflow.com/questions/45080044/why-are-the-result-of-count-double-when-i-do-two-join
