Postgres LEFT JOIN with SUM, missing records

倾然丶 夕夏残阳落幕 提交于 2019-12-13 13:32:02

问题


I am trying to get the count of certain types of records in a related table. I am using a left join.

So I have a query that isn't quite right and one that is returning the correct results. The correct results query has a higher execution cost. Id like to use the first approach, if I can correct the results. (see http://sqlfiddle.com/#!15/7c20b/5/2)

CREATE TABLE people(
  id SERIAL,
  name varchar not null
);

CREATE TABLE pets(
  id SERIAL,
  name varchar not null, 
  kind varchar not null,
  alive boolean not null default false,
  person_id integer not null
);

INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive

INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);

My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:

| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
|  1 |                2 |                1 |
|  2 |                0 |                0 |

I made the example more trivial. In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. Pretty screwed up I know, but this example is simpler to relay ;) I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production now (from sqlfiddle above), but would love to get the LEFT JOIN version working.


回答1:


Typically fastest if you fetch all or most rows:

SELECT pp.id
     , COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
     , COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
FROM   people pp
LEFT   JOIN (
   SELECT person_id
        , count(kind = 'dog' OR NULL) AS a_dog_ct
        , count(kind = 'cat' OR NULL) AS a_cat_ct
   FROM   pets
   WHERE  alive
   GROUP  BY 1
   ) pt ON pt.person_id = pp.id;

Indexes are irrelevant here, full table scans will be fastest. Except if alive pets are a rare case, then a partial index should help. Like:

CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;

I included all columns needed for the query (person_id, kind) to allow index-only scans.

SQL Fiddle.

Typically fastest for a small subset or a single row:

SELECT pp.id
     , count(kind = 'dog' OR NULL) AS alive_dogs_count
     , count(kind = 'cat' OR NULL) AS alive_cats_count
FROM   people pp
LEFT   JOIN pets pt ON pt.person_id = pp.id
                   AND pt.alive
WHERE  <some condition to retrieve a small subset>
GROUP  BY 1;

You should at least have an index on pets.person_id for this (or the partial index from above) - and possibly more, depending ion the WHERE condition.

Related answers:

  • Query with LEFT JOIN not returning rows for count of 0
  • GROUP or DISTINCT after JOIN returns duplicates
  • Get count of foreign key from multiple tables



回答2:


Your WHERE alive=true is actually filtering out record for person_id = 2. Use the below query, push the WHERE alive=true condition into the CASE condition as can be noticed here. See your modified Fiddle

SELECT people.id,
pe.alive_dogs_count,
pe.alive_cats_count
FROM people
LEFT JOIN 
(
select person_id, 
  COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
from pets
GROUP BY person_id
) pe on people.id = pe.person_id

(OR) your version

SELECT 
  people.id,
  COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
FROM people
  LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;


来源:https://stackoverflow.com/questions/26434262/postgres-left-join-with-sum-missing-records

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!