Left outer join acting like inner join

跟風遠走 submitted on 2019-12-02 00:41:41

The query can probably be simplified to:

SELECT u.name AS user_name
     , p.name AS project_name
     , tl.created_on::date AS changeday
     , coalesce(sum(nullif(new_value, '')::numeric), 0)
     - coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
FROM   users             u
LEFT   JOIN (
        tasks            t 
   JOIN fixins           f  ON  f.id = t.fixin_id
   JOIN projects         p  ON  p.id = f.project_id
   JOIN task_log_entries tl ON  tl.task_id = t.id
                           AND  tl.field_id = 18
                           AND (tl.created_on IS NULL OR
                                tl.created_on >= '2013-09-08' AND
                                tl.created_on <  '2013-09-09') -- upper border!
       ) ON t.assignee_id = u.id
WHERE  EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;

This returns all users that have ever had any task, plus data per project and day where task_log_entries has data in the specified date range.

Major points

  • The aggregate function sum() ignores NULL values. COALESCE() per row is not required any more as soon as you recast the calculation as the difference of two sums:

     ,coalesce(sum(nullif(new_value, '')::numeric), 0) -
      coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
    

    However, if it is possible that every value in a group is NULL or an empty string, sum() returns NULL, so wrap each sum in COALESCE once, as shown.
    I am using numeric instead of float, a safer alternative that minimizes rounding errors.

  • Your attempt to get distinct values from the join of users and tasks is futile, since you join to tasks once more further down. Flatten the whole query to make it simpler and faster.

  • These positional references are just a notational convenience:

    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3
    

    ... doing the same as in your original query.

  • To get a date from a timestamp you can simply cast to date:

    tl.created_on::date AS changeday
    

    But it's much better to test with original values in the WHERE clause or JOIN condition (if possible, and it is possible here), so Postgres can use plain indices on the column (if available):

     AND (tl.created_on IS NULL OR
          tl.created_on >= '2013-09-08' AND
          tl.created_on <  '2013-09-09')  -- next day as excluded upper border
    

    Note that a date literal is converted to a timestamp at 00:00 of that day in your current time zone. You need to pick the next day and exclude it as upper border. Or provide a more explicit timestamp literal like '2013-09-22 0:0 +2'::timestamptz.

  • For the requirement "every user who has ever been assigned to a task", add the WHERE clause:

    WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
    
  • Most importantly: A LEFT [OUTER] JOIN preserves all rows to the left of the join. A WHERE condition on a column of the right table can void this effect, because the NULL-extended rows fail the condition and are dropped. Instead, move the filter expression to the JOIN clause.

  • Parentheses can be used to force the order in which tables are joined. Rarely needed for simple queries, but very useful in this case. I use the feature to join tasks, fixins, projects and task_log_entries before left-joining all of it to users, without a subquery.

  • Table aliases make writing complex queries easier.
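
The point about filter placement can be seen in a minimal sketch. The two CTEs demo_users and demo_log are hypothetical stand-ins built from VALUES lists, not the tables from the question; the query only assumes a reasonably recent PostgreSQL.

```sql
-- Hypothetical miniature data, not the original schema.
WITH demo_users(id, name) AS (
   VALUES (1, 'alice'), (2, 'bob')
)
, demo_log(user_id, created_on) AS (
   VALUES (1, timestamp '2013-09-08 10:00')   -- inside the date range
        , (2, timestamp '2013-09-01 10:00')   -- outside the date range
)
-- Variant 1: filter in the JOIN condition.
-- Users without a matching log row are kept, with NULL for created_on.
SELECT 'filter in ON' AS variant, u.name, l.created_on
FROM   demo_users u
LEFT   JOIN demo_log l ON  l.user_id = u.id
                       AND l.created_on >= '2013-09-08'
                       AND l.created_on <  '2013-09-09'
UNION  ALL
-- Variant 2: same filter in the WHERE clause.
-- NULL-extended rows fail the condition, so the join acts like an inner join.
SELECT 'filter in WHERE', u.name, l.created_on
FROM   demo_users u
LEFT   JOIN demo_log l ON l.user_id = u.id
WHERE  l.created_on >= '2013-09-08'
AND    l.created_on <  '2013-09-09'
ORDER  BY 1, 2;
```

The 'filter in ON' variant returns a row for both alice and bob (bob with a NULL created_on), while the 'filter in WHERE' variant returns alice only.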

It doesn't work because the first query is inner-joined with tasks. The same table is then used to perform the outer join (through a subquery, but nevertheless), but the first query (tasked users) does not contain the relevant records (those lacking a match) in the first place.

Try using

....
FROM (
  SELECT DISTINCT
    users.id,
    users.name AS user_name
  FROM users    
) tasked_users
...