Selecting columns with DISTINCT in PostgreSQL

Posted by 元气小坏坏 on 2019-12-21 01:18:16

Question


I'm querying bus stops from a database, and I want it to return only one stop per bus line/direction. This query does just that:

Stop.select("DISTINCT line_id, direction")

Except that it won't give me any attribute other than those two. I tried a couple of other queries to have it return the id in addition to the line_id and direction fields (ideally it would return all columns), with no luck:

Stop.select("DISTINCT line_id, direction, id")

and

Stop.select("DISTINCT(line_id || '-' || direction), id")

In both cases, the query loses its distinct clause and all rows are returned.
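A note on why both attempts return every row: plain DISTINCT de-duplicates on the whole select list, and the parentheses in the second attempt only group an expression, so once the unique id is part of the list every row counts as distinct. A minimal plain-Ruby sketch of that behaviour follows; the sample rows and the `rows` name are illustrative assumptions, not data from the real stops table.

```ruby
# Hypothetical sample rows standing in for the stops table.
rows = [
  { id: 1, line_id: 1, direction: "N" },
  { id: 2, line_id: 1, direction: "N" },
  { id: 3, line_id: 1, direction: "S" },
  { id: 4, line_id: 2, direction: "N" }
]

# Emulates SELECT DISTINCT line_id, direction:
# duplicates collapse because only two columns are compared.
pairs = rows.map { |r| r.values_at(:line_id, :direction) }.uniq

# Emulates SELECT DISTINCT line_id, direction, id:
# the unique id makes every tuple different, so nothing collapses.
triples = rows.map { |r| r.values_at(:line_id, :direction, :id) }.uniq

puts "distinct (line_id, direction):     #{pairs.size}"
puts "distinct (line_id, direction, id): #{triples.size}"
```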

Some awesome dude helped me out and suggested using a subquery to have it return all the ids:

Stop.find_by_sql("SELECT DISTINCT a1.line_id, a1.direction, (SELECT a2.id FROM stops a2 WHERE a2.line_id = a1.line_id AND a2.direction = a1.direction ORDER BY a2.id ASC LIMIT 1) AS id FROM stops a1")

I can then extract all the ids and perform a 2nd query to fetch the full attributes for each stop.

Is there a way to have it all inside 1 query AND have it return all the attributes?


Answer 1:


Stop.select("DISTINCT ON (line_id, direction) *")



Answer 2:


Not so fast: the other answer selects the stop_id arbitrarily.

This is why your question makes no sense. We can pull stop_ids and have distinct line_id and direction. But we have no idea why we have the stop_id we do.

    create temp table test(line_id integer, direction char(1), stop_id integer);
    insert into test values
            (1, 'N', 1),
            (1, 'N', 2),
            (1, 'S', 1),
            (1, 'S', 2),
            (2, 'N', 1),
            (2, 'N', 2),
            (2, 'S', 1),
            (2, 'S', 2)
    ;
    select distinct on (line_id, direction) * from test;
    -- do this again, but insert the stop_ids in reverse order
    -- could that possibly change the result of our "robust" query?
    drop table test;
    create temp table test(line_id integer,direction char(1),stop_id integer);
    insert into test values
            (1, 'N', 2),
            (1, 'N', 1),
            (1, 'S', 2),
            (1, 'S', 1),
            (2, 'N', 2),
            (2, 'N', 1),
            (2, 'S', 2),
            (2, 'S', 1)
    ;
    select distinct on (line_id, direction) * from test;

First select:

line_id | direction | stop_id 
---------+-----------+---------
       1 | N         |       1
       1 | S         |       1
       2 | N         |       1
       2 | S         |       1

Second select:

line_id | direction | stop_id 
---------+-----------+---------
       1 | N         |       2
       1 | S         |       2
       2 | N         |       2
       2 | S         |       2

So we got away without grouping by stop_id, but we have no guarantee about why we got the one we did. All we know is that it is a valid stop_id. Updates, inserts, and other operations can change the physical order of rows, and no RDBMS guarantees that order.

This is what I meant in the top comment. There is no known reason for pulling one stop_id over the other one, but somehow you need this stop_id (or whatever else) desperately.
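If a specific stop is wanted, the choice can be pinned down by adding an ORDER BY to the DISTINCT ON query: it must start with the DISTINCT ON expressions and can then name a tie-breaker. The sketch below assumes the rule "lowest stop_id wins", which the question does not actually state. It shows the SQL as a string and emulates its semantics in plain Ruby on the two test tables above. `distinct_on` and `build` are hypothetical helpers written for this illustration, not Rails or PostgreSQL APIs.

```ruby
# SQL this sketch emulates (assumes "lowest stop_id wins" is the desired rule).
DETERMINISTIC_SQL = <<~SQL
  SELECT DISTINCT ON (line_id, direction) *
  FROM test
  ORDER BY line_id, direction, stop_id
SQL

# Hypothetical helper: sort by the ORDER BY columns, then keep the first row
# of each group of `keys`, which is what DISTINCT ON does on sorted input.
def distinct_on(rows, keys, order)
  rows.sort_by { |r| r.values_at(*order) }
      .uniq { |r| r.values_at(*keys) }
end

# Hypothetical helper: rebuilds the test table with the stop_ids inserted
# in the given order for every line/direction pair.
def build(stop_ids)
  [1, 2].product(%w[N S]).flat_map do |line_id, direction|
    stop_ids.map { |s| { line_id: line_id, direction: direction, stop_id: s } }
  end
end

first_table  = build([1, 2]) # insertion order of the first example
second_table = build([2, 1]) # reversed insertion order of the second example

keys  = %i[line_id direction]
order = %i[line_id direction stop_id]

result_a = distinct_on(first_table, keys, order)
result_b = distinct_on(second_table, keys, order)

puts "same result for both insertion orders: #{result_a == result_b}"
```

With the explicit ordering, the insertion order of the rows no longer influences which stop_id is kept. In ActiveRecord the same idea could presumably be written as Stop.select("DISTINCT ON (line_id, direction) *").order(:line_id, :direction, :id), assuming the lowest id is the stop that should win.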



Source: https://stackoverflow.com/questions/5009297/selecting-columns-with-distinct-in-postgresql
