distinct() function (not select qualifier) in postgres

Posted by 三世轮回 on 2019-12-23 07:58:18

Question


I just came across a SQL query, specifically against a Postgres database, that uses a function named "distinct". Namely:

select distinct(pattern) as pattern, style, ... etc ...
from styleview
where ... etc ...

Note this is NOT the ordinary DISTINCT qualifier on a SELECT -- at least it's not the normal syntax for the DISTINCT qualifier. Note the parentheses. It is apparently using DISTINCT as a function, or maybe this is some special syntax.

Any idea what this means?

I tried playing with it a little and if I write

select distinct(foo)
from bar

I get the same results as

select distinct foo
from bar

When I combine it with other fields in the same select, it's not clear to me exactly what it's doing.

I can't find anything in the Postgres documentation.

Thanks for any help!


Answer 1:


(The question is old, but comes high in Google results for “sql distinct is not a function” (second, first of Stack Overflow) and yet is still missing a satisfying answer, so...)

Actually this is the ordinary DISTINCT qualifier on a SELECT -- but with a misleading syntax (you are right about that point).

DISTINCT is never a function, always a keyword. Here it is used (wrongly) as if it were a function, but

select distinct(pattern) as pattern, style, ... etc ...
from styleview
where ... etc ...

is in fact equivalent to all the following forms:

-- add a space after distinct:

select distinct (pattern) as pattern, style, ... etc ...
from styleview
where ... etc ...

-- remove parentheses around column name:

select distinct pattern as pattern, style, ... etc ...
from styleview
where ... etc ...

-- indent the contents of each clause:

select distinct
    pattern as pattern, style, ... etc ...
from
    styleview
where
    ... etc ...

-- remove redundant alias identical to column name:

select distinct
    pattern, style, ... etc ...
from
    styleview
where
    ... etc ...
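The equivalence of these forms is easy to check for yourself. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so it runs anywhere; SQLite parses DISTINCT followed by a parenthesized expression the same way). The styleview rows are invented for illustration, and the elided parts of the original query are simply left out.

```python
import sqlite3

# Hypothetical stand-in for the question's styleview; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE styleview (pattern TEXT, style TEXT)")
conn.executemany(
    "INSERT INTO styleview (pattern, style) VALUES (?, ?)",
    [("dots", "bold"), ("dots", "bold"), ("dots", "thin"), ("stripes", "bold")],
)

# The spellings discussed above, reduced to two columns.
queries = [
    "SELECT DISTINCT(pattern) AS pattern, style FROM styleview",
    "SELECT DISTINCT (pattern) AS pattern, style FROM styleview",
    "SELECT DISTINCT pattern AS pattern, style FROM styleview",
    "SELECT DISTINCT pattern, style FROM styleview",
]

# Sort each result so the comparison ignores row order,
# which is unspecified without ORDER BY.
results = [sorted(conn.execute(q).fetchall()) for q in queries]

for query, rows in zip(queries, results):
    print(query, "->", rows)
```

Every query removes the duplicated ("dots", "bold") row and nothing else, because DISTINCT applies to the whole output row, not to the parenthesized column.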

Complementary reading:

  • http://weblogs.sqlteam.com/jeffs/archive/2007/10/12/sql-distinct-group-by.aspx
  • https://stackoverflow.com/a/1164529

Note: OMG Ponies in an answer to the present question mentioned the DISTINCT ON extension featured by PostgreSQL.
But (as Jay rightly remarked in a comment) it is not what is used here, because the query (and the results) would have been different, e.g.:

select distinct on(pattern) pattern, style, ... etc ...
from styleview
where ... etc ...
order by pattern, ... etc ...

equivalent to:

select  distinct on (pattern)
    pattern, style, ... etc ...
from
    styleview
where
    ... etc ...
order by
    pattern, ... etc ...

Complementary reading:

  • http://www.noelherrick.com/blog/postgres-distinct-on

Note: Lukas Eder in an answer to the present question mentioned the syntax of using the DISTINCT keyword inside an aggregate function:
the COUNT(DISTINCT (foo, bar, ...)) row syntax featured by HSQLDB (PostgreSQL accepts it too),
or the COUNT(DISTINCT foo, bar, ...) multi-argument syntax, which is specific to MySQL. The single-column COUNT(DISTINCT foo) is standard and works in PostgreSQL, SQL Server, Oracle, and others.
But (clearly enough) it is not what is used here.
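For contrast, here is a small sketch of DISTINCT inside an aggregate, again using Python's sqlite3 as a stand-in database (an assumption, as are the table contents). SQLite only allows a single argument in COUNT(DISTINCT ...), so the multi-column count is written as a subquery, which should also be portable to the other databases mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, fub TEXT)")
conn.executemany(
    "INSERT INTO bar (foo, fub) VALUES (?, ?)",
    [("one", "a"), ("one", "a"), ("one", "b"), ("two", "a")],
)

# DISTINCT as part of an aggregate call: counts the distinct values of one column.
distinct_foo = conn.execute("SELECT COUNT(DISTINCT foo) FROM bar").fetchone()[0]

# Counting distinct (foo, fub) combinations through a subquery with SELECT DISTINCT.
distinct_pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT foo, fub FROM bar) AS pairs"
).fetchone()[0]

print(distinct_foo, distinct_pairs)
```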




Answer 2:


From the documentation:

If DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates). ALL specifies the opposite: all rows are kept; that is the default.

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example,

The ON portion is optional, so it really comes down to:

  1. The brackets being used
  2. Placement in the query - SQL Server & MySQL throw an error if you use DISTINCT in any but the first position of the SELECT clause

PostgreSQL is the only database to my knowledge to support this syntax.




Answer 3:


It's either a typo or someone misunderstood what they were writing.

I don't know all the details, but you can use parentheses as precedence operators (just like in math). However, I think it ends up that you can put parentheses around a lot of things without actually changing their meanings.

For example, the following 2 queries return exactly the same thing:

select foo
from bar

select (foo)
from bar

It's confusing because you can also use parentheses to group columns into records, for example:

select (foo, baz)
from bar

So in your original query, what they've actually written would be equivalent to this:

select distinct *
from
(
    select pattern as pattern, style, ... etc ...
    from styleview
    where ... etc ...
) as t  -- PostgreSQL versions before 16 require an alias on a subquery in FROM

which may or may not be what they intended. If I had to guess I would guess they were going for the "DISTINCT ON(...)" syntax mentioned in some of the other answers.




Answer 4:


From the PostgreSQL documentation:

SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    [ * | expression [ [ AS ] output_name ] [, ...] ]

In the first line of that quoted syntax you will find that the ON portion is optional, but it is also that ON portion which references parentheses. In other words, unless the ON is present then the parentheses are meaningless.

So, for this question the [ ON ( expression [, ...] ) ] is not relevant.

Here is some very simple test data:

CREATE TABLE bar
    (foo varchar(3), fub varchar(1), flut timestamp)
;

INSERT INTO bar
    (foo, fub, flut)
VALUES
    ('one', 'a', '2016-01-01 01:01:03'),
    ('one', 'b', '2016-01-01 01:01:02'),
    ('one', 'c', '2016-01-01 01:01:01'),
    ('two', 'd', '2016-01-01 01:01:03'),
    ('two', 'e', '2016-01-01 01:01:02'),
    ('two', 'f', '2016-01-01 01:01:01')
;

Let us first concentrate on the parentheses. What do parentheses alone do around an expression following select? e.g.

select (foo) from bar;

| foo |
|-----|
| one |
| one |
| one |
| two |
| two |
| two |

I trust that you will see that this result is identical to a query without parentheses surrounding column foo, and so what we find from that query is that the parentheses do NOTHING. They are simply ignored. What happens however if we introduce DISTINCT?

select distinct(foo) from bar;

| foo |
|-----|
| two |
| one |

select distinct foo from bar;

| foo |
|-----|
| two |
| one |

Again, we see that the parentheses have no effect at all. If we refer back to the syntax this is consistent. DISTINCT is NOT a FUNCTION and placing an expression inside parentheses after DISTINCT does not alter the way it works.

So, for the question:

I just came across a SQL query, specifically against a Postgres database, that uses a function named "distinct". Namely:

select distinct(pattern) as pattern, style, ... etc ...
from styleview
where ... etc ...

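DISTINCT is NOT a function, and the parentheses in that example query are ignored. DISTINCT applies to the whole row of selected columns, not just to the column wrapped in parentheses.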
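To see what this means once other columns are added to the select list, here is a rough sketch that loads the test data above into an in-memory SQLite database via Python's sqlite3 (SQLite is an assumption standing in for PostgreSQL, and flut is stored as text rather than as a timestamp).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, fub TEXT, flut TEXT)")
conn.executemany(
    "INSERT INTO bar (foo, fub, flut) VALUES (?, ?, ?)",
    [
        ("one", "a", "2016-01-01 01:01:03"),
        ("one", "b", "2016-01-01 01:01:02"),
        ("one", "c", "2016-01-01 01:01:01"),
        ("two", "d", "2016-01-01 01:01:03"),
        ("two", "e", "2016-01-01 01:01:02"),
        ("two", "f", "2016-01-01 01:01:01"),
    ],
)

# With a single column, duplicates of foo collapse to two rows.
only_foo = conn.execute("SELECT DISTINCT(foo) FROM bar ORDER BY foo").fetchall()

# With a second column, DISTINCT compares (foo, fub) as a whole,
# so the parentheses around foo do not limit the result to one row per foo.
foo_and_fub = conn.execute(
    "SELECT DISTINCT(foo), fub FROM bar ORDER BY foo, fub"
).fetchall()

print(len(only_foo), len(foo_and_fub))
```

Since every (foo, fub) pair in the test data is already unique, the second query returns all six rows.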



If it is used, the optional [ ON (expression) ] really does alter the results.

Test a:

select distinct ON (foo) foo, fub, flut from bar order by foo

| foo | fub |                      flut |
|-----|-----|---------------------------|
| one |   a | January, 01 2016 01:01:03 |
| two |   d | January, 01 2016 01:01:03 |

Test b:

select distinct ON (fub) foo, fub, flut from bar order by fub

| foo | fub |                      flut |
|-----|-----|---------------------------|
| one |   a | January, 01 2016 01:01:03 |
| one |   b | January, 01 2016 01:01:02 |
| one |   c | January, 01 2016 01:01:01 |
| two |   d | January, 01 2016 01:01:03 |
| two |   e | January, 01 2016 01:01:02 |
| two |   f | January, 01 2016 01:01:01 |

Test c:

select distinct ON (flut) foo, fub, flut from bar order by flut

| foo | fub |                      flut |
|-----|-----|---------------------------|
| one |   c | January, 01 2016 01:01:01 |
| one |   b | January, 01 2016 01:01:02 |
| one |   a | January, 01 2016 01:01:03 |

The [ ON (expression) ] facility is very useful as it can provide the "first", or "last", or "earliest", or "most recent" rows in a distinct list. But keep in mind that this capability is coupled with the ORDER BY clause: which row counts as "first" is decided by ORDER BY, and if an ORDER BY clause is present, its leading expressions must match the expressions used in SELECT DISTINCT ON. Otherwise PostgreSQL produces an error:

ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
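As an illustration of the "most recent" case, in PostgreSQL one would write something like SELECT DISTINCT ON (foo) foo, fub, flut FROM bar ORDER BY foo, flut DESC. DISTINCT ON is a PostgreSQL extension, so the sketch below does not run that query; it approximates the same result in SQLite (through Python's sqlite3) with a NOT EXISTS subquery. This is a swapped-in technique, shown only as a way to check the expected rows against the test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, fub TEXT, flut TEXT)")
conn.executemany(
    "INSERT INTO bar (foo, fub, flut) VALUES (?, ?, ?)",
    [
        ("one", "a", "2016-01-01 01:01:03"),
        ("one", "b", "2016-01-01 01:01:02"),
        ("one", "c", "2016-01-01 01:01:01"),
        ("two", "d", "2016-01-01 01:01:03"),
        ("two", "e", "2016-01-01 01:01:02"),
        ("two", "f", "2016-01-01 01:01:01"),
    ],
)

# Keep a row only if no other row with the same foo has a later flut.
# ISO-formatted timestamps stored as text compare correctly as strings.
latest_per_foo = conn.execute(
    """
    SELECT b.foo, b.fub, b.flut
    FROM bar AS b
    WHERE NOT EXISTS (
        SELECT 1
        FROM bar AS newer
        WHERE newer.foo = b.foo AND newer.flut > b.flut
    )
    ORDER BY b.foo
    """
).fetchall()

for row in latest_per_foo:
    print(row)
```

One difference worth keeping in mind: if two rows tie for the latest flut within the same foo, this emulation returns both of them, whereas DISTINCT ON always keeps exactly one row per group.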

The examples above were run on sqlfiddle.


While I don't wish to over complicate my answer there is a wrinkle worth mentioning:

select distinct (foo,fub) from bar;

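NOW the parentheses do something, but what they do has no direct relationship to DISTINCT: (foo,fub) builds a single row (composite) value, so the query returns one column of distinct records instead of two separate columns. See "Composite Types" in the PostgreSQL documentation.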



Source: https://stackoverflow.com/questions/3408037/distinct-function-not-select-qualifier-in-postgres
