问题
I was trying to understand PARTITION BY in postgres by writing a few sample queries. I have a test table on which I run my query.
id integer | num integer
___________|_____________
1 | 4
2 | 4
3 | 5
4 | 6
When I run the following query, I get the output as I expected.
SELECT id, COUNT(id) OVER(PARTITION BY num) from test;
id | count
___________|_____________
1 | 2
2 | 2
3 | 1
4 | 1
But, when I add ORDER BY to the partition,
SELECT id, COUNT(id) OVER(PARTITION BY num ORDER BY id) from test;
id | count
___________|_____________
1 | 1
2 | 2
3 | 1
4 | 1
My understanding is that COUNT is computed across all rows that fall into a partition. Here, I have partitioned the rows by num. The number of rows in the partition is the same, with or without an ORDER BY clause. Why is there a difference in the outputs?
回答1:
When you add an order by
to an aggregate used as a window function that aggregate turns into a "running count" (or whatever aggregate you use).
The count(*)
will return the number of rows up until the "current one" based on the order specified.
The following query shows the different results for aggregates used with an order by
. With sum()
instead of count()
it's a bit easier to see (in my opinion).
with test (id, num, x) as (
values
(1, 4, 1),
(2, 4, 1),
(3, 5, 2),
(4, 6, 2)
)
select id,
num,
x,
count(*) over () as total_rows,
count(*) over (order by id) as rows_upto,
count(*) over (partition by x order by id) as rows_per_x,
sum(num) over (partition by x) as total_for_x,
sum(num) over (order by id) as sum_upto,
sum(num) over (partition by x order by id) as sum_for_x_upto
from test;
will result in:
id | num | x | total_rows | rows_upto | rows_per_x | total_for_x | sum_upto | sum_for_x_upto
---+-----+---+------------+-----------+------------+-------------+----------+---------------
1 | 4 | 1 | 4 | 1 | 1 | 8 | 4 | 4
2 | 4 | 1 | 4 | 2 | 2 | 8 | 8 | 8
3 | 5 | 2 | 4 | 3 | 1 | 11 | 13 | 5
4 | 6 | 2 | 4 | 4 | 2 | 11 | 19 | 11
There are more examples in the Postgres manual
回答2:
Your two expressions are:
COUNT(id) OVER (PARTITION BY num)
COUNT(id) OVER (PARTITION BY num ORDER BY id)
Why would you expect these to return the same values? The syntax is different for a reason.
The first returns the overall count for each num
-- essentially joining back the aggregated value.
The second does a cumulative count. It does the COUNT()
for each row of id
, for all values up to that id
s value.
Note that such cumulative counts would normally be implemented using RANK()
(or related functions).
The cumulative count is subtly different from RANK()
. The cumulative count implements:
COUNT(id) OVER (PARTITION BY num ORDER BY id RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
RANK()
is slightly different. The difference only matters when the ORDER BY
keys have ties.
来源:https://stackoverflow.com/questions/50945477/count-rows-in-partition-with-order-by