Does “group by” automatically guarantee “order by”?

Submitted by 烂漫一生 on 2019-12-04 01:51:04
juergen d

GROUP BY does not necessarily order the data. A database is designed to fetch the data as fast as possible and sorts only when it has to.

So add the order by if you need a guaranteed order.
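As a minimal sketch of that advice, the snippet below runs a grouped query with an explicit ORDER BY through Python's built-in sqlite3 module. The table and column names (withdraw, day, amount) and the helper daily_totals are made up for illustration, and SQLite is only a convenient stand-in for whatever RDBMS you actually use.

```python
import sqlite3


def daily_totals(conn):
    """Return (day, row count, total amount) rows in ascending day order."""
    return conn.execute(
        """
        SELECT day, COUNT(*), SUM(amount)
        FROM withdraw
        GROUP BY day
        ORDER BY day  -- the only clause that guarantees the row order
        """
    ).fetchall()


# Hypothetical sample data, inserted deliberately out of order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE withdraw (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO withdraw VALUES (?, ?)",
    [("2019-03-02", 10), ("2019-03-01", 5), ("2019-03-02", 7), ("2019-03-01", 1)],
)

rows = daily_totals(conn)
print(rows)  # [('2019-03-01', 2, 6), ('2019-03-02', 2, 17)]
```

Without the ORDER BY line the query would still be valid, but the order of the two rows would be left to the engine.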

One efficient way to implement GROUP BY is to sort the data internally and then group it. That's why some RDBMSs return sorted output when grouping. Yet the SQL specs don't mandate that behavior, so unless it is explicitly documented by the RDBMS vendor I wouldn't bet on it working (tomorrow). OTOH, if the RDBMS implicitly does a sort, it might also be smart enough to optimize away the redundant ORDER BY. @jimmyb

An example using PostgreSQL that demonstrates this

Create a table with 1M records, with random creation dates within the last 90 days, and index it by date

CREATE TABLE WITHDRAW AS
  SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
    md5(random()::text) AS NAM_PERSON,
    (NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- from today back to 90 days ago
    (random() * 1000)::decimal(12, 2) AS NUM_VALUE
  FROM generate_series(1,1000000);

CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);

Group by the creation date truncated to the day, restricting the query to a one-day window (from two days ago to one day ago). The planner picks a HashAggregate, with no sort step

EXPLAIN 
SELECT
    DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1

HashAggregate  (cost=11428.33..11594.13 rows=11053 width=48)
  Group Key: date_trunc('DAY'::text, dat_creation)
  ->  Bitmap Heap Scan on withdraw w  (cost=237.73..11345.44 rows=11053 width=14)
        Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
        ->  Bitmap Index Scan on withdraw_dat_creation  (cost=0.00..234.97 rows=11053 width=0)
              Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))

With a wider date range, the planner chooses to sort first and then group

EXPLAIN 
SELECT
    DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1

GroupAggregate  (cost=116522.65..132918.32 rows=655827 width=48)
  Group Key: (date_trunc('DAY'::text, dat_creation))
  ->  Sort  (cost=116522.65..118162.22 rows=655827 width=14)
        Sort Key: (date_trunc('DAY'::text, dat_creation))
        ->  Seq Scan on withdraw w  (cost=0.00..41949.57 rows=655827 width=14)
              Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))

Adding ORDER BY 1 at the end of the same query makes no significant difference to the plan

GroupAggregate  (cost=116522.44..132918.06 rows=655825 width=48)
  Group Key: (date_trunc('DAY'::text, dat_creation))
  ->  Sort  (cost=116522.44..118162.00 rows=655825 width=14)
        Sort Key: (date_trunc('DAY'::text, dat_creation))
        ->  Seq Scan on withdraw w  (cost=0.00..41949.56 rows=655825 width=14)
              Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))

PostgreSQL 10.3
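The same kind of plan comparison can be tried without a PostgreSQL server. The sketch below is only an analogy: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, with a made-up withdraw table and a hypothetical query_plan helper. The plan text differs between SQLite versions and looks nothing like PostgreSQL's output, so no expected output is shown.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE withdraw (day TEXT, amount INTEGER)")

grouped = "SELECT day, COUNT(*), SUM(amount) FROM withdraw GROUP BY day"
plan_without = query_plan(conn, grouped)
plan_with = query_plan(conn, grouped + " ORDER BY day")

for line in plan_without:
    print("without ORDER BY:", line)
for line in plan_with:
    print("with ORDER BY:   ", line)
```

If the two plans come out identical, the engine is already sorting for the grouping and the explicit ORDER BY costs nothing extra, which is the point made above. That is still an implementation detail of one engine and one version, not a guarantee.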

It definitely doesn't. I have seen one of my queries suddenly start returning unordered results as the data in the table grew.

I tried it on the AdventureWorks sample database from MSDN.

select HireDate, min(JobTitle)
from AdventureWorks2016CTP3.HumanResources.Employee
group by HireDate

Results:

2009-01-10 Production Technician - WC40

2009-01-11 Application Specialist

2009-01-12 Assistant to the Chief Financial Officer

2009-01-13 Production Technician - WC50

The result happens to come back sorted by HireDate, but you should never rely on GROUP BY to sort.

For example, adding an index can change the order of the output.

I added the following index on (JobTitle, HireDate)

CREATE NONCLUSTERED INDEX NonClusturedIndex_Jobtitle_hireddate ON [HumanResources].[Employee]
(
    [JobTitle] ASC,
    [HireDate] ASC
)

With the same SELECT query, the result changes:

2006-06-30 Production Technician - WC60

2007-01-26 Marketing Assistant

2007-11-11 Engineering Manager

2007-12-05 Senior Tool Designer

2007-12-11 Tool Designer

2007-12-20 Marketing Manager

2007-12-26 Production Supervisor - WC60

You can download Adventureworks2016 at the following address

https://www.microsoft.com/en-us/download/details.aspx?id=49502

It depends on the number of records. With few records, GROUP BY returned sorted output automatically. With more records (more than 15), an ORDER BY clause was needed.
