Is there some replacement for 'max' function and grouping (performance optimization of aggregate operations)?

Question

I have a big query that also returns a very large result set. The query looks like this:

SELECT group, subgroup, max(last_update) FROM
(
    SELECT a as group, a1 as subgroup, d1 as last_update FROM....
    UNION ALL
    SELECT b as group, b1 as subgroup, d2 as last_update FROM....
    UNION ALL
    SELECT c as group, c1 as subgroup, d3 as last_update FROM....
    UNION ALL
    SELECT d as group, d1 as subgroup, d3 as last_update FROM....
    UNION ALL
    SELECT e as group, e1 as subgroup, d4 as last_update FROM....
    ... and some more selects (15 select queries in total)
) GROUP BY group, subgroup;

As you can see, I need to load the maximum date for each group. The problem is that those dates need to be loaded from 15 selects, and the query is very slow (~4s). I tested that the subselect

SELECT a as group, a1 as subgroup, d1 as last_update FROM....
UNION ALL
SELECT b as group, b1 as subgroup, d2 as last_update FROM....
UNION ALL
SELECT c as group, c1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT d as group, d1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT e as group, e1 as subgroup, d4 as last_update FROM....
... and some more selects

runs pretty fast (~0.1s), so the problem is the grouping (that's why the whole query is slow):

SELECT group, subgroup, max(last_update) FROM
(
    ...
) GROUP BY group, subgroup;

Is there some way to improve this grouping? As I wrote, the goal is to get the maximum date for each subgroup in each group.

Answer 1:

I suggest you take a look at parallel queries:

create table ttt as
with t(a, b, c, d, a1, b1, c1, d1, last_updated) as (
  select 1, 2, 3, 4, 1, 2, 3, 4, sysdate + 1 from dual union all
  select 1, 2, 3, 4, 1, 2, 3, 4, sysdate from dual union all
  select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 2 from dual union all
  select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 1 from dual union all
  select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 3 from dual union all
  select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 2 from dual union all
  select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 4 from dual union all
  select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 3 from dual 
)
select * from t;

select a grp, a1 subgrp, max(last_updated)
  from ttt
 group by a, a1

[Explain plan for the serial query; image not preserved]

Let's add some parallelism:

alter table ttt parallel;

select a grp, a1 subgrp, max(last_updated)
  from ttt
 group by a, a1

[Explain plan for the parallel query; image not preserved]

As you can see, the cost goes down. But it is not free: during parallel execution the query uses all the resources you have, so it could hurt the performance of everything else running on the server. Since you said that this query is not run very often, I think this is a good solution. To read more about parallel query, see the Oracle documentation on parallel execution.
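
The comparison described above can be reproduced with EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. The sketch below also shows a statement-level PARALLEL hint as a possible alternative to altering the table. The degree of 4 is an arbitrary assumption for illustration, not a value taken from this answer.

```sql
-- Explain the grouped query with a statement-level parallel hint.
-- The degree (4) is an assumed value; pick one that fits your server.
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(ttt, 4) */ a grp, a1 subgrp, MAX(last_updated)
  FROM ttt
 GROUP BY a, a1;

-- Display the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Remove the table-level parallel setting if it is no longer wanted.
ALTER TABLE ttt NOPARALLEL;
```

A hint limits the parallelism to one statement, whereas ALTER TABLE ... PARALLEL affects every query that touches the table.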

Answer 2:

Maybe do the group by in each individual subquery too?

select g, s, max(last_update) from (
  select g, s, max(last_update) as last_update from t1 group by g, s
  union all
  select g, s, max(last_update) as last_update from t2 group by g, s
  union all
  ...
)
group by g, s

I can't say for sure, but if the server is building a temporary rowset for the query, then this might cut down the size of that temporary rowset, because each subquery hands over at most one row per (g, s) pair.
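
A minimal, self-contained sketch of why the rewrite returns the same answer: the maximum of the per-table maxima equals the overall maximum. The tables t1 and t2 and their rows are made up for illustration and are built from dual so the statement can be run as-is.

```sql
-- t1 and t2 are hypothetical stand-ins for two of the 15 source queries.
WITH t1 AS (
  SELECT 1 AS g, 10 AS s, DATE '2014-08-01' AS last_update FROM dual UNION ALL
  SELECT 1, 10, DATE '2014-08-05' FROM dual UNION ALL
  SELECT 2, 20, DATE '2014-08-03' FROM dual
),
t2 AS (
  SELECT 1 AS g, 10 AS s, DATE '2014-08-09' AS last_update FROM dual UNION ALL
  SELECT 2, 20, DATE '2014-08-02' FROM dual
)
SELECT g, s, MAX(last_update) AS last_update
  FROM (
    -- Each branch is reduced to one row per (g, s) before the union.
    SELECT g, s, MAX(last_update) AS last_update FROM t1 GROUP BY g, s
    UNION ALL
    SELECT g, s, MAX(last_update) AS last_update FROM t2 GROUP BY g, s
  )
 GROUP BY g, s
 ORDER BY g, s;
-- Two rows come back: (1, 10) with 9 August 2014 and (2, 20) with 3 August 2014.
```

Here the outer GROUP BY sees four rows instead of five. With real tables the reduction depends on how many rows each source has per (g, s) pair.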

Answer 3:

That query looks syntactically incorrect to me:

SELECT group, subgroup, max(last_update) FROM
(
    SELECT a as group, a1 as subgroup FROM....

You do a max on the LAST_UPDATE but it's not included in your subqueries?!

Answer 4:

In addition to Ed Avis's answer, we can further reduce the number of rows that the outer GROUP BY has to process by using UNION instead of UNION ALL:

select g, s, max(last_update) from (
  select g, s, max(last_update) as last_update from t1 group by g, s
  union
  select g, s, max(last_update) as last_update from t2 group by g, s
  union
  ...
)
group by g, s
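
A small sketch of what UNION actually removes here: only rows that are identical in all three columns (g, s and last_update). The tables t1 and t2 and their rows are assumptions made up for illustration.

```sql
-- t1 and t2 are hypothetical; both contain the same (1, 10, 2014-08-09) row.
WITH t1 AS (
  SELECT 1 AS g, 10 AS s, DATE '2014-08-09' AS last_update FROM dual
),
t2 AS (
  SELECT 1 AS g, 10 AS s, DATE '2014-08-09' AS last_update FROM dual UNION ALL
  SELECT 2, 20, DATE '2014-08-03' FROM dual
)
SELECT COUNT(*) AS rows_before_outer_group
  FROM (
    SELECT g, s, MAX(last_update) AS last_update FROM t1 GROUP BY g, s
    UNION
    SELECT g, s, MAX(last_update) AS last_update FROM t2 GROUP BY g, s
  );
-- Returns 2; with UNION ALL in place of UNION the count would be 3.
```

UNION needs its own duplicate-elimination step, so whether it beats UNION ALL depends on how many exact duplicates the branches produce. Comparing the two execution plans on the real data is the safest way to decide.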

Source: https://stackoverflow.com/questions/25260601/is-there-some-replacement-for-max-function-and-grouping-performance-optimizat

Tags

sql

performance

Oracle

oracle11g