Question
I have a large query that also returns a very large result set. The query looks like this:
SELECT group, subgroup, max(last_update) FROM
(
SELECT a as group, a1 as subgroup, d1 as last_update FROM....
UNION ALL
SELECT b as group, b1 as subgroup, d2 as last_update FROM....
UNION ALL
SELECT c as group, c1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT d as group, d1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT e as group, e1 as subgroup, d4 as last_update FROM....
... and some more selects (15 select queries in total)
) GROUP BY group, subgroup;
As you can see, I need to load the maximum date for each group. The problem is that those dates need to be loaded from 15 selects, and the query is very slow (~4s). I tested that the subselect
SELECT a as group, a1 as subgroup, d1 as last_update FROM....
UNION ALL
SELECT b as group, b1 as subgroup, d2 as last_update FROM....
UNION ALL
SELECT c as group, c1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT d as group, d1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT e as group, e1 as subgroup, d4 as last_update FROM....
... and some more selects
runs pretty fast (~0.1s), so the problem is the grouping (that is why the whole query is slow):
SELECT group, subgroup, max(last_update) FROM
(
...
) GROUP BY group, subgroup;
Is there some way to speed up this grouping? As I wrote, the goal is to get the maximum date for each subgroup in each group.
Answer 1:
I suggest you take a look at parallel queries:
create table ttt as
with t(a, b, c, d, a1, b1, c1, d1, last_updated) as (
select 1, 2, 3, 4, 1, 2, 3, 4, sysdate + 1 from dual union all
select 1, 2, 3, 4, 1, 2, 3, 4, sysdate from dual union all
select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 2 from dual union all
select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 1 from dual union all
select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 3 from dual union all
select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 2 from dual union all
select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 4 from dual union all
select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 3 from dual
)
select * from t;
select a grp, a1 subgrp, max(last_updated)
from ttt
group by a, a1
Explain plan
Let's add some parallelism:
alter table ttt parallel;
select a grp, a1 subgrp, max(last_updated)
from ttt
group by a, a1
Explain plan
As you can see, the cost goes down. But it is not free: during parallel execution the query uses all the resources you have, so it can hurt the performance of everything else running on the server. Since you said that this query is not run very often, I think this is a good solution. To read more about parallel query take a look at this
Answer 2:
Maybe do the group by in each individual subquery too?
select g, s, max(last_update) from (
select g, s, max(last_update) as last_update from t1 group by g, s
union all
select g, s, max(last_update) as last_update from t2 group by g, s
union all
...
)
group by g, s
I can't say for sure, but if the server is building a temporary rowset for the query, then this might cut down the size of that temporary rowset, because each subquery hands over at most one row per (g, s) pair.
Answer 3:
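To make the idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module in place of Oracle. The tables t1 and t2 and their sample rows are hypothetical stand-ins for two of the 15 subqueries. It only checks that pre-aggregating each branch gives the same result while feeding fewer rows to the outer GROUP BY; it says nothing about actual timings on Oracle.

```python
import sqlite3

# Hypothetical tables t1 and t2 stand in for two of the 15 source queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (g INTEGER, s INTEGER, last_update TEXT);
    CREATE TABLE t2 (g INTEGER, s INTEGER, last_update TEXT);
    INSERT INTO t1 VALUES (1, 1, '2014-08-01'), (1, 1, '2014-08-05'), (2, 1, '2014-08-02');
    INSERT INTO t2 VALUES (1, 1, '2014-08-03'), (2, 1, '2014-08-09'), (2, 1, '2014-08-04');
""")

# Original shape: the outer GROUP BY sees every row of the UNION ALL.
flat_sql = """
    SELECT g, s, last_update FROM t1
    UNION ALL
    SELECT g, s, last_update FROM t2
"""

# Suggested shape: each branch is reduced to one row per (g, s) first.
pre_sql = """
    SELECT g, s, MAX(last_update) AS last_update FROM t1 GROUP BY g, s
    UNION ALL
    SELECT g, s, MAX(last_update) AS last_update FROM t2 GROUP BY g, s
"""

def grouped(inner_sql):
    """Run the outer MAX/GROUP BY over the given inner query."""
    return conn.execute(
        f"SELECT g, s, MAX(last_update) FROM ({inner_sql}) GROUP BY g, s ORDER BY g, s"
    ).fetchall()

def inner_row_count(inner_sql):
    """Count the rows the outer GROUP BY would have to process."""
    return conn.execute(f"SELECT COUNT(*) FROM ({inner_sql})").fetchone()[0]

flat = grouped(flat_sql)
pre = grouped(pre_sql)
rows_flat = inner_row_count(flat_sql)
rows_pre = inner_row_count(pre_sql)

print(flat == pre)           # both shapes return the same groups
print(rows_flat, rows_pre)   # rows reaching the outer GROUP BY
```

With this sample data the outer GROUP BY receives 4 rows instead of 6. The saving grows with the number of rows per (g, s) pair in each source.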
That query looks syntactically incorrect to me:
SELECT group, subgroup, max(last_update) FROM
(
SELECT a as group, a1 as subgroup FROM....
You do a max on the LAST_UPDATE but it's not included in your subqueries?!
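Answer 4:
In addition to Ed Avis's answer, we can further reduce the number of rows that the outer GROUP BY has to process by using UNION instead of UNION ALL. UNION removes exact duplicate rows, so this helps only when several subqueries return the same (g, s, last_update) row, and the duplicate elimination has a cost of its own: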
select g, s, max(last_update) from (
select g, s, max(last_update) as last_update from t1 group by g, s
union
select g, s, max(last_update) as last_update from t2 group by g, s
union
...
)
group by g, s
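The effect of swapping UNION ALL for UNION can be checked with a small sketch, again using Python's sqlite3 module rather than Oracle and hypothetical tables t1 and t2. It shows only that the final result is unchanged and that identical rows from different branches are collapsed. Whether the extra duplicate elimination pays off on a real Oracle workload is not something this sketch measures.

```python
import sqlite3

# Hypothetical data: both tables yield the identical row (1, 1, '2014-08-05').
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (g INTEGER, s INTEGER, last_update TEXT);
    CREATE TABLE t2 (g INTEGER, s INTEGER, last_update TEXT);
    INSERT INTO t1 VALUES (1, 1, '2014-08-05'), (2, 1, '2014-08-02');
    INSERT INTO t2 VALUES (1, 1, '2014-08-05'), (2, 1, '2014-08-09');
""")

def inner(set_op):
    """Build the pre-aggregated inner query with the given set operator."""
    return f"""
        SELECT g, s, MAX(last_update) AS last_update FROM t1 GROUP BY g, s
        {set_op}
        SELECT g, s, MAX(last_update) AS last_update FROM t2 GROUP BY g, s
    """

def count_rows(set_op):
    """Rows handed to the outer GROUP BY."""
    return conn.execute(f"SELECT COUNT(*) FROM ({inner(set_op)})").fetchone()[0]

def final(set_op):
    """Final result of the outer MAX/GROUP BY."""
    return conn.execute(
        f"SELECT g, s, MAX(last_update) FROM ({inner(set_op)}) GROUP BY g, s ORDER BY g, s"
    ).fetchall()

print(count_rows("UNION ALL"), count_rows("UNION"))  # inner row counts
print(final("UNION ALL") == final("UNION"))          # same final answer
```

Here UNION drops one of the two identical rows, so the inner set shrinks from 4 rows to 3. Rows that share (g, s) but differ in last_update are not duplicates and pass through either way.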
Source: https://stackoverflow.com/questions/25260601/is-there-some-replacement-for-max-function-and-grouping-performance-optimizat