Question
Here is the original query I was using in Postgres:
SELECT a.id,
(SELECT val
FROM database.detail x
WHERE name = 'blablah'
AND x.id = b.id) AS myGroup,
c.username,
a.someCode,
a.timeTaken,
a.date ::timestamp WITH time ZONE AT time ZONE 'PST' AS date,
SUM (CASE WHEN (b.name = 'name1') THEN b.val ::INTEGER ELSE 0 END ) AS name11,
SUM (CASE WHEN (b.name = 'name2') THEN b.val ::INTEGER ELSE 0 END ) AS name12
FROM
database.myTable a,
database.detail b,
database.client c
WHERE
a.id = b.id
AND a.c_id = c.c_id
AND a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6
This is how I converted it into an Amazon Redshift query:
SELECT a.id,
b.val AS myGroup,
c.username,
a.someCode,
a.timeTaken,
convert_timezone('PST', a.date) AS date,
SUM (CASE WHEN (b.name = 'name1') THEN b.val ::INTEGER ELSE 0 END ) AS name11,
SUM (CASE WHEN (b.name = 'name2') THEN b.val ::INTEGER ELSE 0 END ) AS name12
FROM
database.myTable a,
database.detail b,
database.client c
WHERE
a.id = b.id
AND b.name = 'blablah'
AND a.c_id = c.c_id
AND a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6 LIMIT 10
The CASE expressions do not behave as expected: the values for name11 and name12 are all zero. My Postgres query returns valid values for these, but the Redshift query does not.
The query is also very slow. The Postgres query takes about 150 ms, while the Redshift query takes 2 minutes.
How can we do this better?
Answer 1:
Redshift query performance depends on cluster configuration, table design, data loading, and regular vacuuming and analyzing of the tables.
Let me address the core points from that list:
1. Make sure the tables myTable, detail, and client have an appropriate sort key and distribution key (SORTKEY, DISTKEY).
2. Make sure all tables in the join are properly analyzed and vacuumed.
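As a rough illustration of those two points, the statements below set a distribution key and a sort key and then refresh the tables. The key choices (id as the distribution key because it is the join column, date as the sort key because it is the range filter) are assumptions drawn only from the query in the question, not something verified against the real data or workload.

```sql
-- Assumed key choices; adjust to the real workload before using.
ALTER TABLE database.myTable ALTER DISTSTYLE KEY DISTKEY id;  -- co-locate rows joined on id
ALTER TABLE database.detail  ALTER DISTSTYLE KEY DISTKEY id;
ALTER TABLE database.myTable ALTER SORTKEY (date);            -- range filter on date

-- Re-sort and reclaim space, then refresh the planner statistics.
-- VACUUM cannot run inside a transaction block.
VACUUM database.myTable;
VACUUM database.detail;
VACUUM database.client;
ANALYZE database.myTable;
ANALYZE database.detail;
ANALYZE database.client;
```

If client is small, distributing it to every node with DISTSTYLE ALL may be worth testing as well, but that is also a guess without knowing the table sizes.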
Here is another version of the same SQL, written for Redshift.
The tweaks I made:
- Used a WITH clause to optimize cluster-level computation.
- Used explicit joins; whether you need an inner or a left/right join depends on your data.
- Used a date_range WITH-clause table to keep the date computation in one place.
- Applied GROUP BY in the main SQL below.
My version of the Redshift SQL:
/** Date Range Computation **/
with date_range as (
select ( current_date - interval '2 weeks' ) as two_weeks
),
/** Filter main ResultSet **/
/** Note: myGroup here is b.val of every detail row; the name = 'blablah' filter from the original subquery is not applied **/
myGroupSet as (
SELECT b.val AS myGroup,
c.username,
a.someCode,
a.timeTaken,
convert_timezone('PST', a.date) AS date,
(case when (b.name = 'name1') THEN b.val::INTEGER ELSE 0 END ) as name11,
(case when (b.name = 'name2') THEN b.val::INTEGER ELSE 0 END ) as name12
FROM database.myTable a
/** The two-week date filter is applied through this join **/
join date_range dr on a.date > dr.two_weeks
join database.detail b on b.id = a.id
join database.client c on c.c_id = a.c_id
)
/** Apply Aggregation **/
select myGroup, username, someCode, timeTaken, date,
sum(name11) as name11, sum(name12) as name12
from myGroupSet
group by myGroup, username, someCode, timeTaken, date
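A likely explanation for the all-zero name11 and name12 in the question's Redshift query is the added filter b.name = 'blablah': it keeps only rows whose name is 'blablah', so neither CASE branch can ever match. The sketch below is not the code from this answer. It joins database.detail a second time under a hypothetical alias g to supply myGroup, and it assumes there is at most one 'blablah' row per id, which the scalar subquery in the original Postgres query also requires.

```sql
-- Sketch only: g is a hypothetical second alias for database.detail that
-- supplies the 'blablah' value, so b still carries the name1/name2 rows.
SELECT a.id,
       g.val AS myGroup,
       c.username,
       a.someCode,
       a.timeTaken,
       convert_timezone('PST', a.date) AS date,
       SUM(CASE WHEN b.name = 'name1' THEN b.val::INTEGER ELSE 0 END) AS name11,
       SUM(CASE WHEN b.name = 'name2' THEN b.val::INTEGER ELSE 0 END) AS name12
FROM database.myTable a
JOIN database.detail b ON b.id = a.id
JOIN database.client c ON c.c_id = a.c_id
-- LEFT JOIN keeps ids without a 'blablah' row (myGroup is NULL), as the subquery did.
LEFT JOIN database.detail g ON g.id = a.id AND g.name = 'blablah'
WHERE a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6;
```

Because the filter on name moves into the join condition for g, the WHERE clause no longer removes the rows that the two SUM expressions need.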
Source: https://stackoverflow.com/questions/38231015/how-can-i-write-this-postgres-query-in-amazon-redshift-such-that-it-is-as-optimi