I'm performing an aggregate function on multiple records, which are grouped by a common ID. The problem is that I also want to export some other fields, which might differ within the grouped records; I want to take those fields from one of the records (the first one, according to the query's ORDER BY).
Starting point example:
SELECT
customer_id,
sum(order_total),
referral_code
FROM order
GROUP BY customer_id
ORDER BY date_created
I need to query the referral code, but doing it outside of an aggregate function means I have to group by that field as well, and that's not what I want - I need exactly one row per customer in this example. I really only care about the referral code from the first order, and I'm happy to throw out any later referral codes.
This is in PostgreSQL, but maybe syntax from other DBs could be similar enough to work.
Rejected solutions:
- Can't use max() or min() because order is significant.
- A subquery might work at first, but does not scale; this is an extremely reduced example. My actual query has dozens of fields like referral_code which I only want the first instance of, and dozens of WHERE clauses which, if duplicated in a subquery, would make for a maintenance nightmare.
Well, it's actually pretty simple.
First, let's write a query that will do the aggregation:
select customer_id, sum(order_total)
from "order"
group by customer_id
Now, let's write a query that returns the first referral_code and date_created for each customer_id (order is a reserved word in PostgreSQL, so the table name has to be quoted):
select distinct on (customer_id) customer_id, date_created, referral_code
from "order"
order by customer_id, date_created
Now, you can simply join the 2 selects:
select
x1.customer_id,
x1.sum,
x2.date_created,
x2.referral_code
from
(
select customer_id, sum(order_total)
from "order"
group by customer_id
) as x1
join
(
select distinct on (customer_id) customer_id, date_created, referral_code
from "order"
order by customer_id, date_created
) as x2 using ( customer_id )
order by x2.date_created
I didn't test it, so there could be typos in it, but generally it should work.
You will need window functions. They work somewhat like GROUP BY, but you can still access the individual rows. I have only used the Oracle equivalent, though.
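As a rough sketch of the window-function idea: FIRST_VALUE picks the referral code of the earliest order in each customer's partition, and SUM over the same partition gives the total. To keep it self-contained it runs against an in-memory SQLite database through Python's sqlite3 module (this assumes an SQLite build with window-function support); the table name orders and the sample rows are assumptions made for illustration. The SQL itself uses only standard window syntax, so it should carry over to PostgreSQL versions that support window functions.

```python
import sqlite3

# Hypothetical sample data. Column names follow the question; the table is
# called "orders" here to avoid the reserved word "order".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INT, order_total INT,"
    " referral_code TEXT, date_created TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "a", "2009-01-01"),
        (2, 15, "b", "2009-01-02"),
        (1, 35, "c", "2009-01-03"),
    ],
)

# Each row carries its partition's total and first referral code;
# DISTINCT then collapses the identical rows to one per customer.
QUERY = """
SELECT DISTINCT
    customer_id,
    SUM(order_total) OVER (PARTITION BY customer_id) AS order_sum,
    FIRST_VALUE(referral_code) OVER (
        PARTITION BY customer_id ORDER BY date_created
    ) AS first_referral_code
FROM orders
ORDER BY customer_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Further "first record" columns can be added as extra FIRST_VALUE expressions over the same partition, and any WHERE clauses only need to appear once.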
If the date_created is guaranteed to be unique per customer_id, then you can do this:
[simple table]
create table ordertable (customer_id int, order_total int, referral_code char, date_created timestamp);
insert into ordertable values (1, 10, 'a', '2009-01-01');
insert into ordertable values (2, 15, 'b', '2009-01-02');
insert into ordertable values (1, 35, 'c', '2009-01-03');
[replace my lame table names with something better :)]
SELECT
orderAgg.customer_id,
orderAgg.order_sum,
referral.referral_code as first_referral_code
FROM (
SELECT
customer_id,
sum(order_total) as order_sum
FROM ordertable
GROUP BY customer_id
) as orderAgg join (
SELECT
customer_id,
min(date_created) as first_date
FROM ordertable
GROUP BY customer_id
) as dateAgg on orderAgg.customer_id = dateAgg.customer_id
join ordertable as referral
on dateAgg.customer_id = referral.customer_id
and dateAgg.first_date = referral.date_created
Perhaps something like:
SELECT
O1.customer_id,
O1.referral_code,
SQ.total
FROM
Orders O1
LEFT OUTER JOIN Orders O2 ON
O2.customer_id = O1.customer_id AND
O2.date_created < O1.date_created
INNER JOIN (
SELECT
customer_id,
SUM(order_total) AS total
FROM
Orders
GROUP BY
customer_id
) SQ ON SQ.customer_id = O1.customer_id
WHERE
O2.customer_id IS NULL
Would something like this do the trick?
SELECT
customer_id,
sum(order_total),
(SELECT referral_code
FROM "order" o
WHERE o.customer_id = "order".customer_id
ORDER BY date_created
LIMIT 1) AS customers_referral_code
FROM "order"
GROUP BY customer_id, customers_referral_code
ORDER BY min(date_created)
This doesn't require you to maintain the WHERE clause in two places and maintains the order significance, but would get pretty hairy if you needed "dozens of fields" like referral_code. It's also fairly slow (at least on MySQL).
It sounds to me like referral_code and the dozens of fields like it should be in the customer table, not the order table, since they're logically associated 1:1 with the customer, not the order. Moving them there would make the query MUCH simpler.
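A minimal sketch of what that could look like, assuming a hypothetical customers table that holds referral_code once per customer (the table names, columns, and sample rows are made up for illustration). It runs against in-memory SQLite via Python's sqlite3 only so that it is self-contained; the query is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized schema: per-customer fields live in customers,
# per-order fields stay in orders.
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, referral_code TEXT);
CREATE TABLE orders (customer_id INT, order_total INT, date_created TEXT);
INSERT INTO customers VALUES (1, 'a'), (2, 'b');
INSERT INTO orders VALUES
    (1, 10, '2009-01-01'),
    (2, 15, '2009-01-02'),
    (1, 35, '2009-01-03');
""")

# referral_code is unique per customer here, so grouping by it
# does not split a customer into several rows.
QUERY = """
SELECT c.customer_id, SUM(o.order_total) AS order_sum, c.referral_code
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.referral_code
ORDER BY c.customer_id
"""
normalized_rows = conn.execute(QUERY).fetchall()
print(normalized_rows)
```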
This might also do the trick. The subquery refers to the outer row, so the join has to be LATERAL (PostgreSQL 9.3 or later):
SELECT
o.customer_id,
sum(o.order_total),
c.referral_code, c.x, c.y, c.z
FROM "order" o LEFT JOIN LATERAL (
SELECT referral_code, x, y, z
FROM "order" c
WHERE c.customer_id = o.customer_id
ORDER BY c.date_created
LIMIT 1
) AS c ON true
GROUP BY o.customer_id, c.referral_code, c.x, c.y, c.z
ORDER BY min(o.date_created)
SELECT customer_id, order_sum,
(first_record).referral_code, (first_record).other_column
FROM (
SELECT customer_id,
SUM(order_total) AS order_sum,
(
SELECT oi
FROM "order" oi
WHERE oi.customer_id = o.customer_id
ORDER BY oi.date_created
LIMIT 1
) AS first_record
FROM "order" o
GROUP BY
customer_id
) q
Source: https://stackoverflow.com/questions/821811/group-related-records-but-pick-certain-fields-from-only-the-first-record