Question
Basically, I have a table messages with a user_id field that identifies the user who created the message.
When I display a conversation (a set of messages) between two users, I want to be able to group the messages by user_id, but in a tricky way:
Let's say there are some messages (sorted by created_at desc):
id: 1, user_id: 1
id: 2, user_id: 1
id: 3, user_id: 2
id: 4, user_id: 2
id: 5, user_id: 1
I want to get 3 message groups in the below order:
[1,2], [3,4], [5]
It should group by *user_id* until it sees a different one and then group by that one.
I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.
Answer 1:
Proper SQL
@Igor presents a nice pure-SQL technique with window functions.
However:
I want to get 3 message groups in the below order: [1,2], [3,4], [5]
To get the requested order, add ORDER BY min(id):
SELECT array_agg(id) AS ids
FROM  (
   SELECT id
         ,user_id
         ,row_number() OVER (ORDER BY id) -
          row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
   FROM   messages
   ORDER  BY id) t   -- for ordered arrays in result
GROUP  BY grp, user_id
ORDER  BY min(id);
SQL Fiddle.
The addition would barely warrant another answer. The more important issue is this:
Faster with PL/pgSQL
I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.
Pure SQL is all nice and shiny, but a procedural server-side function is much faster for this task. While processing rows procedurally is generally slower, PL/pgSQL wins this competition big-time, because it can make do with a single table scan and a single ORDER BY operation:
CREATE OR REPLACE FUNCTION f_msg_groups()
  RETURNS TABLE (ids int[]) AS
$func$
DECLARE
   _id    int;
   _uid   int;
   _id0   int;   -- id of last row
   _uid0  int;   -- user_id of last row
BEGIN
   FOR _id, _uid IN
      SELECT id, user_id FROM messages ORDER BY id
   LOOP
      IF _uid <> _uid0 THEN
         RETURN QUERY VALUES (ids);   -- output row (never happens after 1 row)
         ids := ARRAY[_id];           -- start new array
      ELSE
         ids := ids || _id;           -- add to array
      END IF;
      _id0  := _id;
      _uid0 := _uid;                  -- remember last row
   END LOOP;
   RETURN QUERY VALUES (ids);         -- output last iteration
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_msg_groups();
Benchmark and links
I ran a quick test with EXPLAIN ANALYZE on a similar real-life table with 60k rows (executed several times, picking the fastest result to exclude caching effects):
SQL:
Total runtime: 1009.549 ms
PL/pgSQL:
Total runtime: 336.971 ms
Also consider these closely related questions:
- GROUP BY and aggregate sequential numeric values
- GROUP BY consecutive dates delimited by gaps
- Ordered count of consecutive repeats / duplicates
Answer 2:
Try something like this:
SELECT user_id, array_agg(id)
FROM (
SELECT id,
user_id,
row_number() OVER (ORDER BY created_at)-
row_number() OVER (PARTITION BY user_id ORDER BY created_at) conv_id
FROM table1 ) t
GROUP BY user_id, conv_id;
The expression:
row_number() OVER (ORDER BY created_at)-
row_number() OVER (PARTITION BY user_id ORDER BY created_at) conv_id
will give you a special id for every message group (the same conv_id can be repeated for another user_id, but the pair user_id, conv_id will give you all distinct message groups).
My SQLFiddle with example.
Details: row_number(), OVER (PARTITION BY ... ORDER BY ...)
Answer 3:
A GROUP BY user_id clause will collapse the result into 2 records, one with user_id 1 and one with user_id 2, regardless of the ORDER BY clause, so I recommend sending just the ORDER BY created_at query and detecting the group boundaries in Ruby:
prev_id = -1
messages.each do |m|
  if m.user_id != prev_id
    prev_id = m.user_id
    # do whatever you want with a new message group
  end
end
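If the groups themselves are needed rather than just the boundaries, the same loop can collect ids as it goes. This is a minimal sketch, assuming the messages arrive already sorted; the Msg struct and the sample data are hypothetical stand-ins for the real model.

```ruby
# Hypothetical stand-in for the model; real records would come from the ORDER BY query.
Msg = Struct.new(:id, :user_id)
messages = [Msg.new(1, 1), Msg.new(2, 1), Msg.new(3, 2), Msg.new(4, 2), Msg.new(5, 1)]

groups = []
prev_user_id = nil

messages.each do |m|
  # Start a new group on the first message and whenever the author changes.
  if groups.empty? || m.user_id != prev_user_id
    groups << []
    prev_user_id = m.user_id
  end
  groups.last << m.id
end

p groups # prints [[1, 2], [3, 4], [5]]
```

Checking groups.empty? avoids relying on a sentinel such as -1 never being a real user_id.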
Answer 4:
You can use chunk:
Message = Struct.new :id, :user_id
messages = []
messages << Message.new(1, 1)
messages << Message.new(2, 1)
messages << Message.new(3, 2)
messages << Message.new(4, 2)
messages << Message.new(5, 1)
messages.chunk(&:user_id).each do |user_id, records|
p "#{user_id} - #{records.inspect}"
end
The output:
"1 - [#<struct Message id=1, user_id=1>, #<struct Message id=2, user_id=1>]"
"2 - [#<struct Message id=3, user_id=2>, #<struct Message id=4, user_id=2>]"
"1 - [#<struct Message id=5, user_id=1>]"
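To get exactly the id arrays asked for in the question, the chunks can be mapped to their ids. A small sketch, reusing the same Message struct and sample data as above (redefined here so the snippet runs on its own):

```ruby
Message = Struct.new(:id, :user_id)
messages = [
  Message.new(1, 1),
  Message.new(2, 1),
  Message.new(3, 2),
  Message.new(4, 2),
  Message.new(5, 1)
]

# chunk yields [key, consecutive_records]; keep only the ids of each run.
id_groups = messages.chunk(&:user_id).map { |_user_id, records| records.map(&:id) }

p id_groups # prints [[1, 2], [3, 4], [5]]
```

One caveat: chunk drops elements whose block value is nil, so messages with a nil user_id would silently disappear from the result.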
Source: https://stackoverflow.com/questions/14010348/group-by-repeating-attribute