For simplicity, let's strip the messages table down to its minimum, with some sample data:
message_id  reply_to  createdate
1           0         123
2           0         124
3           0         123
4           1         154
5           1         165
reply_to is the message_id of the message that this message is a reply to.
So I'm looking for a SQL statement, procedure, function, or other table design that lets me select the last 10 messages and, for each of those, the last 3 replies. I don't mind changing the table structure or even keeping some sort of record of the last 3 replies.
Just selecting the last 10 messages is:
SELECT * FROM message ORDER BY createdate DESC LIMIT 10;
and for each of those messages the last 3 replies are:
SELECT * FROM message WHERE reply_to = :message_id: ORDER BY createdate DESC LIMIT 3;
My attempts so far are:
- a triple outer join over the message table as replies
- a plain join, but MySQL doesn't allow limits in joins
- using HAVING COUNT(DISTINCT reply_to) <= 3, but of course HAVING is evaluated last
I couldn't get any of those working.
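For reference, the "limit inside a join" idea can be expressed with a lateral derived table on newer servers. This is only a sketch, assuming MySQL 8.0.14 or later (where LATERAL is available), the message table from above, and that top-level messages have reply_to = 0 as in the sample data. The aliases rp, reply_id and reply_createdate are made up for illustration.

```sql
-- Sketch only: needs MySQL 8.0.14+ for LATERAL.
-- Assumes top-level messages have reply_to = 0, as in the sample data.
SELECT m.message_id,
       m.createdate,
       r.message_id AS reply_id,          -- illustrative alias
       r.createdate AS reply_createdate   -- illustrative alias
FROM (
    -- the 10 newest top-level messages
    SELECT message_id, createdate
    FROM message
    WHERE reply_to = 0
    ORDER BY createdate DESC, message_id DESC
    LIMIT 10
) AS m
LEFT JOIN LATERAL (
    -- the 3 newest replies to the current message; LATERAL lets this
    -- derived table refer to m.message_id from the left side of the join
    SELECT rp.message_id, rp.createdate
    FROM message AS rp
    WHERE rp.reply_to = m.message_id
    ORDER BY rp.createdate DESC, rp.message_id DESC
    LIMIT 3
) AS r ON TRUE
ORDER BY m.createdate DESC, m.message_id DESC, r.createdate DESC;
```

Because of the LEFT JOIN, a message without replies still appears once, with NULL in the reply columns.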
My last option at the moment is to have a separate table to track the last 3 replies per message:
message_reply:
message_id, r_1, r_2, r_3
and then update that table using triggers, so that a new row in the message table which is a reply updates the message_reply row of the message it replies to:
UPDATE message_reply SET r_3 = r_2, r_2 = r_1, r_1 = NEW.message_id WHERE message_id = NEW.reply_to
Then I could just query the message table for those records.
Does anyone have a better suggestion, or even a working SQL statement?
Thanks
EDIT:
Added EXPLAIN results:
id    select_type         table         type    possible_keys                    key         key_len  ref                  rows    Extra
1     PRIMARY             <derived4>    ALL     NULL                             NULL        NULL     NULL                 3
1     PRIMARY             <derived2>    ALL     NULL                             NULL        NULL     NULL                 10      Using where; Using join buffer
1     PRIMARY             r             eq_ref  PRIMARY,message_id,message_id_2  PRIMARY     4        func                 1
4     DERIVED             NULL          NULL    NULL                             NULL        NULL     NULL                 NULL    No tables used
5     UNION               NULL          NULL    NULL                             NULL        NULL     NULL                 NULL    No tables used
6     UNION               NULL          NULL    NULL                             NULL        NULL     NULL                 NULL    No tables used
NULL  UNION RESULT        <union4,5,6>  ALL     NULL                             NULL        NULL     NULL                 NULL
2     DERIVED             m             ALL     NULL                             NULL        NULL     NULL                 299727
3     DEPENDENT SUBQUERY  r             ref     reply_to,reply_to_2              reply_to_2  4        testv4.m.message_id  29973
EDIT 2:
Well, I tried the message_reply table method as well. This is what I did.
Build the table (named message_replies in the statements below):
message_replies: message_id, r_1, r_2, r_3
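The trigger in this edit relies on INSERT ... ON DUPLICATE KEY UPDATE, which only takes the update path when message_id is a primary or unique key of the lookup table. A minimal sketch of such a table, using the message_replies name that the trigger and SELECT refer to. The column types are an assumption, not taken from the original schema.

```sql
-- Sketch only: column types are assumed, not taken from the real schema.
CREATE TABLE message_replies (
    message_id INT NOT NULL PRIMARY KEY,  -- the message being replied to
    r_1 INT NULL,                         -- newest reply
    r_2 INT NULL,                         -- second newest reply
    r_3 INT NULL                          -- third newest reply
);
```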
Build the trigger:
DELIMITER |
CREATE TRIGGER i_message AFTER INSERT ON message
FOR EACH ROW BEGIN
    IF NEW.reply_to THEN
        -- a reply: push it onto the front of the parent's last-3 list
        INSERT INTO message_replies (message_id, r_1) VALUES (NEW.reply_to, NEW.message_id)
        ON DUPLICATE KEY UPDATE r_3 = r_2, r_2 = r_1, r_1 = NEW.message_id;
    ELSE
        -- a top-level message: start an empty list for it
        INSERT INTO message_replies (message_id) VALUES (NEW.message_id);
    END IF;
END;
|
DELIMITER ;
And select the messages:
SELECT m.*, r1.*, r2.*, r3.*
FROM message_replies mr
LEFT JOIN message m ON m.message_id = mr.message_id
LEFT JOIN message r1 ON r1.message_id = mr.r_1
LEFT JOIN message r2 ON r2.message_id = mr.r_2
LEFT JOIN message r3 ON r3.message_id = mr.r_3;
Of course, with the trigger preprocessing it for me, this is the fastest way.
I tested with a few more sets of 100k inserts to see the performance hit of the trigger: it took about 0.4 sec longer to process the 100k rows than it did without the trigger. Total time to insert was about 12 sec (on MyISAM tables).
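To make the rotation concrete, here is how the trigger above should treat the sample data from the top of the question. This is a sketch, assuming the message table has exactly the three columns shown there, and that message_replies starts out empty with message_id as its primary key.

```sql
-- Sample rows from the question; the trigger fires once per inserted row.
INSERT INTO message (message_id, reply_to, createdate) VALUES
    (1, 0, 123),
    (2, 0, 124),
    (3, 0, 123),
    (4, 1, 154),   -- reply to message 1
    (5, 1, 165);   -- newer reply to message 1

SELECT message_id, r_1, r_2, r_3 FROM message_replies ORDER BY message_id;
-- Expected, since each reply shifts r_1 -> r_2 -> r_3 before taking r_1:
--   1, 5, 4, NULL
--   2, NULL, NULL, NULL
--   3, NULL, NULL, NULL
```

Replies never get a row of their own here unless something replies to them, so message_replies stays roughly as large as the number of top-level messages.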
A working example:
EDIT - (see revision for earlier query)
Full table creation and explain plan
Note: The table "datetable" just contains all dates for about 10 years. It is used just to generate rows.
drop table if exists messages;
create table messages (
message_id int primary key, reply_to int, createdate datetime, index(reply_to));
insert into messages
select @n:=@n+1, floor((100000 - @n) / 10), a.thedate
from (select @n:=0) n
cross join datetable a
cross join datetable b
limit 1000000;
The above generates 1m messages, and some valid replies. The query:
select m1.message_id, m1.reply_to, m1.createdate, N.N, r.*
from
(
    # the 10 newest messages, each with a comma-separated list of its reply ids, newest first
    select m.*, (
        select group_concat(r.message_id order by r.createdate desc)
        from messages r
        where r.reply_to = m.message_id) replies
    from messages m
    order by m.message_id desc
    limit 10
) m1
inner join ( # this union-all query controls how many replies per message
    select 1 N union all
    select 2 union all
    select 3) N
    # keep row N=1 for messages without replies; otherwise one row per reply, up to 3
    # (number of replies = number of commas + 1)
    on (m1.replies is null and N.N = 1)
    or (N.N <= 1 + length(m1.replies) - length(replace(m1.replies, ',', '')))
left join messages r
    on r.message_id = substring_index(substring_index(m1.replies, ',', N.N), ',', -1)
Time: 0.078 sec
Explain plan
id      select_type         table         type    possible_keys  key       key_len  ref                rows     Extra
1       PRIMARY             <derived4>    ALL     (NULL)         (NULL)    (NULL)   (NULL)             3
1       PRIMARY             <derived2>    ALL     (NULL)         (NULL)    (NULL)   (NULL)             10       Using where
1       PRIMARY             r             eq_ref  PRIMARY        PRIMARY   4        func               1
4       DERIVED             (NULL)        (NULL)  (NULL)         (NULL)    (NULL)   (NULL)             (NULL)   No tables used
5       UNION               (NULL)        (NULL)  (NULL)         (NULL)    (NULL)   (NULL)             (NULL)   No tables used
6       UNION               (NULL)        (NULL)  (NULL)         (NULL)    (NULL)   (NULL)             (NULL)   No tables used
(NULL)  UNION RESULT        <union4,5,6>  ALL     (NULL)         (NULL)    (NULL)   (NULL)             (NULL)
2       DERIVED             m             index   (NULL)         PRIMARY   4        (NULL)             1000301
3       DEPENDENT SUBQUERY  r             ref     reply_to       reply_to  5        test.m.message_id  5        Using where
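On MySQL 8.0 or later, the same "top 10, then top 3 per message" shape could also be written with ROW_NUMBER() instead of GROUP_CONCAT and string splitting. This is a swapped-in technique, not the query measured above, and only a sketch: it assumes window functions and common table expressions are available, uses the messages table from this answer, and treats a higher message_id as "newer". The names last_messages, ranked_replies, rn, reply_id and reply_createdate are made up for illustration.

```sql
-- Sketch only: needs MySQL 8.0+ (CTEs and window functions).
with last_messages as (
    -- the 10 newest messages
    select message_id, reply_to, createdate
    from messages
    order by message_id desc
    limit 10
),
ranked_replies as (
    -- number the replies of those 10 messages, newest first
    select r.message_id, r.reply_to, r.createdate,
           row_number() over (partition by r.reply_to
                              order by r.createdate desc, r.message_id desc) as rn
    from messages r
    inner join last_messages lm on lm.message_id = r.reply_to
)
select lm.message_id, lm.reply_to, lm.createdate,
       rr.rn, rr.message_id as reply_id, rr.createdate as reply_createdate
from last_messages lm
left join ranked_replies rr
    on rr.reply_to = lm.message_id and rr.rn <= 3
order by lm.message_id desc, rr.rn;
```

Only the replies of the 10 selected messages are numbered, so the window function does not have to rank every row in the table.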
I would suggest you build your extra table and make it work in as many steps as necessary. Sometimes you need extra steps to visualize the answer. At the end, you can combine the SQL into one nested statement.
Note: This answer provides useful information for comparison with OMG's comments, so even if it needs to be deleted, please leave it up for a while.
OMG: Check the pairing of mysql and "greatest-n-per-group" tags -- the request is very common.
OMG: Then visit the questions and courteously inform if not answer.
I followed your instructions, OMG, and this is what I came up with from
https://stackoverflow.com/questions/tagged/greatest-n-per-group+mysql
- SQL - Give me 3 hits for each type only
- mySQL Returning the top 5 of each category
- MySQL SELECT n records base on GROUP BY
You may have misunderstood the question. In the 3 that looked most similar from the first page of results (2 of which are my answers), the questions deal with a single dimension (top n per category) for the entire table. The solutions offered invariably apply row_number to ALL records in the table, ordered by category.
Compare that to the optimized answer provided for this question, whose problem domain is top-n-category -> top-m-per-category, and you will realize that this question is a different one.
There is no need to visit the questions and courteously inform if not answer, because
- The answers to those questions are valid
- The answer to this question is valid
Source: https://stackoverflow.com/questions/5095495/mysql-select-the-last-10-messages-and-for-each-message-the-last-3-replies