MySQL: Can I do a left join and pull only one row from the join table?

问题

I wrote a custom help desk for work and it's been running great... until recently. One query has really slowed down. It takes about 14 seconds now! Here are the relevant tables:

CREATE TABLE `tickets` (
  `id` int(11) unsigned NOT NULL DEFAULT '0',
  `date_submitted` datetime DEFAULT NULL,
  `date_closed` datetime DEFAULT NULL,
  `first_name` varchar(50) DEFAULT NULL,
  `last_name` varchar(50) DEFAULT NULL,
  `email` varchar(50) DEFAULT NULL,
  `description` text,
  `agent_id` smallint(5) unsigned NOT NULL DEFAULT '1',
  `status` smallint(5) unsigned NOT NULL DEFAULT '1',
  `priority` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `date_closed` (`date_closed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `solutions` (
  `id` int(10) unsigned NOT NULL,
  `ticket_id` mediumint(8) unsigned DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `hours_spent` float DEFAULT NULL,
  `agent_id` smallint(5) unsigned DEFAULT NULL,
  `body` text,
  PRIMARY KEY (`id`),
  KEY `ticket_id` (`ticket_id`),
  KEY `date` (`date`),
  KEY `hours_spent` (`hours_spent`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

When a user submits a ticket, it goes into the "tickets" table. Then, as the agents work through the problem, they record the actions they took. Each entry goes into the "solutions" table. In other words, tickets have many solutions.

The goal of the query that has slowed down is to pull all the fields from the "tickets" table and also the latest entry from the "solutions" table. This is the query I've been using:

SELECT tickets.*,
    (SELECT CONCAT_WS(" * ", DATE_FORMAT(solutions.date, "%c/%e/%y"), solutions.hours_spent, CONCAT_WS(": ", solutions.agent_id, solutions.body))
    FROM solutions
    WHERE solutions.ticket_id = tickets.id
    ORDER BY solutions.date DESC, solutions.id DESC
    LIMIT 1
) AS latest_solution_entry
FROM tickets
WHERE tickets.date_closed IS NULL
OR tickets.date_closed >= '2012-06-20 00:00:00'
ORDER BY tickets.id DESC

Here is an example of what the "latest_solution_entry" field looks like:

6/20/12 * 1337 * 1: I restarted the computer and that fixed the problem. Yes, I took an hour to do this.

In PHP, I split up the "latest_solution_entry" field and format it correctly.

When I noticed that the page that runs the query had slowed way down, I ran the query without the subquery and it was super fast. I then ran an EXPLAIN on the original query and got this:

+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+
| id | select_type        | table     | type  | possible_keys | key       | key_len | ref                 | rows  | Extra                       |
+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+
|  1 | PRIMARY            | tickets   | index | date_closed   | PRIMARY   | 4       | NULL                | 35804 | Using where                 |
|  2 | DEPENDENT SUBQUERY | solutions | ref   | ticket_id     | ticket_id | 4       | helpdesk.tickets.id |     1 | Using where; Using filesort |
+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+

So I'm looking for a way to make my query more efficient, yet still achieve the same goal. Any ideas?

回答1:

Let me sum up what I understood: you'd like to select each ticket and its last solution.

I like using the following pattern for this kind of question as it avoids the subquery pattern and is therefore rather good where performance is needed. The drawback is that it is a bit tricky to understand:

SELECT
  t.*,
  s1.*
FROM tickets t
INNER JOIN solutions s1 ON t.id = s1.ticket_id
LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id AND s2.id > s1.id
WHERE s2.id IS NULL;

I wrote only the heart of the pattern for a better understanding.

The keys are:

the LEFT JOIN of the solutions table with itself with the s1.ticket_id = s2.ticket_id condition: it emulates the GROUP BY ticket_id.
the condition s2.id > s1.id : it is the SQL for "I only want the last solution", it emulates the MAX(). I assumed that in your model, the last means with the greatest id but you could use here a condition on the date. Note that s2.id < s1.id would give you the first solution.
the WHERE clause s2.id IS NULL: the weirdest one but absolutely necessary... keeps only the records you want.

Have a try and let me know :)

Edit 1: I just realised that the second point assumption was oversimplifying the problem. That makes it even more interesting :p I'm trying to see how this pattern may work with your date, id ordering.

Edit 2: Ok, it works great with a little twist. The condition on the LEFT JOIN becomes:

LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id
  AND (s2.date > s1.date OR (s2.date = s1.date AND s2.id > s1.id))

回答2:

When you have a inline view in the SELECT clause it must perform that select for every single row. I find its better in cases like this to put an inline view in the FROM clause instead which will perform the select once.

SELECT t.*, 
       Concat_ws(" * ", Date_format(s.date, "%c/%e/%y"), s.hours_spent, 
       Concat_ws(":", s.agent_id, s.body)) 
FROM   tickets t 
       INNER JOIN (SELECT solutions.ticket_id,
                          Max(solutions.date) maxdate 
                   FROM   solutions 
                   GROUP  BY solutions.ticket_id) last_solutions 
               ON t.id = last_solutions.ticket_id
       INNER JOIN (SELECT solutions.ticket_id,
                          solutions.date,
                          Max(solutions.id) maxid 
                   FROM   solutions 
                   GROUP  BY solutions.ticket_id,
                            solutions.date) last_solution
              ON last_solutions.ticket_id = last_solution.ticket_id 
                 and last_solutions.maxDate = last_solution.Date
       INNER JOIN solutions s 
               ON last_solution.maxid = s.id
WHERE  t.date_closed IS NULL 
        OR t.date_closed >= '2012-06-20 00:00:00' 
ORDER  BY t.id DESC

Note: You may need to make it a LEFT join depending on your needs

回答3:

Try this:

SELECT *
FROM (
  -- for each ticket get the most recent solution date
  SELECT ticket_id, MAX(solutions.date) as date
  FROM solutions
  GROUP BY ticket_id
) t
JOIN tickets ON t.ticket_id = tickets.id
WHERE tickets.date_closed IS NULL OR tickets.date_closed >= '2012-06-20 00:00:00'
ORDER BY tickets.id DESC

Note, that if there is a ticket with 2 solutions of the same date you will have duplicate records in your result set. You will need another join to remove those duplicates or use an absolute sequence like a serial (incrementing primary key).

回答4:

depending on the purpose, I give a idea :

SELECT DISTINCT s1.ticket_id, t.*,  s1.*
FROM tickets t
LEFT JOIN solutions s1 ON t.id = s1.ticket_id

来源：https://stackoverflow.com/questions/11131237/mysql-can-i-do-a-left-join-and-pull-only-one-row-from-the-join-table

标签

mysql

query-optimization