Question
Possible Duplicate:
How can I modify this query with two Inner Joins so that it stops giving duplicate results?
I'm having trouble getting my query to work.
SELECT itpitems.identifier, itpitems.name, itpitems.subtitle, itpitems.description, itpitems.itemimg, itpitems.mainprice, itpitems.upc, itpitems.isbn, itpitems.weight, itpitems.pages, itpitems.publisher, itpitems.medium_abbr, itpitems.medium_desc, itpitems.series_abbr, itpitems.series_desc, itpitems.voicing_desc, itpitems.pianolevel_desc, itpitems.bandgrade_desc, itpitems.category_code, itprank.overall_ranking, itpitnam.name AS artist, itpitnam.type_code FROM itpitems
INNER JOIN itprank ON (itprank.item_number = itpitems.identifier)
INNER JOIN (SELECT DISTINCT type_code FROM itpitnam) itpitnam ON (itprank.item_number = itpitnam.item_number)
WHERE mainprice > 1
LIMIT 3
I keep getting Unknown column 'itpitnam.name' in 'field list'.
However, if I change DISTINCT type_code to *, I do not get that error, but I do not get the results I want either.
This is a big result table so I am making a dummy example...
With *, I get something like:
+------------+------+-----------+
| identifier | name | type_code |
+------------+------+-----------+
| 2          | Joe  | A         |
| 2          | Amy  | R         |
| 7          | Mike | B         |
+------------+------+-----------+
The problem here is that I have two instances of identifier = 2 because the type_code is different. I have tried GROUP BY at the outside end of the query, but it is sifting through so many records it creates too much strain on the server, so I'm trying to find an alternative way of getting the results I need.
What I want to achieve (using the same dummy output) would look something like this:
+------------+------+-----------+
| identifier | name | type_code |
+------------+------+-----------+
| 2          | Joe  | A         |
| 7          | Mike | B         |
| 8          | Sam  | R         |
+------------+------+-----------+
It should skip over the duplicate identifier regardless of whether type_code is different.
Can someone help me modify this query to get the results as simulated in the above chart?
Answer 1:
One approach is to use an inline view, like the query you already have. But instead of using DISTINCT, you would use a GROUP BY to eliminate duplicates. The simplest inline view to satisfy your requirements would be:
( SELECT n.item_number, n.name, n.type_code
FROM itpitnam n
GROUP BY n.item_number
) itpitnam
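Note, however, that it is not deterministic which row from itpitnam supplies the values for name and type_code. This form also relies on MySQL's extension to GROUP BY; with the ONLY_FULL_GROUP_BY SQL mode enabled, MySQL rejects it. A more elaborate inline view can make this more specific.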
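As a rough sketch of what a "more elaborate" inline view might look like, the query below picks, for each item_number, the row with the lowest type_code (and the lowest name within that type_code). The table definition and sample rows are hypothetical, taken from the dummy output in the question; the column types are assumptions. Run it in an empty scratch database.

```sql
-- Hypothetical scratch table: only the three columns used in this thread,
-- with assumed column types.
CREATE TABLE itpitnam (item_number INT, name VARCHAR(50), type_code CHAR(1));
INSERT INTO itpitnam VALUES (2, 'Joe', 'A'), (2, 'Amy', 'R'), (7, 'Mike', 'B');

-- m finds the lowest type_code per item; joining back to itpitnam keeps only
-- the rows with that type_code, and MIN(n.name) breaks any remaining ties.
SELECT n.item_number
     , MIN(n.name) AS name
     , n.type_code
  FROM itpitnam n
  JOIN ( SELECT item_number
              , MIN(type_code) AS type_code
           FROM itpitnam
          GROUP BY item_number
       ) m
    ON m.item_number = n.item_number
   AND m.type_code   = n.type_code
 GROUP BY n.item_number, n.type_code
 ORDER BY n.item_number;
-- With the sample rows above: (2, 'Joe', 'A') and (7, 'Mike', 'B').
```

Wrapped in parentheses and given the alias itpitnam, the SELECT could stand in for the simpler inline view. Every selected column is either grouped or aggregated, so it should not depend on the ONLY_FULL_GROUP_BY setting.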
Another common approach to this type of problem is to use a correlated subquery in the SELECT list. For returning a small set of rows, this can perform reasonably well. But for returning large sets, there are more efficient approaches.
SELECT i.identifier
, i.name
, i.subtitle
, i.description
, i.itemimg
, i.mainprice
, i.upc
, i.isbn
, i.weight
, i.pages
, i.publisher
, i.medium_abbr
, i.medium_desc
, i.series_abbr
, i.series_desc
, i.voicing_desc
, i.pianolevel_desc
, i.bandgrade_desc
, i.category_code
, r.overall_ranking
, ( SELECT n1.name
      FROM itpitnam n1
     WHERE n1.item_number = r.item_number
     ORDER BY n1.type_code, n1.name
     LIMIT 1
  ) AS artist
, ( SELECT n2.type_code
      FROM itpitnam n2
     WHERE n2.item_number = r.item_number
     ORDER BY n2.type_code, n2.name
     LIMIT 1
  ) AS type_code
FROM itpitems i
JOIN itprank r
ON r.item_number = i.identifier
WHERE mainprice > 1
LIMIT 3
That query will return the specified result set, with one significant difference. The original query uses an INNER JOIN to the itpitnam table, which means that a row is returned only if there is a matching row in itpitnam. The query above, however, emulates an OUTER JOIN: it returns a row even when no matching row is found in itpitnam, in which case artist and type_code are NULL.
UPDATE
For best performance of those correlated subqueries, you'll want an appropriate index available:
... ON itpitnam (item_number, type_code, name)
That index is the most appropriate because it is a "covering index": the query can be satisfied entirely from the index without referencing data pages in the underlying table. There is also an equality predicate on the leading column and an ORDER BY on the next two columns, which avoids a "sort" operation.
--
If you have a guarantee that either the type_code or name column in the itpitnam table is NOT NULL, you can add a predicate to eliminate the rows that are "missing" a matching row, e.g.
HAVING artist IS NOT NULL
(Adding that will likely have an impact on performance.) Absent that kind of guarantee, you'd need to add an INNER JOIN or a predicate that tests for the existence of a matching row, to get an INNER JOIN behavior.
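The existence test mentioned above could look something like the following sketch. The tables are cut down to the few columns needed, and the column types and sample rows are assumptions made for illustration (item 8 deliberately has no row in itpitnam). Run it in an empty scratch database.

```sql
-- Hypothetical scratch tables with assumed column types.
CREATE TABLE itpitems (identifier INT, name VARCHAR(50), mainprice DECIMAL(8,2));
CREATE TABLE itprank  (item_number INT, overall_ranking INT);
CREATE TABLE itpitnam (item_number INT, name VARCHAR(50), type_code CHAR(1));

INSERT INTO itpitems VALUES (2, 'Item two', 5.00), (7, 'Item seven', 9.00), (8, 'Item eight', 3.00);
INSERT INTO itprank  VALUES (2, 10), (7, 20), (8, 30);
INSERT INTO itpitnam VALUES (2, 'Joe', 'A'), (2, 'Amy', 'R'), (7, 'Mike', 'B');

SELECT i.identifier
     , r.overall_ranking
     , ( SELECT n1.name
           FROM itpitnam n1
          WHERE n1.item_number = r.item_number
          ORDER BY n1.type_code, n1.name
          LIMIT 1
       ) AS artist
  FROM itpitems i
  JOIN itprank r
    ON r.item_number = i.identifier
 WHERE i.mainprice > 1
   -- Keep only items that have at least one row in itpitnam,
   -- which restores the INNER JOIN behavior of the original query.
   AND EXISTS ( SELECT 1
                  FROM itpitnam n0
                 WHERE n0.item_number = r.item_number
              )
 ORDER BY i.identifier;
-- With the sample rows above: (2, 10, 'Joe') and (7, 20, 'Mike'); item 8 is filtered out.
```

Unlike HAVING artist IS NOT NULL, the EXISTS test does not depend on name being declared NOT NULL, and it can use the same index on itpitnam (item_number, type_code, name).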
Answer 2:
SELECT a.*,
       b.overall_ranking,
       c.name AS artist,
       c.type_code
FROM itpitems a
INNER JOIN itprank b
    ON b.item_number = a.identifier
INNER JOIN itpitnam c
    ON b.item_number = c.item_number
INNER JOIN
    (
        SELECT item_number, MAX(type_code) code
        FROM itpitnam
        GROUP BY item_number
    ) d ON c.item_number = d.item_number AND
           c.type_code = d.code
WHERE mainprice > 1
LIMIT 3
Follow-up question: can you please post the table schema and explain how the tables are related to each other, so that I know which columns should be linked?
Source: https://stackoverflow.com/questions/14658674/using-distinct-inside-join-is-creating-trouble