Using DISTINCT inside JOIN is creating trouble [duplicate]

99封情书 提交于 2019-12-06 17:38:28

One approach is to use an inline view, like the query you already have. But instead of using DISTINCT, you would use a GROUP BY to eliminate duplicates. The simplest inline view to satisfy your requirements would be:

( SELECT n.item_number, n.name, n.type_code
    FROM itpitnam n
   GROUP BY n.item_number
) itpitnam

Although its not deterministic as to which row from itpitnam the values for name and type_code are retrieved from. A more elaborate inline view can make this more specific.


Another common approach to this type of problem is to use a correlated subquery in the SELECT list. For returning a small set of rows, this can perform reasonably well. But for returning large sets, there are more efficient approaches.

SELECT i.identifier
     , i.name
     , i.subtitle
     , i.description
     , i.itemimg 
     , i.mainprice
     , i.upc
     , i.isbn
     , i.weight
     , i.pages
     , i.publisher
     , i.medium_abbr
     , i.medium_desc
     , i.series_abbr
     , i.series_desc
     , i.voicing_desc
     , i.pianolevel_desc
     , i.bandgrade_desc
     , i.category_code
     , r.overall_ranking
     , ( SELECT n1.name
           FROM itpitnam n1
          WHERE n1.item_number = r.item_number
          ORDER BY n1.type_code, n1.name
          LIMIT 1
       ) AS artist
     , ( SELECT n2.type_code
           FROM itpitnam n2
          WHERE n2.item_number = r.item_number
          ORDER BY n2.type_code, n2.name
          LIMIT 1
       ) AS type_code
  FROM itpitems i
  JOIN itprank r
    ON r.item_number = i.identifier
 WHERE mainprice > 1
 LIMIT 3

That query will return the specified resultset, with one significant difference. The original query shows an INNER JOIN to the itpitnam table. That means that a row will be returned ONLY of there is a matching row in the itpitnam table. The query above, however, emulates an OUTER JOIN, the query will return a row when there is no matching row found in itpitnam.


UPDATE

For best performance of those correlated subqueries, you'll want an appropriate index available,

... ON itpitnam (item_number, type_code, name)

That index is most appropriate because it's a "covering index", the query can be satisfied entirely from the index without referencing data pages in the underlying table, and there's equality predicate on the leading column, and an ORDER BY on the next two columns, so that will a avoid a "sort" operation.

--

If you have a guarantee that either the type_code or name column in the itpitnam table is NOT NULL, you can add a predicate to eliminate the rows that are "missing" a matching row, e.g.

HAVING artist IS NOT NULL

(Adding that will likely have an impact on performance.) Absent that kind of guarantee, you'd need to add an INNER JOIN or a predicate that tests for the existence of a matching row, to get an INNER JOIN behavior.


SELECT  a.*
        b.overall_ranking, 
        c.name AS artist, 
        c.type_code 
FROM    itpitems a
        INNER JOIN  itprank b 
            ON b.item_number = a.identifier
        INNER JOIN  itpitnam c
            ON b.item_number = c.item_number
        INNER JOIN
        (
            SELECT  item_number, MAX(type_code) code
            FROM    itpitnam
            GROUP   BY item_number
        ) d ON  c.item_number = d.item_number AND
                c.type_code = d.code

WHERE   mainprice > 1    
LIMIT   3

Follow-up question: can you please post the table schema and how are the tables related with each other? So I will know what are the columns to be linked.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!