ORACLE SQL select distinct not removing duplicates

Question

I have the following tables; format: table_name[column1, column2, etc..]

VENDOR_ORDERS [ORDER_ID, ORDER_CREATION_DATETIME, REGION_ID, ZIP_CODE, AMOUNT]
CALENDAR [CALENDAR_WEEK, CALENDAR_DATE]

Basically, what I'm trying to achieve is a query that gives me:

the COUNT(ORDER_ID) and SUM(AMOUNT) per CALENDAR_WEEK for every REGION_ID and DISTINCT(ZIP_CODE)

So the results should look something like this:

ZIP_CODE    CALENDAR_WEEK    REGION_ID    COUNT(ORDER_ID)    SUM(AMOUNT)
--------    -------------    ---------    ---------------    -----------
XXXXX           01              1             50               987.45
YYYYY           01              1             25               568.32
ZZZZZ           01              1             30               555.63
MMMMM           01              1             10               099.93
XXXXX           15              1             05               999.34
YYYYY           15              1             32               339.67
ZZZZZ           15              1             21               457.23
MMMMM           15              1             88               459.99

I used the following code:

SELECT
    DISTINCT(vo.ZIP_CODE)
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,TRUNC(vo.ORDER_CREATION_DATETIME) -- this column is not needed, i just added it for visualization purposes
    ,vo.REGION_ID
    ,COUNT(vo.ORDER_ID)
    ,SUM(vo.AMOUNT)
FROM
    VENDOR_ORDERS vo
    ,CALENDAR ca
WHERE   
    TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
    AND vo.REGION_ID = 1
GROUP BY
    vo.ZIP_CODE
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,vo.ORDER_CREATION_DATETIME
    ,vo.REGION_ID;

The problem is that I'm not getting one row per ZIP_CODE per CALENDAR_WEEK: the same ZIP_CODE is repeated for the same CALENDAR_WEEK and the same REGION_ID, but with different COUNT(ORDER_ID) and SUM(AMOUNT) values.

I hope I made myself clear. Thanks in advance for the help.

Answer 1:

You misunderstand what distinct is. It is not a function. It is a modifier on select and it affects all columns being selected. So, it is behaving exactly as it should.
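A minimal sketch of that point, using made-up rows selected from DUAL (the data is hypothetical; only the column names come from the question). The parentheses around zip_code do not limit DISTINCT to that column:

```sql
-- Hypothetical sample rows standing in for VENDOR_ORDERS.
WITH vendor_orders AS (
    SELECT 1 AS order_id, 'XXXXX' AS zip_code, 10 AS amount FROM dual UNION ALL
    SELECT 2, 'XXXXX', 20 FROM dual UNION ALL
    SELECT 3, 'XXXXX', 20 FROM dual
)
-- (zip_code) is just a parenthesized expression: DISTINCT still applies
-- to the whole select list, i.e. to the pair (zip_code, amount).
SELECT DISTINCT (zip_code), amount
FROM vendor_orders;
-- Two rows come back for XXXXX (amount 10 and amount 20), because the
-- two combinations differ even though the zip code is the same.
```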

If you want aggregations by zip code and week, then those are the only two columns that should be in the group by:

SELECT vo.ZIP_CODE, TO_CHAR(ca.CALENDAR_WEEK),
       -- vo.REGION_ID
        COUNT(vo.ORDER_ID),
        SUM(vo.AMOUNT)
FROM VENDOR_ORDERS vo JOIN
     CALENDAR ca
     ON TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
WHERE vo.REGION_ID = 1
GROUP BY vo.ZIP_CODE, TO_CHAR(ca.CALENDAR_WEEK)

You could probably include region_id as well, in both the SELECT and the GROUP BY, assuming that each zip code is in one region.
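As a sketch of what including region_id might look like, here is a self-contained version over hypothetical sample data (the rows, the week numbers, and the aliases week_label, order_count and total_amount are assumptions for illustration; the table and column names are from the question). It drops the REGION_ID = 1 filter so that every region is reported:

```sql
-- Hypothetical sample data standing in for the real tables.
WITH calendar AS (
    SELECT 1 AS calendar_week, DATE '2016-01-04' AS calendar_date FROM dual UNION ALL
    SELECT 1, DATE '2016-01-05' FROM dual
),
vendor_orders AS (
    SELECT 1 AS order_id, DATE '2016-01-04' + 9/24 AS order_creation_datetime,
           1 AS region_id, 'XXXXX' AS zip_code, 10 AS amount FROM dual UNION ALL
    SELECT 2, DATE '2016-01-05' + 14/24, 1, 'XXXXX', 20 FROM dual UNION ALL
    SELECT 3, DATE '2016-01-05' + 16/24, 2, 'YYYYY', 5 FROM dual
)
SELECT vo.region_id,
       vo.zip_code,
       TO_CHAR(ca.calendar_week) AS week_label,
       COUNT(vo.order_id)        AS order_count,
       SUM(vo.amount)            AS total_amount
FROM vendor_orders vo
JOIN calendar ca
  ON TRUNC(vo.order_creation_datetime) = ca.calendar_date
-- The order date is used only to find the week; it is not a grouping key.
GROUP BY vo.region_id, vo.zip_code, TO_CHAR(ca.calendar_week)
ORDER BY vo.region_id, vo.zip_code, week_label;
```

With this sample data, orders 1 and 2 fall on different days of the same week and collapse into a single row for region 1, zip XXXXX (count 2, sum 30). This relies on CALENDAR holding exactly one row per date; otherwise the join would multiply orders.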

Answer 2:

Your DISTINCT serves no purpose in this query: it is applied to all columns, not to ZIP_CODE only as you think. Think about it: if several rows share the same ZIP_CODE but have different values in the other columns, how would Oracle know which one to return?

Additionally, specifying DISTINCT is redundant here because you are doing a GROUP BY, which already returns one row per combination of the grouped columns.
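A small illustration of that equivalence, again over hypothetical rows from DUAL:

```sql
-- Hypothetical sample rows; only the column name comes from the question.
WITH vendor_orders AS (
    SELECT 'XXXXX' AS zip_code FROM dual UNION ALL
    SELECT 'XXXXX' FROM dual UNION ALL
    SELECT 'YYYYY' FROM dual
)
-- GROUP BY already collapses the input to one row per zip_code, so this
-- returns the same two rows as SELECT DISTINCT zip_code FROM vendor_orders.
SELECT zip_code
FROM vendor_orders
GROUP BY zip_code;
```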

And last but not least, you're wrong when you say this in your comments:

-- this column is not needed, i just added it for visualization

It is not just there for visualization: the column is one of the keys of your GROUP BY, so it decides how the rows are split, and you need it in your SELECT to see that.

Without seeing a data sample I can't be 100% sure, but your issue is probably that you apply TRUNC to your datetime field in the SELECT but not in the GROUP BY clause. Because the SELECT shows a truncated date, you assume the GROUP BY also worked on the date, but that is not the case: it grouped on the full date and time.
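The effect can be reproduced with a self-contained sketch. The two orders below are hypothetical (same zip code, same day, different times), and the aliases order_day, order_count and total_amount are made up for readability:

```sql
-- Hypothetical sample rows: two orders on the same day at different times.
WITH vendor_orders AS (
    SELECT 1 AS order_id, 'XXXXX' AS zip_code,
           TO_DATE('2016-03-07 09:15:00', 'YYYY-MM-DD HH24:MI:SS') AS order_creation_datetime,
           10 AS amount
    FROM dual UNION ALL
    SELECT 2, 'XXXXX',
           TO_DATE('2016-03-07 17:40:00', 'YYYY-MM-DD HH24:MI:SS'),
           20
    FROM dual
)
SELECT zip_code,
       TRUNC(order_creation_datetime) AS order_day,   -- displayed without the time
       COUNT(order_id)                AS order_count,
       SUM(amount)                    AS total_amount
FROM vendor_orders
-- Grouping on the full datetime: each distinct time of day is its own group.
GROUP BY zip_code, order_creation_datetime;
-- Returns two rows that both show 7 March 2016 for XXXXX, each with
-- order_count = 1, so they look like duplicates.
```

Changing the GROUP BY to zip_code, TRUNC(order_creation_datetime) merges them into one row with order_count = 2 and total_amount = 30.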

To understand your issue, run the query with the untruncated datetime in the SELECT:

SELECT
    DISTINCT(vo.ZIP_CODE)
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,vo.ORDER_CREATION_DATETIME 
    ,vo.REGION_ID
    ,COUNT(vo.ORDER_ID)
    ,SUM(vo.AMOUNT)
FROM
    VENDOR_ORDERS vo
    ,CALENDAR ca
WHERE   
    TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
    AND vo.REGION_ID = 1
GROUP BY
    vo.ZIP_CODE
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,vo.ORDER_CREATION_DATETIME
    ,vo.REGION_ID;

To fix your issue, apply the same TRUNC in the GROUP BY:

SELECT
    DISTINCT(vo.ZIP_CODE)
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,TRUNC(vo.ORDER_CREATION_DATETIME) 
    ,vo.REGION_ID
    ,COUNT(vo.ORDER_ID)
    ,SUM(vo.AMOUNT)
FROM
    VENDOR_ORDERS vo
    ,CALENDAR ca
WHERE   
    TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
    AND vo.REGION_ID = 1
GROUP BY
    vo.ZIP_CODE
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,TRUNC(vo.ORDER_CREATION_DATETIME)
    ,vo.REGION_ID;

Source: https://stackoverflow.com/questions/35868745/oracle-sql-select-distinct-not-removing-duplicates

Tags

sql

Oracle

select

distinct