GROUP BY clause sees all VARCHAR fields as different

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-24 14:08:36

Question


I have noticed some strange behaviour while trying to GROUP BY a VARCHAR field.

Consider the following example, where I try to spot customers who have changed their name at least once in the past.

CREATE TABLE #CustomersHistory
(
Id INT IDENTITY(1,1),
CustomerId INT,
Name VARCHAR(200)
)

INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'BBB')
INSERT INTO #CustomersHistory VALUES (44, '444')

SELECT ch.CustomerId, count(ch.Name) AS cnt
  FROM #CustomersHistory ch
  GROUP BY ch.CustomerId  HAVING  count(ch.Name) != 1

This oddly yields the following, as if the 'AAA' from the first INSERT were different from the 'AAA' in the second one:

CustomerId  cnt  //  (I was expecting)
12          3    //   2
44          1    //   1
  • Is this behaviour specific to T-SQL?
  • Why does it behave in this rather counter-intuitive way?
  • How is it customary to overcome this limitation?

Note: This question is very similar to "GROUP BY problem with varchar", where I didn't find an answer to the "why".

Side Note: Is it good practice to use HAVING count(ch.Name) != 1 instead of HAVING count(ch.Name) > 1 ?


Answer 1:


COUNT(ch.Name) is an aggregate function that counts every row in the group whose Name is not NULL, whether or not the values repeat, so the two 'AAA' rows are both counted. I think you want COUNT(DISTINCT ch.Name), which counts only the unique names.

SELECT ch.CustomerId, count(DISTINCT ch.Name) AS cnt
  FROM #CustomersHistory ch
  GROUP BY ch.CustomerId  HAVING  count(DISTINCT ch.Name) > 1
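To make the difference concrete, here is a minimal sketch that runs the three COUNT variants side by side on the question's sample data. The column aliases (all_rows, non_null_names, distinct_names) are illustrative names that do not appear in the question, and the HAVING clause is left out so that both customers stay visible. Counting non-NULL values rather than distinct ones is how COUNT(expression) is defined in standard SQL, so the behaviour is not specific to T-SQL.

```sql
-- Start from a clean temp table so the script can be re-run in one session.
IF OBJECT_ID('tempdb..#CustomersHistory') IS NOT NULL
    DROP TABLE #CustomersHistory;

CREATE TABLE #CustomersHistory
(
    Id INT IDENTITY(1,1),
    CustomerId INT,
    Name VARCHAR(200)
);

-- Sample rows from the question.
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (12, 'AAA');
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (12, 'AAA');
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (12, 'BBB');
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (44, '444');

SELECT ch.CustomerId,
       COUNT(*)                AS all_rows,        -- every row in the group
       COUNT(ch.Name)          AS non_null_names,  -- rows where Name IS NOT NULL
       COUNT(DISTINCT ch.Name) AS distinct_names   -- unique non-NULL names
  FROM #CustomersHistory ch
 GROUP BY ch.CustomerId
 ORDER BY ch.CustomerId;

-- CustomerId  all_rows  non_null_names  distinct_names
-- 12          3         3               2
-- 44          1         1               1

DROP TABLE #CustomersHistory;
```

Only the distinct_names column matches the count the question was expecting for customer 12.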

For more information, take a look at the COUNT() article in SQL Server Books Online.
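On the side note about != 1 versus > 1: the two conditions differ when a count can be 0, which happens for a group whose names are all NULL. The sketch below adds a hypothetical customer 77 with a NULL name (an assumption for illustration, not part of the question's data) to show the difference.

```sql
-- Start from a clean temp table so the script can be re-run in one session.
IF OBJECT_ID('tempdb..#CustomersHistory') IS NOT NULL
    DROP TABLE #CustomersHistory;

CREATE TABLE #CustomersHistory
(
    Id INT IDENTITY(1,1),
    CustomerId INT,
    Name VARCHAR(200)
);

INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (12, 'AAA');
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (12, 'AAA');
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (12, 'BBB');
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (44, '444');
-- Hypothetical extra row: a customer whose only recorded name is NULL.
INSERT INTO #CustomersHistory (CustomerId, Name) VALUES (77, NULL);

-- Distinct-name counts per customer: 12 -> 2, 44 -> 1, 77 -> 0.

-- "!= 1" keeps customer 12 (count 2) and also customer 77 (count 0).
SELECT ch.CustomerId, COUNT(DISTINCT ch.Name) AS cnt
  FROM #CustomersHistory ch
 GROUP BY ch.CustomerId
HAVING COUNT(DISTINCT ch.Name) != 1;

-- "> 1" keeps only customer 12, the one with more than one distinct name.
SELECT ch.CustomerId, COUNT(DISTINCT ch.Name) AS cnt
  FROM #CustomersHistory ch
 GROUP BY ch.CustomerId
HAVING COUNT(DISTINCT ch.Name) > 1;

DROP TABLE #CustomersHistory;
```

For the stated goal of finding customers who have changed their name, > 1 expresses the intent more directly, since a count of 0 does not indicate a name change.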



Source: https://stackoverflow.com/questions/14692048/group-by-clause-sees-all-varchar-fields-as-different
