SQL query like GROUP BY with OR condition

Question

I'll try to describe the real situation. Our company has a reservation system with a table, let's call it Customers, in which an e-mail address and a phone number are saved with each incoming order; that part of the system can't be changed. My problem is how to count unique customers. By a unique customer I mean a group of people who share the same e-mail address or the same phone number, either directly or through a chain of other orders (as Example 1 shows).

Example 1: Imagine Tom and Sandra, a married couple. Tom, who ordered 4 products, entered 3 different e-mail addresses and 2 different phone numbers into our reservation system, and he shares one of those numbers with Sandra (their home phone), so I can presume they are connected. Besides the shared number, Sandra also entered her private one, and she used the same e-mail address for both of her orders. For me this means that all of the following rows count as one unique customer, so in practice a single unique customer may grow into a whole family.

ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    tom@email.com       +44 111 111    First row
1    tommy@email.com     +44 111 111    Same phone, different e-mail
2    thomas@email.com    +44 111 111    Same phone, different e-mail
3    thomas@email.com    +44 222 222    Same e-mail, different phone
4    sandra@email.com    +44 222 222    Same phone, different e-mail
5    sandra@email.com    +44 333 333    Same e-mail, different phone
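To see why a single GROUP BY key cannot capture this, it helps to treat each row as a node and connect two rows whenever they share an e-mail or a phone. The short sketch below (Python is used only for illustration; `rows`, `neighbours` and `shortest_chain` are hypothetical names, and the data is copied from the table above) finds the chain of shared values that links row 0 to row 5, even though those two rows have nothing in common directly.

```python
from collections import defaultdict, deque

# Example 1 rows: (ID, e-mail, phone), copied from the table above.
rows = [
    (0, "tom@email.com", "+44 111 111"),
    (1, "tommy@email.com", "+44 111 111"),
    (2, "thomas@email.com", "+44 111 111"),
    (3, "thomas@email.com", "+44 222 222"),
    (4, "sandra@email.com", "+44 222 222"),
    (5, "sandra@email.com", "+44 333 333"),
]


def neighbours(rows):
    """Map each row ID to the IDs of other rows sharing its e-mail or its phone."""
    by_value = defaultdict(list)
    for row_id, email, phone in rows:
        by_value[("email", email)].append(row_id)
        by_value[("phone", phone)].append(row_id)
    graph = defaultdict(set)
    for ids in by_value.values():
        for a in ids:
            graph[a].update(b for b in ids if b != a)
    return graph


def shortest_chain(rows, start, goal):
    """Breadth-first search; returns the shortest list of IDs from start to goal, or None."""
    graph = neighbours(rows)
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            chain = []
            while current is not None:
                chain.append(current)
                current = previous[current]
            return chain[::-1]
        for nxt in sorted(graph[current]):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


print(shortest_chain(rows, 0, 5))  # [0, 2, 3, 4, 5]
```

Each step in that chain alternates between a shared phone and a shared e-mail (0-2 phone, 2-3 e-mail, 3-4 phone, 4-5 e-mail), which is exactly the "grows into a whole family" effect described above.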

As ypercube pointed out, I will probably need recursion to count all of these unique customers.

Example 2: Here is an example of what I want to do.

Is it possible to get the count of unique customers without recursion, for instance by using a cursor, or is recursion necessary?

ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    linsey@email.com    +44 111 111    ─┐
1    louise@email.com    +44 111 111     ├─ 1. unique customer
2    louise@email.com    +44 222 222    ─┘
---- ------------------- -------------- ------------------------------
3    steven@email.com    +44 333 333    ─┐
4    steven@email.com    +44 444 444     ├─ 2. unique customer
5    sandra@email.com    +44 444 444    ─┘
---- ------------------- -------------- ------------------------------
6    george@email.com    +44 555 555    ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7    xavier@email.com    +44 666 666    ─┐
8    xavier@email.com    +44 777 777     ├─ 4. unique customer
9    xavier@email.com    +44 888 888    ─┘
---- ------------------- -------------- ------------------------------
10   robert@email.com    +44 999 999    ─┐
11   miriam@email.com    +44 999 999     ├─ 5. unique customer
12   sherry@email.com    +44 999 999    ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result                                  ∑ = 5 unique customers
----------------------------------------------------------------------
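Counting these groups is the connected-components problem on the row graph, and it can be solved without recursion. One common technique (not taken from the question or the answers) is a disjoint-set, or union-find, structure: every row starts in its own set, and rows that share an e-mail or a phone are merged. A minimal Python sketch over the Example 2 data follows; `count_unique_customers` is a hypothetical helper name.

```python
# Example 2 rows: (ID, e-mail, phone), copied from the table above.
rows = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]


def count_unique_customers(rows):
    """Union-find over row IDs; rows sharing an e-mail or a phone end up in one set."""
    parent = {row_id: row_id for row_id, _, _ in rows}

    def find(x):
        # Iterative root lookup with path halving (no recursion).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lower ID as the root, so it can serve as the customer key.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # ("email" | "phone", value) -> first row ID carrying that value
    for row_id, email, phone in rows:
        for key in (("email", email), ("phone", phone)):
            if key in first_seen:
                union(row_id, first_seen[key])
            else:
                first_seen[key] = row_id

    return len({find(row_id) for row_id, _, _ in rows})


print(count_unique_customers(rows))  # 5
```

Each row only needs to be merged with the first row that carried the same value, because that row already represents everyone else with that value. This keeps the work close to linear in the number of rows.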

I've tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. I'm looking for something like this pseudo-query:

SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
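GROUP BY puts two rows in the same group only when the grouping expression is equal for both, so grouping by either column alone breaks the chains apart. The sketch below uses Python's built-in sqlite3 module with an in-memory copy of the Example 2 data. SQLite and the `Email`/`Phone` column names are assumptions here, standing in for the real SQL Server table. It counts the groups each single-column GROUP BY produces, for comparison with the expected 5.

```python
import sqlite3

# Example 2 data in an in-memory SQLite table (a stand-in for the SQL Server table).
rows = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Email TEXT, Phone TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)

# Number of groups that a GROUP BY on a single column produces.
by_email = conn.execute(
    "SELECT COUNT(*) FROM (SELECT Email FROM Customers GROUP BY Email) AS g"
).fetchone()[0]
by_phone = conn.execute(
    "SELECT COUNT(*) FROM (SELECT Phone FROM Customers GROUP BY Phone) AS g"
).fetchone()[0]
conn.close()

print(by_email, by_phone)  # 9 9 (the expected number of unique customers is 5)
```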

Thanks again for any suggestions.

P.S. I completely rewrote this post, and I really appreciate the answers given before the rewrite. Some of them may no longer correspond to the updated question, so please don't downvote them for that (the question itself is fair game, of course :).

Thanks, and sorry for the false start.

Answer 1:

Here is a full solution using a recursive CTE.

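-- Nodes: each row appears once for its e-mail and once for its phone;
-- SetId numbers every distinct e-mail value and every distinct phone value.
;WITH Nodes AS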
(
    SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
        , [ID]
    FROM
    (
        SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
        FROM dbo.Customer
        UNION ALL
        SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
        FROM dbo.Customer
    ) A
),
-- Links: pairs of rows that share a SetId, always pointing from the higher ID to the lower one.
Links AS
(
    SELECT DISTINCT A.Id, B.Id LinkedId
    FROM Nodes A
    JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
-- Routes (recursive): follows links step by step toward lower IDs.
Routes AS
(
    SELECT DISTINCT Id, Id LinkedId
    FROM dbo.Customer

    UNION ALL

    SELECT DISTINCT Id, LinkedId
    FROM Links

    UNION ALL

    SELECT A.Id, B.LinkedId
    FROM Links A
    JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
-- TransitiveClosure: every linked row paired with itself, plus every route found.
TransitiveClosure AS
(
    SELECT Id, Id LinkedId
    FROM Links

    UNION

    SELECT LinkedId Id, LinkedId
    FROM Links

    UNION

    SELECT Id, LinkedId
    FROM Routes
),
-- UniqueCustomers: the lowest reachable ID becomes the customer key.
UniqueCustomers AS
(
    SELECT Id, MIN(LinkedId) UniqueCustomerId
    FROM TransitiveClosure
    GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
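For comparison, here is a smaller recursive-CTE sketch of the same idea, run through Python's sqlite3 module on the Example 2 data. SQLite, the `Customers`/`Email`/`Phone` names and the helper `count_customers` are assumptions, not part of this answer. The sketch builds undirected links between rows that share an e-mail or a phone, lets every row reach everything connected to it, and takes the lowest reachable ID as the customer key.

As far as I can trace it, the query above extends Routes only toward lower IDs. Rows connected solely through a later row therefore appear to get different UniqueCustomerId values. For example, with (0, a, x), (1, b, y), (2, a, y), row 2 shares an e-mail with row 0 and a phone with row 1, but rows 0 and 1 end up with different keys. Walking links in both directions avoids that.

SQLite accepts UNION in a recursive CTE, which discards repeated pairs and so stops on cycles. SQL Server only accepts UNION ALL there, so a T-SQL port would need another stopping rule.

```python
import sqlite3

# Example 2 rows: (ID, e-mail, phone).
rows = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]

# SQLite syntax; UNION (not UNION ALL) makes the recursion stop once no new pairs appear.
QUERY = """
WITH RECURSIVE
Links(Id, LinkedId) AS (          -- undirected edges: shared e-mail or shared phone
    SELECT a.ID, b.ID
    FROM Customers a
    JOIN Customers b ON a.ID <> b.ID AND (a.Email = b.Email OR a.Phone = b.Phone)
),
Reach(Id, LinkedId) AS (          -- every row reaches itself ...
    SELECT ID, ID FROM Customers
    UNION
    SELECT r.Id, l.LinkedId       -- ... and anything one more link away
    FROM Reach r
    JOIN Links l ON l.Id = r.LinkedId
)
SELECT COUNT(DISTINCT CustomerId)
FROM (SELECT Id, MIN(LinkedId) AS CustomerId FROM Reach GROUP BY Id) AS u
"""


def count_customers(rows):
    """Load rows into an in-memory SQLite table and count connected groups."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Email TEXT, Phone TEXT)")
        conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


print(count_customers(rows))  # 5
```

Reach grows quadratically within a large group, because every row is paired with every other row it can reach. For big tables an iterative approach may be cheaper.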

Answer 2:

Finding groups of rows that share the same Phone:

SELECT
    ID
  , Name
  , Phone
  , DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM 
    MyTable
ORDER BY
    GroupPhone
  , ID

Finding groups of rows that share the same Name:

SELECT
    ID
  , Name
  , Phone
  , DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM 
    MyTable
ORDER BY
    GroupName
  , ID

Now, for the (complex) query you describe, let's say we have a table like this instead:

ID   Name          Phone
---- ------------- -------------
0    Kate          +44 333 333
1    Sandra        +44 000 000
2    Thomas        +44 222 222
3    Robert        +44 000 000
4    Thomas        +44 444 444
5    George        +44 222 222
6    Kate          +44 000 000
7    Robert        +44 444 444
--------------------------------

Should all of these be in one group? They all share a name or a phone with someone else, forming a "chain" of related persons:

0-6   same name
6-1-3 same phone
3-7   same name
7-4   same phone
4-2   same name
2-5   same phone
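The chain argument can be checked mechanically. If every consecutive pair in the listed order shares a name or a phone, and the order visits every row, then all rows are transitively connected. A small Python sketch of that check follows; the names `rows`, `chain` and `link_reason` are illustrative only.

```python
# The Name/Phone table from this answer: ID -> (Name, Phone).
rows = {
    0: ("Kate", "+44 333 333"),
    1: ("Sandra", "+44 000 000"),
    2: ("Thomas", "+44 222 222"),
    3: ("Robert", "+44 000 000"),
    4: ("Thomas", "+44 444 444"),
    5: ("George", "+44 222 222"),
    6: ("Kate", "+44 000 000"),
    7: ("Robert", "+44 444 444"),
}

# The chain listed above, in order: 0-6, 6-1-3, 3-7, 7-4, 4-2, 2-5.
chain = [0, 6, 1, 3, 7, 4, 2, 5]


def link_reason(a, b):
    """Return why two rows are linked ('same name' or 'same phone'), or None."""
    (name_a, phone_a), (name_b, phone_b) = rows[a], rows[b]
    if name_a == name_b:
        return "same name"
    if phone_a == phone_b:
        return "same phone"
    return None


reasons = [link_reason(a, b) for a, b in zip(chain, chain[1:])]
print(reasons)
# Every consecutive pair is linked and the chain visits all 8 IDs,
# so under the transitive reading all rows form a single group.
print(all(reasons) and sorted(chain) == sorted(rows))  # True
```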

Answer 3:

For the dataset in the example you could write something like this:

;WITH Temp AS (
    SELECT Name, Phone,
        DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
        DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
    FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup
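DENSE_RANK gives equal values equal ranks, so grouping by both NameGroup and PhoneGroup amounts to grouping by the (Name, Phone) pair: rows land together only when both values match. A Python sketch of that effect follows. It uses the Name/Phone table from answer 2, and `dense_rank` is a hypothetical stand-in for the T-SQL function. It yields 8 groups for 8 rows, even though the chain in answer 2 connects all of them.

```python
from collections import Counter

# Name/Phone table from answer 2: (ID, Name, Phone).
rows = [
    (0, "Kate", "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate", "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]


def dense_rank(values):
    """Mimic DENSE_RANK() OVER (ORDER BY value): equal values share a rank, no gaps."""
    return {value: rank for rank, value in enumerate(sorted(set(values)), start=1)}


name_group = dense_rank(name for _, name, _ in rows)
phone_group = dense_rank(phone for _, _, phone in rows)

# Equivalent of GROUP BY NameGroup, PhoneGroup with COUNT(*).
groups = Counter((name_group[name], phone_group[phone]) for _, name, phone in rows)
print(len(groups))  # 8: every (Name, Phone) pair in this table is distinct
```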

Answer 4:

I don't know if this is the best solution, but here it is:

SELECT
  MyTable.ID, MyTable.Name, MyTable.Phone,
  CASE WHEN N.No = 1 AND P.No = 1 THEN 1
       WHEN N.No = 1 AND P.No > 1 THEN 2
       WHEN N.No > 1 OR P.No > 1  THEN 3
  END as GroupRes
FROM
  MyTable 
  JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
  JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone

The drawback is that some of the joins are made on varchar columns, which could increase execution time.
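The two derived tables N and P only count how often each Name and each Phone occurs. The CASE expression then classifies each row from those two counts. Below is a Python sketch of that logic. `collections.Counter` plays the role of the two GROUP BY subqueries, the input is assumed to be the Name/Phone table from answer 2, and `group_res` is a hypothetical name.

```python
from collections import Counter

# Name/Phone table from answer 2: (ID, Name, Phone).
rows = [
    (0, "Kate", "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate", "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]

name_count = Counter(name for _, name, _ in rows)     # like: SELECT Name, COUNT(Name) ... GROUP BY Name
phone_count = Counter(phone for _, _, phone in rows)  # like: SELECT Phone, COUNT(Phone) ... GROUP BY Phone


def group_res(name, phone):
    """Same CASE logic: 1 = unique name and phone, 2 = unique name but shared phone, 3 = shared name."""
    n, p = name_count[name], phone_count[phone]
    if n == 1 and p == 1:
        return 1
    if n == 1 and p > 1:
        return 2
    return 3  # only n > 1 is left here, which the query's third WHEN catches


print([group_res(name, phone) for _, name, phone in rows])  # [3, 2, 3, 3, 3, 2, 3, 3]
```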

Answer 5:

Here is my solution:

SELECT p.LastName, p.FirstName, p.HomePhone,
CASE 
    WHEN ph.PhoneCount=1 THEN       
        CASE 
            WHEN n.NameCount=1 THEN 'unique name and phone'
            ELSE 'common name'
        END

    ELSE        
        CASE 
            WHEN n.NameCount=1 THEN 'common phone'
            ELSE 'common phone and name'        
        END             
END AS Comment
FROM Contacts p
INNER JOIN 
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone

INNER JOIN 
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName


LastN       FirstN  Phone       Comment
Hoover      Brenda  8138282334  unique name and phone
Washington  Brian   9044563211  common name
Roosevelt   Brian   7737653279  common name
Reagan      Charles 7734567869  unique name and phone

Source: https://stackoverflow.com/questions/6280629/sql-query-like-group-by-with-or-condition

Tags: sql-server, sql-server-2005, tsql, recursion