SQL query to GROUP BY a comma-joined column

孤独总比滥情好 2020-12-10 22:57

My table structure is like below; the "Mail" column can contain multiple email addresses joined by commas, and I want to count how many rows each individual address appears in.

Data (int)

Mail (varchar)

6 answers
  • 2020-12-10 23:13

    SQL Server: using a recursive CTE.

    declare @Mail table (ID int, Mail varchar(200))
    
    insert into @Mail values
    (1, 'm1@gmail.com,m2@hotmail.com'),
    (2, 'm2@hotmail.com,m3@test.com'),
    (3, 'm2@hotmail.com')
    
    ;with cte1 as
    (
      select Mail+',' as Mail
      from @Mail
    ),
    cte2
    as
    (
      select
        left(Mail, charindex(',', Mail)-1) as Mail1,
        right(Mail, len(Mail)-charindex(',', Mail)) as Mail
      from cte1
      union all
      select
        left(Mail, charindex(',', Mail)-1) as Mail1,
        right(Mail, len(Mail)-charindex(',', Mail)) as Mail
      from cte2
      where charindex(',', Mail) > 1
    )
    select
      Mail1 as Mail,
      count(*) as [Count]
    from cte2
    group by Mail1
    

    Edit 1: same as before, but it now also handles the case where Mail contains only a single email — the trailing comma appended in cte1 guarantees every row has at least one delimiter to split on.
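    As a portable sanity check (not part of the original answer): the same recursive-CTE split can be reproduced in SQLite, driven here from Python with the sample data above; substr/instr stand in for LEFT/RIGHT/CHARINDEX.

```python
import sqlite3

# Recreate the sample table and split the comma-joined Mail column with a
# recursive CTE, mirroring the T-SQL above (SQLite syntax: instr/substr).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Mail (ID INTEGER, Mail TEXT);
INSERT INTO Mail VALUES
  (1, 'm1@gmail.com,m2@hotmail.com'),
  (2, 'm2@hotmail.com,m3@test.com'),
  (3, 'm2@hotmail.com');
""")

rows = con.execute("""
WITH RECURSIVE cte1 AS (
  SELECT Mail || ',' AS Mail FROM Mail        -- append the trailing comma
),
cte2 AS (
  SELECT substr(Mail, 1, instr(Mail, ',') - 1) AS Mail1,  -- before first comma
         substr(Mail, instr(Mail, ',') + 1)   AS Mail     -- remainder after it
  FROM cte1
  UNION ALL
  SELECT substr(Mail, 1, instr(Mail, ',') - 1),
         substr(Mail, instr(Mail, ',') + 1)
  FROM cte2
  WHERE instr(Mail, ',') > 1                  -- stop once the remainder is empty
)
SELECT Mail1, COUNT(*) FROM cte2 GROUP BY Mail1 ORDER BY Mail1
""").fetchall()
print(rows)
```

    Run against the three sample rows, it yields one count per distinct address, with m2@hotmail.com counted three times.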

  • 2020-12-10 23:13

    The correct thing to do would be to add a related table to store the multiple emails. Storing things in a comma-delimited list is virtually always a bad design decision, as you have found in trying to query it. It generally means you have a one-to-many relationship and need a related table. The task you want to do is trivial once the tables are properly related.

    I don't buy "I can't change the table structure" as an excuse. Unless this is a commercial product that your company doesn't own, you can change the structure; you just need to show management why it is necessary. Someone at your organization can change the database structure: find out who, and convince them why it needs to change. If it is a commercial database, consider creating a trigger on the table that populates a related table of your own every time the email field is inserted, updated, or deleted. Then you only go through the splitting process once per record change, rather than every time the query runs.
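    To make the suggestion concrete (a sketch only — SQLite via Python for illustration, and the Record/RecordMail names are invented): with a one-to-many child table holding one address per row, the asker's count becomes a plain GROUP BY.

```python
import sqlite3

# Normalized design: one row per (ID, address) in a related child table,
# so counting addresses needs no string splitting at all.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Record (ID INTEGER PRIMARY KEY);
CREATE TABLE RecordMail (              -- one-to-many child table
  ID   INTEGER REFERENCES Record(ID),
  Mail TEXT NOT NULL
);
INSERT INTO Record VALUES (1), (2), (3);
INSERT INTO RecordMail VALUES          -- the question's data, already split
  (1, 'm1@gmail.com'), (1, 'm2@hotmail.com'),
  (2, 'm2@hotmail.com'), (2, 'm3@test.com'),
  (3, 'm2@hotmail.com');
""")

counts = con.execute("""
SELECT Mail, COUNT(*) FROM RecordMail GROUP BY Mail ORDER BY Mail
""").fetchall()
print(counts)
```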

  • 2020-12-10 23:15

    This is a supplementary answer to show the performance of the various options:

    Fill a sample table with some data

    create table tmp1 ([Data] int, [Mail] varchar(200))
    insert tmp1 SELECT 1,'m1@gmail.com,m2@hotmail.com,other, longer@test, fifth'
    UNION ALL   SELECT 2,'m2@hotmail.com,m3@test.com'
    UNION ALL   SELECT 3,'m3@single.com'
    UNION ALL   SELECT 4,''
    UNION ALL   SELECT 5,null
    
    insert tmp1
    select data*10000 + number, mail
    from tmp1, master..spt_values v
    where v.type='P'
    
    -- total rows: 10245
    

    Test query:

    set statistics io on
    set statistics time on
    
    dbcc dropcleanbuffers
    dbcc freeproccache
    
    select single, count(*) [Count]
    from
    (
        select ltrim(rtrim(substring(t.mail, v.number+1,
            isnull(nullif(charindex(',',t.mail,v.number+1),0)-v.number-1,200)))) single
        from tmp1 t
        inner join master..spt_values v on v.type='p'
            and v.number <= len(t.Mail)
            and (substring(t.mail,v.number,1) = ',' or v.number=0)
    ) X
    group by single
    
    dbcc dropcleanbuffers
    dbcc freeproccache
    
    ;with cte1 as
    (
      select Mail+',' as Mail
      from tmp1
    ),
    cte2
    as
    (
      select
        left(Mail, charindex(',', Mail)-1) as Mail1,
        right(Mail, len(Mail)-charindex(',', Mail)) as Mail
      from cte1
      union all
      select
        left(Mail, charindex(',', Mail)-1) as Mail1,
        right(Mail, len(Mail)-charindex(',', Mail)) as Mail
      from cte2
      where charindex(',', Mail) > 1
    )
    select
      Mail1 as Mail,
      count(*) as [Count]
    from cte2
    group by Mail1
    
    dbcc dropcleanbuffers
    dbcc freeproccache
    
    --SET ANSI_DEFAULTS ON
    --SET ANSI_NULLS ON
    ;
    SELECT address  AS Mail,
           COUNT(*) AS [Count]
    FROM   tmp1
           CROSS APPLY (SELECT CAST('<m>' + REPLACE([Mail], ',', '</m><m>') + '</m>'
                                    AS XML
                               ) AS x) ca1
           CROSS APPLY (SELECT T.split.value('.', 'varchar(200)') AS address
                        FROM   x.nodes('/m') T(split)) ca
    GROUP  BY address  
    

    Statistics

    Run each a few times to get a feel for the averages.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'spt_values'. Scan count 8196, logical reads 26637, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmp1'. Scan count 3, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
     SQL Server Execution Times:
       CPU time = 641 ms,  elapsed time = 412 ms.
    
    Table 'Worktable'. Scan count 2, logical reads 103271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmp1'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
     SQL Server Execution Times:
       CPU time = 609 ms,  elapsed time = 614 ms.
    
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmp1'. Scan count 3, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
     SQL Server Execution Times:
       CPU time = 2798 ms,  elapsed time = 1421 ms.
    
    Table 'Worktable'. Scan count 2, logical reads 103334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmp1'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
     SQL Server Execution Times:
       CPU time = 734 ms,  elapsed time = 742 ms.
    

    Summary

    First (CHARINDEX) : CPU time = 344 ms, elapsed time = 198 ms.
    Second (CTE) : CPU time = 594 ms, elapsed time = 613 ms.
    Third (XML) : CPU time = 2812 ms, elapsed time = 1418 ms.
    Fourth (CTE2) : CPU time = 719 ms, elapsed time = 750 ms.

  • 2020-12-10 23:19

    A SQL Server Solution

    WITH T ([Data], [Mail])
         AS (SELECT 1,'m1@gmail.com,m2@hotmail.com' UNION ALL
             SELECT 2,'m2@hotmail.com,m3@test.com')
    SELECT address  AS Mail,
           COUNT(*) AS [Count]
    FROM   T
           CROSS APPLY (SELECT CAST('<m>' + REPLACE([Mail], ',', '</m><m>') + '</m>'
                                    AS XML
                               ) AS x) ca1
           CROSS APPLY (SELECT T.split.value('.', 'varchar(200)') AS address
                        FROM   x.nodes('/m') T(split)) ca
    GROUP  BY address  
    
  • 2020-12-10 23:23

    String splitting is faster using only CHARINDEX, without XML or a CTE.

    Sample table

    create table #tmp ([Data] int, [Mail] varchar(200))
    insert #tmp SELECT 1,'m1@gmail.com,m2@hotmail.com,other, longer@test, fifth'
    UNION ALL   SELECT 2,'m2@hotmail.com,m3@test.com'
    UNION ALL   SELECT 3,'m3@single.com'
    UNION ALL   SELECT 4,''
    UNION ALL   SELECT 5,null
    

    The query

    select single, count(*) [Count]
    from
    (
        select ltrim(rtrim(substring(t.mail, v.number+1,
            isnull(nullif(charindex(',',t.mail,v.number+1),0)-v.number-1,200)))) single
        from #tmp t
        inner join master..spt_values v on v.type='p'
            and v.number <= len(t.Mail)
            and (substring(t.mail,v.number,1) = ',' or v.number=0)
    ) X
    group by single
    

    The only parts you supply are:

    • #tmp: your table name
    • Mail: the column name
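    For intuition about what the query is doing (not part of the answer): a rough Python model of the CHARINDEX arithmetic, where a plain range plays the role of the spt_values numbers table — a token starts at position 0 and just after every comma, and runs to the next comma.

```python
from collections import Counter

def split_by_position(mail):
    # Model of the spt_values/CHARINDEX split above: a token starts at
    # position 0 and just after every comma, and runs to the next comma
    # (or end of string); tokens are trimmed like LTRIM(RTRIM(...)).
    if mail is None:                   # NULL rows drop out of the JOIN entirely
        return []
    tokens = []
    for i in range(len(mail) + 1):     # the "numbers table"
        if i == 0 or mail[i - 1] == ',':
            j = mail.find(',', i)
            if j == -1:
                j = len(mail)
            tokens.append(mail[i:j].strip())
    return tokens

# The sample rows from the answer above.
rows = [
    (1, 'm1@gmail.com,m2@hotmail.com,other, longer@test, fifth'),
    (2, 'm2@hotmail.com,m3@test.com'),
    (3, 'm3@single.com'),
    (4, ''),
    (5, None),
]
counts = Counter(s for _, mail in rows for s in split_by_position(mail))
print(counts.most_common())
```

    Note that, like the SQL version, the empty-string row contributes one empty token while the NULL row contributes nothing.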
  • 2020-12-10 23:28

    Very similar to Mikael's answer, with minor tweaks:
    - Carry a 'cached' LEN field along so the length doesn't have to be recounted on every recursion
    - Use only one UNION branch per recursion by replacing a 0 from CHARINDEX with NULL

    These differences will only show noticeably for long lists, and hence over several levels of recursion.

    The CROSS APPLY business is just to keep the SELECT tidy, rather than repeating the NULLIF(CHARINDEX(...)) loads of times.


    WITH
      source (
        Data,
        Mail
      )
    AS
    (
      SELECT 1,'m1@gmail.com,m2@hotmail.com' UNION ALL
      SELECT 2,'m2@hotmail.com,m3@test.com'
    )
    ,
      split_cte
    AS
    (
      SELECT
        LEFT (mail, ISNULL(comma - 1, LEN(mail)))     AS "current_mail",
        RIGHT(mail, ISNULL(LEN(mail) - comma, 0))     AS "mail_data",
        ISNULL(LEN(mail) - comma, 0)                  AS "chars"
      FROM
        source
      CROSS APPLY
        (SELECT NULLIF(CHARINDEX(',', mail), 0) AS "comma") AS search
    
      UNION ALL
    
      SELECT
        LEFT (mail_data, ISNULL(comma - 1, chars))    AS "current_mail",
        RIGHT(mail_data, ISNULL(chars - comma, 0))    AS "mail_data",
        ISNULL(chars - comma, 0)                      AS "chars"
      FROM
        split_cte
      CROSS APPLY
        (SELECT NULLIF(CHARINDEX(',', mail_data), 0) AS "comma") AS search
      WHERE
        chars > 0
    )
    
    SELECT
      current_mail     AS "Mail",
      COUNT(*)         AS "Count"
    FROM
      split_cte
    GROUP BY
      current_mail
    