问题
I need a query which can be used in (or as) a function and retrieves N combinations of M values from a table.
Example: Input: table with values in one column in multiple rows
Case 1
N=2
M=4 (Record1 to Record4)
Table
Record1
Record2
Record3
Record4
Output
Record1
Record2
Record3
Record4
Record1,Record2
Record1,Record3
Record1,Record4
Record2,Record3
Record2,Record4
Record3,Record4
Case 2
N=3
M=4 (Record1 to Record4)
Table
Record1
Record2
Record3
Record4
Output
Record1
Record2
Record3
Record4
Record1,Record2
Record1,Record3
Record1,Record4
Record2,Record3
Record2,Record4
Record3,Record4
Record1,Record2,Record3
Record1,Record2,Record4
Record1,Record3,Record4
Record2,Record3,Record4
I am using this question as base code for execution
回答1:
If only a fixed amount of N values per combination is needed, then it can easily be done in a normal SQL.
Simply by using N-1 self-joins.
For example if N = 3 then 2 self-joins :
SELECT
CONCAT(t1.name, ',', t2.name, ',', t3.name) AS names
FROM yourtable t1
JOIN yourtable t2 ON t2.name > t1.name
JOIN yourtable t3 ON t3.name > t2.name;
Because of the use of >
in the joins, that wouldn't return duplicates of the same combinations in a different order.
(Since A,B,C
= A,C,B
= B,A,C
= B,C,A
= C,A,B
= C,B,A
)
If N is variable, then such method could be used in a Dynamic Sql that adds N-1 joins to the query string.
However, to get the expected output of the question, to return also N=1 & N=2 & N=3 then we could use that trick in combination with a Recursive CTE.
For example this T-SQL snippet:
declare @yourtable table ([name] varchar(30));
insert into @yourtable ([name]) values
('Record1'),
('Record2'),
('Record3'),
('Record4');
WITH RCTE AS
(
SELECT 1 as N, t.name as prev_name, cast(t.name as varchar(max)) AS names
FROM @yourtable t
UNION ALL
SELECT N + 1, t.name, CONCAT(r.names,','+ t.name)
FROM @yourtable t
JOIN RCTE r ON t.name > r.prev_name AND r.N < 3
)
SELECT names
FROM RCTE
ORDER BY N, names;
Returns:
names
------------------------
Record1
Record2
Record3
Record4
Record1,Record2
Record1,Record3
Record1,Record4
Record2,Record3
Record2,Record4
Record3,Record4
Record1,Record2,Record3
Record1,Record2,Record4
Record1,Record3,Record4
Record2,Record3,Record4
回答2:
Here I implemented a traversal algorithm through the combination of names in an alphabetically sorted order up to N names. The algorithm does not traverse through all 2^N-1
combinations as in the base code mentioned.
The algorithm is explained as follows. Denote the i-th
name is represented by a digit i
(starting from 1), then the combination of names can be represented by a sequence of strictly increasing digits. The strictly increasing constraint eliminates different permutations from the same combination.
Now we can traverse through these representations in an alphabetically increasing order as shown below (like in a real dictionary). One can clearly see, there are 2 keys to the traversal: (1) when to do a round-up and (2) doing it subject to the strictly-increasing constraint. The rest is to implement them correctly in the code.
Disclaimer: Low efficiency, maintainability and scalability! It barely runs on small dataset (38 sec for 1350 combinations from M=20
and N=3
). Consider doing such task in a general purpose programming language especially when the dataset is large. e.g. A pythonic solution with only 4 lines
A concept of what the algorithms does
Combination of names
---------------------
1 -- 1 means 'name1'
2
...
n -- Round up to 2-digit combinations after this number
1,2 -- The lowest digit begins at 2 because (1,1) violates the constraint
1,3
...
1,n
2,3 -- begins at 3 because 2,1 and 2,2 violates the constraint
2,4
....
2,n
.....
(n-1),n -- Can't round up to (n,n) so here ends the combinations of 2 digits.
1,2,3 -- In general, round-up gives a consecutive increasing sequence
-- starting from the rounding digit
1,2,4
.... -- repeat until all wanted combinations are exhausted.
And here is the T-SQL code
Test Dataset and Parameters
use [testdb];
if OBJECT_ID('testdb..test') is not null
drop table testdb..test;
create table test (
[name] VARCHAR(10)
);
GO
-- name list, need not be sorted
insert into test([name])
values ('name1'),('name2'),('name3'),('name4'),('name5');
-- Number of names used in the combination
declare @N int = 3;
Program
/***** temp variables *****/
-- get number of names
declare @M int;
select @M = count(distinct [name]) from test;
--print @M;
-- array (table variable) of temporary status
declare @arr table (
digit int primary key,
val int,
[name] varchar(max)
)
-- the value of each digits begins from 0
insert @arr(digit, val, [name])
select ROW_NUMBER() over(order by [name]), 0, [name] from test;
-- begin traversal from the first name
update @arr
set val = 1
where digit = 1;
-- check:
-- select * from @arr;
-- concatenation of names
DECLARE @cat VARCHAR(max) = ''
-- current value at the lowest digit
declare @val_now int = 1; -- 1 to N
-- digit pointer and temporary value used for rounding
declare @digit_round int;
declare @val_round int;
-- answer storage
if OBJECT_ID('tempdb..#ans') is not null
drop table #ans;
create table #ans (
names varchar(max)
)
/***** Main Loop *****/
declare @flag_terminate int = 0;
while (@flag_terminate = 0) begin
while @val_now <= @M BEGIN
/** output current combination **/
-- select * from @arr;
set @cat = '';
-- use cursor to preserve name order
declare @name_now varchar(100);
declare cur cursor local
for select (select B.[name] from @arr as B where B.digit = A.val) as [name]
from @arr as A
where A.val > 0
order by [name];
open cur;
while 1=1 begin
fetch next from cur into @name_now;
if @@FETCH_STATUS <> 0 break;
set @cat = case @cat
when '' then ''
else @cat + ','
end + @name_now;
end
close cur;
deallocate cur;
-- Can also do this if you don't care about name order
-- Note that 'order by' won't work on subqueries
-- SELECT @cat = case @cat
-- when '' then ''
-- else @cat + ','
-- end + (select B.[name] from @arr as B where B.digit = A.val)
-- FROM @arr as A
-- where A.val > 0;
insert into #ans(names) values (@cat);
/** Proceed to next combination **/
-- 1. no round-up case
if @val_now < @M begin
set @val_now += 1;
update @arr
set val = @val_now
where digit = 1;
end
-- 2. round-up case
else begin
-- Find which digit at where the rounding process should end
set @digit_round = 1
while @digit_round <= @M begin
-- N.B. Ascending alphabatical sorting of name combanitaions implies
-- max(digit 2) = M - 1
-- max(digit 3) = M - 2
-- ....
-- max(digit N) = M - N + 1
set @val_round = (select val from @arr where digit = @digit_round);
if @val_round < @M - @digit_round + 1
break;
set @digit_round += 1;
end
-- print ' digit_round = ' + str(@digit_round)
-- print ' val_round = ' + str(@val_round)
-- 2.1 regular round-up case
-- rounding process ends within N digits
if @digit_round <= @N begin
while @digit_round >= 1 begin
-- rounded value
set @val_round += 1;
-- print 'start rounding:'
-- print ' digit_round = ' + str(@digit_round)
-- print ' val_round = ' + str(@val_round)
-- update temp table
update @arr
set val = @val_round
where digit = @digit_round;
-- next digit
set @digit_round -= 1;
end
-- reset loop vars
set @val_now = @val_round;
end
-- 2.2 termination case
-- rounding process does not end in N digits
else begin
-- print 'Termination reached!'
set @flag_terminate = 1;
set @val_now = @M + 1;
end
end
end
end
-- output
select * from #ans;
Result
| | names |
|----|-------------------|
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |
| 4 | name4 |
| 5 | name5 |
| 6 | name1,name2 |
| 7 | name1,name3 |
| 8 | name1,name4 |
| 9 | name1,name5 |
| 10 | name2,name3 |
| 11 | name2,name4 |
| 12 | name2,name5 |
| 13 | name3,name4 |
| 14 | name3,name5 |
| 15 | name4,name5 |
| 16 | name1,name2,name3 |
| 17 | name1,name2,name4 |
| 18 | name1,name2,name5 |
| 19 | name1,name3,name4 |
| 20 | name1,name3,name5 |
| 21 | name1,name4,name5 |
| 22 | name2,name3,name4 |
| 23 | name2,name3,name5 |
| 24 | name2,name4,name5 |
| 25 | name3,name4,name5 |
Tested on SQL Server 2017 latest (linux docker image).
回答3:
What can be maximum value of N
& M
?
I am using RBAR
and dynamic Sql
and CROSS JOIN
.
DECLARE @N INT = 3
DECLARE @M INT = 4 --M=4 (Record1 to Record4)
-- Purpose of #temp
-- Insert your resultset in #temp using RowNumber or Top (where RowNumber=@M or Top(@M))
-- This is easy part
CREATE TABLE #temp (
id INT identity(1, 1)
,col VARCHAR(50)
)
INSERT INTO #temp
VALUES ('Record1')
,('Record2')
,('Record3')
,('Record4')
DECLARE @i INT = 1
DECLARE @Sql NVARCHAR(500) = ''
DECLARE @Col NVARCHAR(500) = ''
DECLARE @From NVARCHAR(500)
DECLARE @Where NVARCHAR(500) = ''
DECLARE @Alias NVARCHAR(5)
WHILE (@i <= @N)
BEGIN
SET @Alias = 't' + cast(@i AS VARCHAR)
IF (@i = 1)
BEGIN
SET @Col = CONCAT (
@Alias
,'.' + 'col'
)
SET @From = '#temp ' + @Alias
SET @Sql = 'select ' + @Col + ' from ' + @From + ''
END
ELSE IF (@i = 2)
BEGIN
SET @Col = @Col + '+'',''' + '+' + CONCAT (
@Alias
,'.' + 'col'
,' '
)
SET @From = @From + ',' + '#temp ' + @Alias
SET @Where = @Where + 't' + cast(@i - 1 AS VARCHAR) + '.id < t' + cast(@i AS VARCHAR) + '.id'
SET @Sql = @Sql + ' union all select ' + @Col + '' + 'from ' + @From + ' where ' + @Where
END
ELSE IF (@i > 2)
BEGIN
SET @Col = @Col + '+'',''' + '+' + CONCAT (
@Alias
,'.' + 'col'
,' '
)
SET @From = @From + ',' + '#temp ' + @Alias
SET @Where = @Where + ' and ' + 't' + cast(@i - 1 AS VARCHAR) + '.id < t' + cast(@i AS VARCHAR) + '.id'
SET @Sql = @Sql + ' union all select ' + @Col + '' + 'from ' + @From + ' where ' + @Where
END
SET @i = @i + 1
END
PRINT @Col
PRINT @Sql
EXECUTE sp_executesql @Sql
DROP TABLE #temp
It is working correctly for N=1,2,3,4.
来源:https://stackoverflow.com/questions/58232207/sql-query-for-combinations-without-repetition-for-top-n-cases