Create evenly sized groups based on aggregate

问题

Probably a newbie question, but I'm looking to split our server inventory up into several evenly sized groups based on total database size, and am stumped figuring out how to group them. I think NTILE will work, maybe, but I just can't wrap my head around splitting the groups evenly. My example below is just ordering the servers randomly. I would like the results to be 3 groups of fairly even size (obviously won't be exact).

Using SQL Server 2012. Any help is appreciated. Thanks.

declare @Servers table (ServerName sysname, TotalSizeGB decimal (12,2))
insert into @Servers values
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)

select GroupNumber, sum(TotalSizeGB) as TotalSizeGB
from (
     select ServerName, sum(TotalSizeGB) as TotalSizeGB, ntile(3) over (order by newid()) as GroupNumber
     from (
          select ServerName, TotalSizeGB from @Servers
          ) x 
     group by ServerName
     ) y
group by GroupNumber

The expected output here would be three groups of about 2000GB each. I expect it won't be exact, but at least close. If grouping per server, it might look like this:

ServerName  TotalSizeGB GroupNumber
Server10    1023.35 1  
Server1 123.45  1
Server5 567.89  1
Server3 345.67  1
Server4 456.78  2
Server7 789.01  2
Server6 678.90  2
Server2 234.56  3
Server9 901.23  3
Server8 890.12  3

If I was taking a sum per group, it would look like this:

GroupNumber TotalSizeGB
1   2060.36
2   1924.69
3   2025.91

回答1:

SELECT  *
FROM(
    SELECT  y.TotalSizeGB,
            CASE 
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=0 THEN 2
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=1 THEN 1
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=2 THEN 0
                ELSE y.PseudoGrpNumber
            END GrpNumber
    FROM(
        SELECT 
            x.ServerName,
            x.TotalSizeGB,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))%3 PseudoGrpNumber,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))/3 AnotherGrp,
            ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC) RowNum
        FROM    @Servers x
    )y
)z
PIVOT( SUM(z.TotalSizeGB) FOR z.GrpNumber IN([0],[1],[2]) ) pvt;

Results:

0       1       2
------- ------- -------
2048.02 1925.80 2037.14

Some explanations:

The idea is to sort data descending on TotalSizeGB column. Then every 3 sequential rows are grouped together (column AnotherGrp) first in DESC order and then in ASC order (column PseudoGroNumber and GrpNumber). If it's executed SELECT * FROM () y derivate table then the results will be:

ServerName TotalSizeGB  PseudoGrpNumber AnotherGrp GrpNumber RowNum
---------- ------------ --------------- ---------- --------- ------
Server10   1023.35      0               1          0         1
Server9    901.23       1               1          1         2
Server8    890.12       2               1          2         3
Server7    789.01       0               2          2         4
Server6    678.90       1               2          1         5
Server5    567.89       2               2          0         6
Server4    456.78       0               3          0         7
Server3    345.67       1               3          1         8
Server2    234.56       2               3          2         9
Server1    123.45       0               4          2         10

回答2:

This task is scientific actually (Packing problem, or a kind of), and may be better suits math.stackexchange :)

My solution is in two steps (as many optimization problems are) - find some initial solution and try to refine it.

Initial solution:

ServerName GroupNo     TotalSizeGB
---------- ----------- -----------
Server1    3           123.45
Server2    3           234.56
Server3    2           345.67
Server4    1           456.78
Server5    2           567.89
Server6    1           678.90
Server7    3           789.01
Server8    3           890.12
Server9    1           901.23
Server10   2           1023.35

GroupNo     GroupSizeGb
----------- -----------
1           2036.91
2           1936.91
3           2037.14

Optimized:

ServerName GroupNo     TotalSizeGB
---------- ----------- -----------
Server1    3           123.45
Server2    3           234.56
Server3    2           345.67
Server4    1           456.78
Server5    3           567.89
Server6    1           678.90
Server7    2           789.01
Server8    2           890.12
Server9    1           901.23
Server10   3           1023.35

GroupNo     GroupSizeGb
----------- -----------
1           2036.91
2           2024.80
3           1949.25

Unfortunately, I was not able to set it up on SQLFiddle, because of explicit transactions are used.

set nocount on

-- Parameters
declare
  @nGroups int, -- Number of groups to split servers to
  @tolerance float, -- let say 0.0 ... 0.1 (0.1 mean that (+/-)10% deviation allowed from target group size)
  @nTries int, -- refinement tries 100, 1000, 10000 or as much as you can wait if you are not satisfied with initial solution
  @mFactor float, -- refinement param 0.0 ... 1.0
  @tolerance2 float -- let say 0.1 ... 0.3

set @nGroups = 3
set @tolerance = 0
set @nTries = 1000
set @mFactor = 0.3
set @tolerance2 = 0.3


-- Initial Data
create table #Servers (ID int identity, ServerName sysname, TotalSizeGB decimal (12,2), primary key clustered(ID))

insert into #Servers (ServerName, TotalSizeGB) values
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)

create table #Groups (GroupNo int not NULL, primary key clustered (GroupNo))
insert into #Groups (GroupNo)
select N from (select row_number() over (order by @@spid) from sys.all_columns) S(N) where N <= @nGroups

create table #ServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered(ServerID))
create index #IX_GroupServers_GroupNo on #ServerGroups (GroupNo)

declare
    @srvCnt int,
    @grSize decimal (12,2),
    @grNo int,
    @grSz decimal (12,2),
    @srvID int

select @srvCnt = count(1), @grSize = sum(TotalSizeGB) / @nGroups from #Servers
select @grSize as [Target approx. group size]

-- Find initial solution
while (select count(1) from #ServerGroups) < @srvCnt
begin
    select top 1 @grNo = g.GroupNo
    from #Groups g
        left join #ServerGroups sg on sg.GroupNo = g.GroupNo
        left join #Servers s on s.ID = sg.ServerID
    group by g.GroupNo
    order by sum(s.TotalSizeGB)

    select @grSz = IsNull(sum(s.TotalSizeGB), 0)
    from #Groups g
        left join #ServerGroups sg on sg.GroupNo = g.GroupNo
        left join #Servers s on s.ID = sg.ServerID
    where g.GroupNo = @grNo

    select top 1 @srvID = ID
    from #Servers s
    where not exists (select 1 from #ServerGroups where ServerID = s.ID)
    order by abs(@grSize - @grSz - s.TotalSizeGB)

    insert into #ServerGroups (ServerID, GroupNo) values (@srvID, @grNo)
end

select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
from #Groups g
    join #ServerGroups sg on sg.GroupNo = g.GroupNo
    join #Servers s on s.ID = sg.ServerID
group by g.GroupNo


-- Refinement
declare @fTarg float

select @fTarg = sum(abs(case when abs(re) > @tolerance then re else 0 end))
from (
    select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
    from #Groups g
        join #ServerGroups sg on sg.GroupNo = g.GroupNo
        join #Servers s on s.ID = sg.ServerID
    group by g.GroupNo
) t
cross apply (select (GroupSizeGb - @grSize)/@grSize re) p

print @fTarg

if @fTarg > 0
begin

create table #MServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered (ServerID))
insert into #MServerGroups
select ServerID, GroupNo from #ServerGroups

while @nTries > 0
begin
    set @nTries = @nTries - 1

    begin transaction

    ;with MS as (
        select top (100*@mFactor) percent ServerID, GroupNo
        from #MServerGroups
        order by checksum(newid())
    )
    update msg
    set
        msg.GroupNo = case when msg.ServerID = tt.ServerID1 then tt.NewNo1 else tt.NewNo2 end
    from
        #MServerGroups msg
        join (
            select ServerID1, NewNo1, ServerID2, NewNo2
            from (
                select MS.ServerID as ServerID1, SS.GroupNo as NewNo1, SS.ServerID as ServerID2, MS.GroupNo as NewNo2, row_number() over (partition by SS.ServerID order by @@spid) as rn
                from MS
                    join #Servers s on s.ID = MS.ServerID
                    cross apply (
                        select top 1 *
                        from
                            #Servers s2
                            join #MServerGroups ms2 on ms2.ServerID = s2.ID
                        where
                            s2.ID != MS.ServerID and ms2.GroupNo != MS.GroupNo and abs(s2.TotalSizeGB - s.TotalSizeGB)/s.TotalSizeGB < @tolerance2
                        order by checksum(newid())
                    ) SS
            ) t
            where rn = 1
        )tt on msg.ServerID in (tt.ServerID1, tt.ServerID2)

    if @@rowcount = 0
    begin
        rollback transaction
        continue;
    end

    declare @fT float

    select @fT = sum(abs(case when abs(re) > @tolerance then re else 0 end))
    from (
        select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
        from #Groups g
            join #MServerGroups sg on sg.GroupNo = g.GroupNo
            join #Servers s on s.ID = sg.ServerID
        group by g.GroupNo
    ) t
    cross apply (select (GroupSizeGb - @grSize)/@grSize re) p

    if @fT < @fTarg
    begin
        set @fTarg = @ft
        print @fTarg -- the less this number, the better solution is

        commit transaction
    end
    else
        rollback transaction
end

update s
set s.GroupNo = m.GroupNo
from #MServerGroups m
    join #ServerGroups s on s.ServerID = m.ServerID

select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
from #Groups g
    join #ServerGroups sg on sg.GroupNo = g.GroupNo
    join #Servers s on s.ID = sg.ServerID
group by g.GroupNo

drop table #MServerGroups

end
else
    print 'No refinement needed'

drop table #Groups
drop table #ServerGroups
drop table #Servers

I suggest to start with @nTries = 0 and reasonable @tolerance (e.g. 0.1, 0.05).

回答3:

Check this, hope this will help. i'm not that sure what you meant by 'evenly sized groups. But what i did here is first allocate an even size to a group then if any remaining then allocate it to a group where is has items more than general group size. Rather than ntile i would recommend you to decide the group numbers (might be using a sp) and allocate size for each server. but below might be fine for the described problem. and note that i didnt test for all the scenarios.

  declare @TotalSizeGB decimal;
  select  @TotalSizeGB = sum(TotalSizeGB) from @Servers;

  declare @Count int;
  select  @Count = count(TotalSizeGB) from @Servers;

  declare @GroupSize int;
  select  @GroupSize = 3;

  declare @NoofGroups int;
  select  @NoofGroups = 3;

  declare @UnitSizeGB decimal
  Set @UnitSizeGB =(@TotalSizeGB/@Count)*@NoofGroups;

  Declare @Remainder decimal;
  Set @Remainder = @TotalSizeGB-(@UnitSizeGB*@NoofGroups)  

Select GroupNumber,
   CASE 
      WHEN gcount = @GroupSize THEN @UnitSizeGB
      WHEN gcount > @GroupSize THEN @UnitSizeGB+@Remainder
   END 
 From (
 Select 
   GroupNumber,count(ServerName) as gcount,  @UnitSizeGB as UnitSizeGB from(
     Select ServerName,ntile(@GroupSize) over (order by newid()) as GroupNumber
     from (
          select ServerName, TotalSizeGB from @Servers ) x 
     group by ServerName  ) as d
     group by GroupNumber ) as ff

This will give a output

 GroupNumber      Size
 1                2405
 2                1803
 3                1803

回答4:

Here's a solution which generates the same results as @i-one's code but perhaps a little easier to understand (at least for me). I use 'chunk' instead of 'group' to avoid keyword conflicts.

The premise is as follows. To create n evenly-sized chunks:

Sort all records in decreasing order
Assign the first n records to their chunks by row number
Loop through the remainder, always assigning to the smallest chunk

I've uploaded the code to SQLFiddle but it doesn't seem to like the table variable. Here's the link anyways.

-- Source data:
DECLARE @Servers TABLE (ServerName SYSNAME, TotalSizeGB DECIMAL (12,2))
INSERT INTO @Servers VALUES
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)


-- Solution start
DECLARE @ServersChunked TABLE (
        ServerName SYSNAME,
        TotalSizeGB DECIMAL (12,2),
        RowNum INT,
        ChunkNo INT
    );
DECLARE
    @ChunkCount INT = 3,
    @MinRowNum INT,
    @SmallestChunk INT;


-- Copy table into variable (skip this if the original table can be amended to include the RowNum and ChunkNo fields)
INSERT INTO @ServersChunked
SELECT 
    *, 
    RowNum = ROW_NUMBER() OVER (ORDER BY TotalSizeGB DESC), 
    ChunkNo = NULL
FROM @Servers

-- Assign the initial chunks to largest tables
UPDATE @ServersChunked
SET ChunkNo = RowNum
WHERE RowNum <= @ChunkCount


-- Assign chunks to remaining tables
WHILE EXISTS (SELECT 1 FROM @ServersChunked WHERE ChunkNo IS NULL) BEGIN

    -- Find the next table (by descending row count)
    SELECT @MinRowNum = MIN(RowNum) FROM @ServersChunked WHERE ChunkNo IS NULL

    -- Find the smallest chunk
    SELECT TOP 1 @SmallestChunk = ChunkNo
    FROM @ServersChunked
    WHERE ChunkNo IS NOT NULL
    GROUP BY ChunkNo
    ORDER BY Sum(TotalSizeGB) ASC

    -- Assign the table to the chunk
    UPDATE @ServersChunked
    SET ChunkNo = @SmallestChunk
    WHERE RowNum = @MinRowNum
END

Here are the results:

ChunkNo SumTotalSizeGB
1       1936.91
2       2036.91
3       2037.14

来源：https://stackoverflow.com/questions/17889830/create-evenly-sized-groups-based-on-aggregate

标签

sql-server

sql-server-2012