What's the fastest heuristic algorithm to split students into groups?

I have X number of students, where X is a multiple of 6. I now want to split up the students into groups of 6.

I have a function that measures how "good" a group of 6 is (let's say it's a black box that runs in constant time for now). By splitting up the students, calling my function on each group to measure its goodness, and then summing the goodness of all groups, I'm able to measure how "good" a certain set of groups is.

I'm trying to create an algorithm that will group the students in a way so that the total goodness of all the groups is maximized, and no group has an individual goodness below some value y. In other words, group the students into groups of 6 to maximize total goodness under the constraint that all groups have a goodness above y.

The number of students (X) I expect to run this algorithm on is about 36.

The problem appears to be NP-Complete, so I'm okay with settling for a heuristic algorithm. I don't have a lot of experience with this, but some sort of genetic algorithm or simulated annealing or even a greedy algorithm might work I would think, but I'm not sure where to start my research.

Could someone point me in the right direction please? I've done some research, and the problem seems almost identical to the Travelling Salesman Problem (the problem space is all permutations of the students/nodes) but I don't think I can apply TSP algorithms to this because the number of "nodes" (around 36) would be quite large for anything to be effective.

I would start with a very simple "random search" algorithm:

start from a random solution (a partition of the X students into groups), call it S[0]

score[0] = black_box_score(S[0])

i = 0

while (some condition):
    i++
    S[i] = some small permutation on S[i-1]  # (1)
    score[i] = black_box_score(S[i])
    if score[i] < score[i-1]:  # (2)  
        S[i] = S[i-1]
        score[i] = score[i-1]

(1) - In your case, a small permutation could be swapping 2 people between groups.

(2) - If we made a change that made our solution worse (lower score) we reject it. You can later replace this with also accepting worse solutions with some probability, to make this algorithm into simulated annealing.

Start by simply running this for 1000 iterations or so, and plot score[i] as a function of i, to get a feeling for how fast your solution is improving. Run this several times (to try different random starting points).

Then you can play with different permutations (1), make the algorithm less greedy (2), or add some fancy automatic logic to stop the search (e.g., no progress in the last T iterations).
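The pseudocode above could be turned into something runnable along the following lines. This is only a sketch under assumptions: `groupScore` is a made-up stand-in for the black box (the log of the product of gaps between sorted student numbers), and `randomSearch`, `temperature` and `cooling` are names chosen for illustration. With `temperature` set to 0 it is the greedy variant; a positive value sometimes accepts worse solutions, as in simulated annealing.

```javascript
// Hypothetical stand-in for the black box: log of the product of gaps
// between sorted student numbers (higher means a more mixed group).
function groupScore(group) {
    var sorted = group.slice().sort(function(a, b) {return a - b});
    var product = 1;
    for (var i = 1; i < sorted.length; i++) product *= sorted[i] - sorted[i - 1];
    return Math.log(product);
}

// Score of a whole solution: sum of the scores of its groups
function totalScore(groups) {
    var sum = 0;
    for (var i = 0; i < groups.length; i++) sum += groupScore(groups[i]);
    return sum;
}

// Random search with swap moves; returns the score after every iteration.
function randomSearch(groups, iterations, temperature, cooling) {
    var current = totalScore(groups);
    var history = [current];
    for (var i = 0; i < iterations; i++) {
        // (1) pick two different groups and one position in each, then swap the students
        var a = Math.floor(Math.random() * groups.length);
        var b = Math.floor(Math.random() * (groups.length - 1));
        if (b >= a) b++;
        var posA = Math.floor(Math.random() * groups[a].length);
        var posB = Math.floor(Math.random() * groups[b].length);
        var tmp = groups[a][posA]; groups[a][posA] = groups[b][posB]; groups[b][posB] = tmp;
        var candidate = totalScore(groups);
        // (2) keep anything that is not worse; keep a worse solution only with
        // the annealing probability exp(delta / temperature)
        var delta = candidate - current;
        var accept = delta >= 0 ||
            (temperature > 0 && Math.random() < Math.exp(delta / temperature));
        if (accept) {
            current = candidate;
        } else {
            // undo the swap
            tmp = groups[a][posA]; groups[a][posA] = groups[b][posB]; groups[b][posB] = tmp;
        }
        temperature *= cooling;
        history.push(current);
    }
    return history;
}

// Usage: start from the worst case (consecutive numbers together), greedy variant
var groups = [];
for (var g = 0; g < 6; g++) {
    groups.push([]);
    for (var s = 1; s <= 6; s++) groups[g].push(g * 6 + s);
}
var history = randomSearch(groups, 1000, 0, 1);
```

Plotting `history` gives the score-versus-iteration curve mentioned above. Recomputing the total score on every iteration is wasteful (only two groups change per swap), but it keeps the sketch short.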

Let's take the example of 36 students distributed into 6 groups. Checking all combinations is impractical, because there are 3,708,580,189,773,818,399,040 of them. However, a strategy which makes repeated improvements by checking every distribution of students between pairs of groups should be feasible.

There are 462 ways to split 12 students into 2 groups, so finding the optimal 12→2 distribution takes only 924 calls to the "group quality" function. There are 15 possible pairings of groups among 6 groups, so 13,860 calls will reveal the best way to pair the groups and redistribute the students between the pairs to get the most improvement.
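For anyone who wants to check these counts, they can be reproduced with a few lines of BigInt arithmetic. The helper names `factorial` and `countPartitions` are just for illustration; the formula is the usual count of ways to split n items into unlabelled groups of equal size, n! / ((size!)^k · k!) with k = n / size.

```javascript
// n! as a BigInt
function factorial(n) {
    var result = 1n;
    for (var i = 2n; i <= BigInt(n); i++) result *= i;
    return result;
}

// Number of ways to split `students` into unlabelled groups of `size` each
function countPartitions(students, size) {
    var groupCount = students / size;
    return factorial(students) /
        (factorial(size) ** BigInt(groupCount) * factorial(groupCount));
}

var allDistributions = countPartitions(36, 6);          // 3708580189773818399040n
var pairSplits = countPartitions(12, 6);                // 462n
var callsPerPair = 2n * pairSplits;                     // 924n (two groups scored per split)
var groupPairs = 6 * 5 / 2;                             // 15 pairs among 6 groups
var callsPerStep = BigInt(groupPairs) * callsPerPair;   // 13860n
```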

Starting with a random initial distribution, the algorithm calculates the optimal distribution for all 15 pairings of groups: AB,CD,EF,BC,DE,FA,AC,BD,CE,DF,EA,FB,AD,BE,CF.

It then compares the scores for all 15 combinations of pairs, to find the combination with the highest overall score, e.g. DE+AC+FB.
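The 15 combinations of pairs are simply all the ways to pair up 6 groups so that each group appears exactly once. If you want to support a different number of groups, they could be generated instead of hard-coded. A minimal sketch (the function name `pairings` is made up for this example):

```javascript
// All ways to pair up the entries of `items` (length must be even).
function pairings(items) {
    if (items.length === 0) return [[]];
    var first = items[0], result = [];
    for (var i = 1; i < items.length; i++) {
        // pair the first item with items[i], then pair up whatever is left
        var rest = items.slice(1, i).concat(items.slice(i + 1));
        var sub = pairings(rest);
        for (var j = 0; j < sub.length; j++) {
            result.push([[first, items[i]]].concat(sub[j]));
        }
    }
    return result;
}

var combinations = pairings(["A", "B", "C", "D", "E", "F"]);
// combinations.length is 15; the first entry is [["A","B"],["C","D"],["E","F"]]
```

The count grows quickly (5·3·1 = 15 for 6 groups, 7·5·3·1 = 105 for 8 groups), so for many groups you would probably want to sample combinations rather than enumerate them all.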

It then redistributes the students, and returns the new overall score. This constitutes one improvement step. This process is then repeated a number of times, until no more improvement can be found, or until you run out of time. It may also be useful to run the algorithm several times, starting with different random initial distributions.

This algorithm can be fine-tuned in both the pairing phase and the combination-of-pairings phase. When optimizing a pair of groups, you'll have to choose, for example, whether a distribution that increases the score of one group by 4 but decreases the score of the other by 1 (a combined improvement of +3) is preferable to a distribution where both groups increase their score by 1 (a combined improvement of only +2).

And again in the combination-of-pairs phase, you'll have to decide whether an improvement of all three pairs is required, or whether you simply choose the combination with the highest combined improvement.

I assume that allowing a group to have a lower score after a step, if that improves the overall score, will allow for more movement of students between the groups, and may lead to more combinations being explored.
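One way to make these tuning choices explicit is to score each candidate split through a small policy function. The sketch below is an illustration only, not part of the code further down: `scoreSplit`, `minQuality` and `allowLoss` are hypothetical names, with `minQuality` standing in for the threshold y from the question.

```javascript
// Score one candidate split of a pair of groups, or return -Infinity to reject it.
// before: [qualityA, qualityB] prior to the split
// after:  [qualityA, qualityB] of the candidate split
function scoreSplit(before, after, options) {
    // reject candidates where a group falls below the required minimum (the question's y)
    if (after[0] < options.minQuality || after[1] < options.minQuality) return -Infinity;
    // optionally reject candidates that make either group worse than it was
    if (!options.allowLoss && (after[0] < before[0] || after[1] < before[1])) return -Infinity;
    return after[0] + after[1];
}

// The example from the text: both groups start at 10
var before = [10, 10];
var lopsided = [14, 9];    // +4 and -1, combined +3
var balanced = [11, 11];   // +1 and +1, combined +2

var a = scoreSplit(before, lopsided, {minQuality: 0, allowLoss: true});    // 23, wins
var b = scoreSplit(before, balanced, {minQuality: 0, allowLoss: true});    // 22
var c = scoreSplit(before, lopsided, {minQuality: 0, allowLoss: false});   // -Infinity, rejected
var d = scoreSplit(before, lopsided, {minQuality: 9.5, allowLoss: true});  // -Infinity, below minimum
```

A hard minimum like this may be too strict right after a random start, when some groups can sit below y and every candidate would be rejected. Replacing the rejection with a penalty is one alternative, and which works better is again something to tune against the real quality function.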

To be able to write code to test this strategy, a dummy "group quality" function is needed, so I'm numbering the students from 1 to 36 and using a function which multiplies the distance between adjacent students' numbers. So e.g. the group [2,7,15,16,18,30] would have score 5*8*1*2*12 = 960. If you imagine the numbering to be a ranking of the students' ability, then a high-quality group means a mixed-ability group. The optimal distribution is:

group A: [1,  7, 13, 19, 25, 31]
group B: [2,  8, 14, 20, 26, 32]
group C: [3,  9, 15, 21, 27, 33]
group D: [4, 10, 16, 22, 28, 34]
group E: [5, 11, 17, 23, 29, 35]
group F: [6, 12, 18, 24, 30, 36]

with every group scoring 6*6*6*6*6 = 7776 and a total score of 46656. In practice I found that using Log(score) gave better results, because it favours small improvements across all groups over large improvements to one or two groups. (Favouring improvements to several groups, or to the lowest-quality groups, or just choosing the best overall improvement, is the part you'll have to fine-tune to your specific "group quality" function.)
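The numbers above are easy to check with the plain (non-logarithmic) version of the dummy function. In this sketch `productScore` is a name made up for the example; it assumes the group is already sorted.

```javascript
// Product of the gaps between adjacent student numbers in a sorted group
function productScore(group) {
    var product = 1;
    for (var i = 1; i < group.length; i++) product *= group[i] - group[i - 1];
    return product;
}

var example = productScore([2, 7, 15, 16, 18, 30]);   // 5 * 8 * 1 * 2 * 12 = 960

// Build the optimal distribution: group g holds g+1, g+7, g+13, ...
var optimal = [];
for (var g = 0; g < 6; g++) {
    optimal.push([]);
    for (var k = 0; k < 6; k++) optimal[g].push(g + 1 + 6 * k);
}

var total = 0, logTotal = 0;
for (var n = 0; n < 6; n++) {
    total += productScore(optimal[n]);              // 7776 per group
    logTotal += Math.log(productScore(optimal[n]));
}
// total is 46656; logTotal is about 53.7528

// Why the logarithm favours balance: one excellent and one terrible group
// beats two mediocre groups on raw sum, but loses on the sum of logs.
var rawLopsided = 7776 + 1, rawBalanced = 100 + 100;                      // 7777 vs 200
var logLopsided = Math.log(7776) + Math.log(1);                           // about 8.96
var logBalanced = Math.log(100) + Math.log(100);                          // about 9.21
```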

To my surprise, the algorithm always manages to find the optimal solution, and in just 4 to 7 steps, which means that fewer than 100,000 "group quality" function calls are made. The "group quality" function I'm using is of course quite simple, so you'd have to check it with the real thing to gauge the usefulness of this approach in your specific case. But it's clear that this algorithm manages to thoroughly rearrange the distribution in just a few steps.

(The code example below is hard-coded for the case of 36 students and 6 groups for simplicity. The sorting of students in each group is done to simplify the quality function.)

function improve(groups) {
    // ALL 15 PAIRS OF GROUP INDEXES
    var pairs = [[0,1],[0,2],[0,3],[0,4],[0,5],[1,2],[1,3],[1,4],[1,5],[2,3],[2,4],[2,5],[3,4],[3,5],[4,5]];
    // ALL 15 WAYS TO COVER THE 6 GROUPS WITH 3 PAIRS ; VALUES ARE INDEXES INTO pairs
    var combi = [[0,9,14],[0,10,13],[0,11,12],[1,6,14],[1,7,13],[1,8,12],[2,5,14],[2,7,11],[2,8,10],[3,5,13],[3,6,11],[3,8,9],[4,5,12],[4,6,10],[4,7,9]];
    // FIND OPTIMAL DISTRIBUTION FOR ALL PAIRS OF GROUPS
    var optim = [];
    for (var i = 0; i < 15; i++) {
        optim[i] = optimise(groups[pairs[i][0]], groups[pairs[i][1]]);
    }
    // FIND BEST COMBINATION OF PAIRS
    var best, score = -1;
    for (var i = 0; i < 15; i++) {
        var current = optim[combi[i][0]].score + optim[combi[i][1]].score + optim[combi[i][2]].score;
        if (current > score) {
            score = current;
            best = i;
        }
    }
    // REDISTRIBUTE STUDENTS INTO GROUPS AND RETURN NEW SCORE
    groups[0] = optim[combi[best][0]].group1.slice();
    groups[1] = optim[combi[best][0]].group2.slice();
    groups[2] = optim[combi[best][1]].group1.slice();
    groups[3] = optim[combi[best][1]].group2.slice();
    groups[4] = optim[combi[best][2]].group1.slice();
    groups[5] = optim[combi[best][2]].group2.slice();
    return score;
}

// FIND OPTIMAL DISTRIBUTION FOR PAIR OF GROUPS
function optimise(group1, group2) {
    var optim = {group1: [], group2: [], score: -1};
    var set = group1.concat(group2).sort(function(a, b) {return a - b});
    var distr = [0,0,0,0,0,1,1,1,1,1,1];
    // TRY EVERY COMBINATION
    do {
        // KEEP FIRST STUDENT IN FIRST GROUP TO AVOID SYMMETRIC COMBINATIONS
        var groups = [[set[0]], []];
        // DISTRIBUTE STUDENTS INTO GROUP 0 OR 1 ACCORDING TO BINARY ARRAY
        for (var j = 0; j < 11; j++) {
            groups[distr[j]].push(set[j + 1]);
        }
        // CHECK SCORE OF GROUPS AND STORE IF BETTER
        var score = quality(groups[0]) + quality(groups[1]);
        if (score > optim.score) {
            optim.group1 = groups[0].slice();
            optim.group2 = groups[1].slice();
            optim.score = score;
        }
    } while (increment(distr));
    return optim;

    // GENERATE NEXT PERMUTATION OF BINARY ARRAY
    function increment(array) {
        var digit = array.length, count = 0;
        while (--digit >= 0) {
            if (array[digit] == 1) ++count
            else if (count) {
                array[digit] = 1;
                for (var i = array.length - 1; i > digit; i--) {
                    array[i] = --count > 0 ? 1 : 0;
                }
                return true;
            }
        }
        return false;
    }
}

// SCORE FOR ONE GROUP ; RANGE: 0 ~ 8.958797346140275
function quality(group) {
    // LOGARITHM FAVOURS SMALL IMPROVEMENTS TO ALL GROUPS OVER LARGE IMPROVEMENT TO ONE GROUP
    return Math.log((group[5] - group[4]) * (group[4] - group[3]) * (group[3] - group[2]) * (group[2] - group[1]) * (group[1] - group[0]));
}

// SUM OF SCORES FOR ALL 6 GROUPS ; RANGE: 0 ~ 53.75278407684165
function overallQuality(groups) {
    var score = 0;
    for (var i = 0; i < 6; i++) score += quality(groups[i]);
    return score;
}

// PREPARE RANDOM TEST DATA
var students = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36];
var groups = [[],[],[],[],[],[]];
for (var i = 5; i >=0; i--) {
    for (var j = 5; j >= 0; j--) {
        // PICK FROM ALL REMAINING STUDENTS (INDEX 0 TO i * 6 + j INCLUSIVE) FOR AN UNBIASED SHUFFLE
        var pick = Math.floor(Math.random() * (i * 6 + j + 1));
        groups[i].push(students[pick]);
        students[pick] = students[i * 6 + j];
    }
    groups[i].sort(function(a, b) {return a - b});
}

// DISPLAY INITIAL SCORE AND DISTRIBUTION
var score = overallQuality(groups);
document.write("<PRE>Initial: " + score.toFixed(2) + " " + JSON.stringify(groups) + "<BR>");

// IMPROVE DISTRIBUTION UNTIL SCORE NO LONGER INCREASES
var prev, step = 0;
do {
    prev = score;
    score = improve(groups);
    document.write("Step " + ++step + " : " + score.toFixed(2) + " " + JSON.stringify(groups) + "<BR>");
} while (score > prev && score < 53.75278407684165);
// ALWAYS CLOSE THE PRE ELEMENT, EVEN WHEN THE SEARCH STOPS AT A LOCAL OPTIMUM
document.write((score >= 53.75278407684165 ? "Optimal solution reached." : "") + "</PRE>");

Note: after having chosen the best combination of pairs and having redistributed the students in those pairs of groups, you of course know that those three pairs now have their optimal distribution of students. So you can skip checking those three pairs in the following step, and use their current score as the optimal score.
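One possible way to get this saving without special-casing the three pairs is to cache the result of the pair optimisation, keyed by which 12 students are in the pair. This is memoization rather than the explicit skip described in the note, and `cachedOptimiser` is a hypothetical helper; it assumes the wrapped function takes two groups and returns an object like the one returned by `optimise` above.

```javascript
// Wrap a pair optimiser so that each set of 12 students is only optimised once.
// `optimiseFn` is assumed to take two groups and return {group1, group2, score}.
function cachedOptimiser(optimiseFn) {
    var cache = {};
    return function(group1, group2) {
        // the key depends only on which students are in the pair, not on how they are split
        var key = group1.concat(group2).sort(function(a, b) {return a - b}).join(",");
        if (!(key in cache)) cache[key] = optimiseFn(group1, group2);
        return cache[key];
    };
}

// Usage with a stand-in optimiser that just counts how often it is called
var calls = 0;
var fast = cachedOptimiser(function(group1, group2) {
    calls++;
    return {group1: group1, group2: group2, score: 0};
});
fast([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]);
fast([1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10, 12]);   // same 12 students: served from the cache
// calls is now 1
```

After a step, the three chosen pairs contain the same students as before the redistribution, so their keys match and the expensive search is skipped. The cache grows with every new set of 12 students it sees, which should be harmless for a handful of steps.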

Source: https://stackoverflow.com/questions/34570039/whats-the-fastest-heuristic-algorithm-to-split-students-into-groups
