What's the fastest heuristic algorithm to split students into groups?

a 夏天 提交于 2019-12-05 13:47:16

I would start by a very simple "random search" algorithm:

start from a random solution (a partition of X to groups), call it S[0]

score[0] = black_box_socre(S[0])

i = 0

while (some condition):
    i++
    S[i] = some small permutation on S[i-1]  # (1)
    score[i] = black_box_score(S[i])
    if score[i] < score[i-1]:  # (2)  
        S[i] = S[i-1]
        score[i] = score[i-1]

(1) - small permutation could be in your case, switching 2 people between groups.

(2) - If we made a change that made our solution worse (lower score) we reject it. You can later replace this with also accepting worse solutions with some probability, to make this algorithm into simulated annealing.

Start by simply running this for 1000 iteration or so, and plot score[i] as a function of i, to get a feeling of how fast your solution is improving. Run this several times (to try different random starting points).

Then you can play with different permutations (1), make the algorithm less greedy (2), or add some fancy automatic logic to stop the search (e.g., no progress in the last T iterations).

Let's take the example of 36 students distributed into 6 groups. Checking all combinations is impractical, because there are 3,708,580,189,773,818,399,040. However, a strategy which makes repeated improvements by checking every distribution of students between pairs of groups should be feasible.

There are 462 ways to split 12 students into 2 groups, so finding the optimal 12→2 distribution takes only 924 calls to the "group quality" function. There are 15 possible pairings of groups among 6 groups, so 13,860 calls will reveal the best way to pair the groups and redistribute the students between the pairs to get the most improvement.

Starting with a random initial distribution, the algorithm calculates the optimal distribution for all 15 pairings of groups: AB,CD,EF,BC,DE,FA,AC,BD,CE,DF,EA,FB,AD,BE,CF.

It then compares the scores for all 15 combinations of pairs, to find the combination with the highest overal score, e.g. DE+AC+FB.

It then redistributes the students, and returns the new overall score. This constitutes one improvement step. This process is then repeated a number of times, until no more improvement can be found, or until you run out of time. It may also be useful to run the algorithm several times, starting with different random initial distributions.

This algorithm can be fine-tuned in both the pairing and the combination of pairings phase. When optimizing a pair of groups, you'll have to choose e.g. whether a distribution of the students over the two groups that increases the score of the one group by +4 but decreases the score of the other group by -1, for a combined improvement of +3, is preferable over a distribution where both groups increase their score by +1, for a combined improvement of only +2.

And again in the combinations of pairs phase, you'll have to decide whether an improvement of all three pairs is required, or whether you choose the combinations with the highest combined improvement.

I assume that allowing a group to have a lower score after a step if that improves the overall score, will allow for more movement of the students between the groups, and may lead to more combinations being explored.


To be able to write code to test this strategy, a dummy "group quality" function is needed, so I'm numbering the students from 1 to 36 and using a function which multiplies the distance between adjacent students' numbers. So e.g. the group [2,7,15,16,18,30] would have score 5*8*1*2*12 = 960. If you imagine the numbering to be a ranking of the students' ability, then a high-quality group means a mixed-ability group. The optimal distribution is:

group A: [1,  7, 13, 19, 25, 31]
group B: [2,  8, 14, 20, 26, 32]
group C: [3,  9, 15, 21, 27, 33]
group D: [4, 10, 16, 22, 28, 34]
group E: [5, 11, 17, 23, 29, 35]
group F: [6, 12, 18, 24, 30, 36]

with every group scoring 6*6*6*6*6 = 7776 and a total score of 46656. In practice I found that using Log(score) gave better results, because it favours small improvements across all groups over large improvements to one or two groups. (Favouring improvements to several groups, or to the lowest-quality groups, or just choosing the best overall improvement, is the part you'll have to fine-tune to your specific "group quality" function.)


To my surprise, the algorithm always manages to find the optimal solution, and in just 4 to 7 steps, which means that less than 100,000 "group quality" function calls are made. The "group quality" algorithm I'm using is of course quite simple, so you'd have to check it with the real thing to gauge the usefulness of this approach in your specific case. But it's clear that this algorithm manages to thoroughly rearrange the distribution in just a few steps.

(The code example below is hard-coded for the case of 36 students and 6 groups for simplicity. The sorting of students in each group is done to simplify the quality function.)

function improve(groups) {
    var pairs = [[0,1],[0,2],[0,3],[0,4],[0,5],[1,2],[1,3],[1,4],[1,5],[2,3],[2,4],[2,5],[3,4],[3,5],[4,5]];
    var combi = [[0,9,14],[0,10,13],[0,11,12],[1,6,14],[1,7,13],[1,8,12],[2,5,14],[2,7,11],[2,8,10],[3,5,13],[3,6,11],[3,8,9],[4,5,12],[4,6,10],[4,7,9]];
    // FIND OPTIMAL DISTRIBUTION FOR ALL PAIRS OF GROUPS
    var optim = [];
    for (var i = 0; i < 15; i++) {
        optim[i] = optimise(groups[pairs[i][0]], groups[pairs[i][1]]);
    }
    // FIND BEST COMBINATION OF PAIRS
    var best, score = -1;
    for (var i = 0; i < 15; i++) {
        var current = optim[combi[i][0]].score + optim[combi[i][1]].score + optim[combi[i][2]].score;
        if (current > score) {
            score = current;
            best = i;
        }
    }
    // REDISTRIBUTE STUDENTS INTO GROUPS AND RETURN NEW SCORE
    groups[0] = optim[combi[best][0]].group1.slice();
    groups[1] = optim[combi[best][0]].group2.slice();
    groups[2] = optim[combi[best][1]].group1.slice();
    groups[3] = optim[combi[best][1]].group2.slice();
    groups[4] = optim[combi[best][2]].group1.slice();
    groups[5] = optim[combi[best][2]].group2.slice();
    return score;
}

// FIND OPTIMAL DISTRIBUTION FOR PAIR OF GROUPS
function optimise(group1, group2) {
    var optim = {group1: [], group2: [], score: -1};
    var set = group1.concat(group2).sort(function(a, b) {return a - b});
    var distr = [0,0,0,0,0,1,1,1,1,1,1];
    // TRY EVERY COMBINATION
    do {
        // KEEP FIRST STUDENT IN FIRST GROUP TO AVOID SYMMETRIC COMBINATIONS
        var groups = [[set[0]], []];
        // DISTRIBUTE STUDENTS INTO GROUP 0 OR 1 ACCORDING TO BINARY ARRAY
        for (var j = 0; j < 11; j++) {
            groups[distr[j]].push(set[j + 1]);
        }
        // CHECK SCORE OF GROUPS AND STORE IF BETTER
        var score = quality(groups[0]) + quality(groups[1]);
        if (score > optim.score) {
            optim.group1 = groups[0].slice();
            optim.group2 = groups[1].slice();
            optim.score = score;
        }
    } while (increment(distr));
    return optim;

    // GENERATE NEXT PERMUTATION OF BINARY ARRAY
    function increment(array) {
        var digit = array.length, count = 0;
        while (--digit >= 0) {
            if (array[digit] == 1) ++count
            else if (count) {
                array[digit] = 1;
                for (var i = array.length - 1; i > digit; i--) {
                    array[i] = --count > 0 ? 1 : 0;
                }
                return true;
            }
        }
        return false;
    }
}

// SCORE FOR ONE GROUP ; RANGE: 0 ~ 8.958797346140275
function quality(group) {
    // LOGARITHM FAVOURS SMALL IMPROVEMENTS TO ALL GROUPS OVER LARGE IMPROVEMENT TO ONE GROUP
    return Math.log((group[5] - group[4]) * (group[4] - group[3]) * (group[3] - group[2]) * (group[2] - group[1]) * (group[1] - group[0]));
}

// SUM OF SCORES FOR ALL 6 GROUPS ; RANGE: 0 ~ 53.75278407684165
function overallQuality(groups) {
    var score = 0;
    for (var i = 0; i < 6; i++) score += quality(groups[i]);
    return score;
}

// PREPARE RANDOM TEST DATA
var students = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36];
var groups = [[],[],[],[],[],[]];
for (var i = 5; i >=0; i--) {
    for (var j = 5; j >= 0; j--) {
        var pick = Math.floor(Math.random() * (i * 6 + j));
        groups[i].push(students[pick]);
        students[pick] = students[i * 6 + j];
    }
    groups[i].sort(function(a, b) {return a - b});
}

// DISPLAY INITIAL SCORE AND DISTRIBUTION
var score = overallQuality(groups);
document.write("<PRE>Initial: " + score.toFixed(2) + " " + JSON.stringify(groups) + "<BR>");

// IMPROVE DISTRIBUTION UNTIL SCORE NO LONGER INCREASES
var prev, step = 0;
do {
    prev = score;
    score = improve(groups);
    document.write("Step " + ++step + " : " + score.toFixed(2) + " " + JSON.stringify(groups) + "<BR>");
} while (score > prev && score < 53.75278407684165);
if (score >= 53.75278407684165) document.write("Optimal solution reached.</PRE>");

Note: after having chosen the best combination of pairs and having redistributed the students in those pairs of groups, you of course know that those three pairs now have their optimal distribution of students. So you can skip checking those three pairs in the following step, and use their current score as the optimal score.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!