Where does this greedy scheduling algorithm become sub-optimal?

Submitted by 眉间皱痕 on 2019-12-24 05:16:11

Question


This question is inspired by another question on process allocation. This is a twist on, and enhancement of, the problem discussed there.

We have n processes, and we need to allocate them to the fewest possible processors. Each process has a scheduled start and finish time, which we will define in terms of time units, indexed from 1; a process will run for some contiguous sequence of time units. A processor can then be scheduled to run any number of non-overlapping processes.

The obvious greedy algorithm is:

At each step, schedule a maximal non-overlapping set of the remaining processes onto the next processor.

How is a maximal non-overlapping set to be chosen? We will leave the algorithm as non-deterministic, since that makes it easier to analyse and split into two sub-questions.
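
To make the discussion concrete, here's a minimal Python sketch of the greedy algorithm (Python and all helper names are my choices, not part of the original problem). It resolves the non-determinism in one particular way, by always taking a largest (maximum-cardinality) non-overlapping set, which is the reading that the "because it's greedy" remark below suggests; a process is represented as a frozenset of time units.

    from itertools import combinations

    def disjoint(procs):
        """True if no two of the given processes share a time unit."""
        seen = set()
        for p in procs:
            if seen & p:
                return False
            seen |= p
        return True

    def largest_nonoverlapping(procs):
        """One largest pairwise non-overlapping subset, by brute force."""
        procs = list(procs)
        for size in range(len(procs), 0, -1):
            for combo in combinations(procs, size):
                if disjoint(combo):
                    return frozenset(combo)
        return frozenset()

    def greedy_schedule(procs):
        """Peel off a largest non-overlapping set for each new processor."""
        remaining = set(procs)
        processors = []
        while remaining:
            batch = largest_nonoverlapping(remaining)
            processors.append(batch)
            remaining -= batch
        return processors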

Essentially, the previous question concerned behaviour if the algorithm is unlucky: what's the smallest n for which this algorithm might produce a sub-optimal allocation (i.e., require more processors than necessary)? It turns out that the answer is n=4. There are cases where two processors are sufficient, but the greedy algorithm might require three (though a lucky run will manage with two).

This question concerns what happens if the non-determinism is always lucky:

What's the smallest n for which this algorithm is guaranteed to be sub-optimal? That is, what's the smallest n for which we can find a set of processes where the greedy algorithm must use more processors than necessary?

Since Stack Overflow actively encourages asking and answering your own question, I will do so here, and you can tell me whether my partial answer can be improved upon!


Answer 1:


I think that the answer is n=7. I will try to demonstrate that things go wrong with n=7; but this only gives an upper bound. Can things go wrong with n=6? I don't think so, but I'm not sure.

Upper bound

Suppose that we have the following processes to schedule:

  • [1] (i.e., the process runs during time unit 1)
  • [2]
  • [4]
  • [5]
  • [1,2,3] (i.e., the process runs during time units 1 to 3)
  • [2,3,4]
  • [3,4,5]

The optimal allocation requires three processors (no fewer will do, since three processes all need time unit 3):

  1. [1], [2,3,4]
  2. [2], [3,4,5]
  3. [1,2,3], [4], [5]

But the greedy algorithm will produce the following:

  1. [1], [2], [4], [5]
  2. [1,2,3]
  3. [2,3,4]
  4. [3,4,5]

The key point here is that the four singleton processes form the only largest non-overlapping set, so the greedy algorithm must schedule all four on the first step (because it's greedy); after that we're left with three pairwise overlapping processes (they all need time unit 3), each of which requires its own processor.
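
With the sketch above, this behaviour can be checked mechanically. The comparison value relies on the fact that, for processes running over contiguous blocks, the minimum number of processors equals the peak number of processes active at any single time unit:

    # The seven processes from the example above.
    procs = [frozenset(s) for s in
             ([1], [2], [4], [5], [1, 2, 3], [2, 3, 4], [3, 4, 5])]

    # Peak load: how many processes need the busiest time unit.
    peak = max(sum(1 for p in procs if t in p) for t in range(1, 6))

    print(len(greedy_schedule(procs)))  # 4: the greedy run above
    print(peak)                         # 3: achievable, per the allocation above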

Lower bound

Can things go wrong with n=6? I don't think so, but I am not certain. It looks to me as though the relevant case will be where the optimal solution requires three processors, each running two processes. Can we find a case where the greedy algorithm schedules three on the first processor, and is then left with three overlapping processes, thus requiring four processors in total?

For the optimal solution we'll need three pairs; but these processes must be constructed in such a way that if you schedule three of them on the first step, you can't then complete in two more steps. Clearly those first three processes can't include a complete pair, or else the solution could continue as it would have done with the pairs, leaving the third process's partner as a singleton. So the first step must take one process from each of the pairs.

If a process could require non-contiguous time blocks, we could do it like this:

  • [1]
  • [2]
  • [3]
  • [1,2]
  • [2,3]
  • [1,3] (this isn't allowed, because it stops and starts again!)

The optimal solution would pair each singleton with its complement; the greedy algorithm would take all the singletons together on the first step, and then the pairwise overlapping processes we're left with would require three more processors. But we've cheated, because the last process listed above doesn't run for a contiguous time block.
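
Although this instance cheats, it does exhibit the failure, and the earlier sketch confirms it: the three singletons form the only largest non-overlapping set, so every greedy run starts with them and then needs one processor per remaining pairwise overlapping process. A quick check (the three-processor pairing is written out by hand):

    cheat = [frozenset(s) for s in
             ([1], [2], [3], [1, 2], [2, 3], [1, 3])]
    print(len(greedy_schedule(cheat)))  # 4 processors

    # The optimal pairing: each singleton with its complement.
    pairing = [[frozenset([1]), frozenset([2, 3])],
               [frozenset([2]), frozenset([1, 3])],
               [frozenset([3]), frozenset([1, 2])]]
    assert all(disjoint(pair) for pair in pairing)  # 3 processors suffice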

We can't change the last one to [3,4], because then it could be run on the same processor as [1,2]. In fact, we can't have three pairwise overlapping contiguous blocks unless their common intersection is non-empty (this is the Helly property of intervals on a line). So we'd end up with something like [1,2,3], [2,3,4], [3,4,5], as we had with the seven-process case.

The problem is that it then seems to be impossible to add three more processes that can be scheduled together, and still allow a three-processor optimal solution, without allowing for the possibility of scheduling two of the singletons with one of the triples. If we try

  • [1]
  • [2]
  • [4]
  • [1,2,3]
  • [2,3,4]
  • [3,4,5]

then the greedy algorithm might schedule [1], [2], [3,4,5] on the first step, which will lead to an optimal solution (and we're looking for a case where the non-determinism is guaranteed to lead to a sub-optimal solution).
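
This can be confirmed by extending the earlier sketch with a search over every tie-break among largest first sets; again the helper names are mine. For contiguous runs the optimum equals the peak load, so it can be computed directly:

    def best_greedy(procs):
        """Fewest processors over all lucky tie-breaks of the greedy,
        where every step takes some largest non-overlapping set."""
        procs = frozenset(procs)
        if not procs:
            return 0
        k = len(largest_nonoverlapping(procs))
        return 1 + min(best_greedy(procs - frozenset(c))
                       for c in combinations(procs, k) if disjoint(c))

    def optimal(procs):
        """Exact minimum for contiguous runs: the peak number of
        processes active at any single time unit."""
        times = set().union(*procs)
        return max(sum(1 for p in procs if t in p) for t in times)

    six = [frozenset(s) for s in
           ([1], [2], [4], [1, 2, 3], [2, 3, 4], [3, 4, 5])]
    assert best_greedy(six) == optimal(six) == 3  # a lucky run reaches 3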

If we try

  • [2]
  • [3]
  • [4]
  • [1,2,3]
  • [2,3,4]
  • [3,4,5]

then we don't have an optimal solution on three processors, because four of the processes require time unit 3.

All of this musing suggests to me that the greedy algorithm will be optimal in the case of n=6, and that the upper bound of n=7 is therefore tight; but it falls rather short of a proof. How can we prove that the algorithm is always optimal for n=6?
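
Short of a proof, the conjecture can at least be probed mechanically with the helpers above: an instance defeats the greedy for certain exactly when even its luckiest run exceeds the optimum. A brute-force sketch (distinct processes on a five-unit horizon only, so a probe rather than a proof):

    def guaranteed_suboptimal(procs):
        """True if every greedy run needs more processors than the optimum."""
        return best_greedy(procs) > optimal(procs)

    seven = [frozenset(s) for s in
             ([1], [2], [4], [5], [1, 2, 3], [2, 3, 4], [3, 4, 5])]
    assert guaranteed_suboptimal(seven)  # the n=7 example is caught

    # Every contiguous run within time units 1..5, and every way of picking
    # six distinct ones (duplicated processes are not covered here).
    runs = [frozenset(range(a, b + 1))
            for a in range(1, 6) for b in range(a, 6)]
    bad = [c for c in combinations(runs, 6) if guaranteed_suboptimal(c)]
    print(bad)  # empty iff the n=6 conjecture holds on this horizon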



Source: https://stackoverflow.com/questions/26444122/where-does-this-greedy-scheduling-algorithm-become-sub-optimal
