Algorithm for fairly assigning tasks to workers based on skills

前端 未结 4 912
小鲜肉
小鲜肉 2021-02-02 00:11

(Before anyone asks, this is not homework.)

I have a set of workers with interests, i.e.:

  • Bob: Java, XML, Ruby

  • Susan: Java, HTML, Python

相关标签:
4条回答
  • 2021-02-02 00:15

    So I gave this problem some thought and I think that you can get a good solution (for some definition of "good") by reducing it to an instance of min-cost max-flow (see this, for example). The idea is as follows. Suppose you are given as input a set of jobs J, each of which has a set of skills necessary, along with a set of workers W, each of whom has a set of talents. You are also given for each worker a constant k_i saying how many jobs you'd like them to do, as well as a constant m_i saying the maximum number of jobs you can allocate to them. Your goal is to assign the jobs to the workers in such a way that each job is done by a worker who has the skills, no worker does more than m_i jobs, and the number of the "excess" jobs done by the workers is minimized. For example, if the re are five workers who each want to do four tasks and the load is balanced so that two workers do four jobs, one does three, and one does five, the total excess is one, since one worker did one more job than was expected.

    The reduction is as follows. For now, we'll ignore the balancing requirement and just see how tom reduce this to max-flow; we'll add load balancing at the end. Construct a graph G with a designated start node s and sink node t. Add to this graph a node for each job j and each worker w. There will be an edge from s to each of these j nodes of cost zero and capacity one. There will also be an edge from each w node to t with cost zero and capacity m_i. Finally, for each job j and worker w, if worker w has the talents necessary to complete job j, there is an edge from j to w with cost zero and capacity one.

    The idea is that we want to push flow from s to t through the j and w nodes such that each flow path going through some j node to a w node means that job j should be given to worker w. The capacity restrictions on the edges from s to j nodes ensures that at most one unit of flow enters the j node, so the job is only assigned at most once. The capacity restriction on the edges from the w nodes to the node t prevent each worker from being assigned too many times. Since all capacities are integral, an integral max flow exists from s to t, and so a max-flow in this graph corresponds to an assignments of jobs to workers that is legal and doesn't exceed any worker's maximum load. You can check whether all jobs are assigned by looking at the total flow in the graph; if it's equal to the number of jobs, they've all been assigned.

    This above construction, however, does nothing to balance worker loads. To fix this, we'll modify the construction a bit. Rather than having an edge from each w node to t, instead, for each w node, add two nodes to the graph, c and e, and connect them as follows. There is an edge from w_i to c_i with capacity k_i and cost zero, and an identical edge from c_i to t. There is also an edge from w_i to e_i with cost 1 and capacity m_i - k_i. There is also an edge from e_i to t with equal capacity and zero cost.

    Intuitively, we haven't changed the amount of flow that leaves any w node, but we have changed how much that flow costs. Flow shunted to t via the c node is free, and so the worker can take on k_i jobs without incurring cost. Any jobs after that have to be routed through e, which costs one for each unit of flow crossing it. Finding a max-flow in this new graph still determines an assignment, but finding the min-cost max-flow in the graph finds the assignment that minimizes the excess jobs divvied up to workers.

    Min-cost max flows can be solved in polynomial time with a few somewhat-well-known algorithms, so hopefully this is a useful answer!

    0 讨论(0)
  • 2021-02-02 00:17

    This problem can be modeled as a Maximum Flow Problem.

    In a max-flow problem, you have a directed graph with two special nodes, the source and the sink. The edges in the graph have capacities, and your goal is to assign a flow through the graph from the source to the sink without exceeding any of the edge capacities.

    With a (very) carefully crafted graph, we can find an assignment meeting your requirements from the maximum flow.

    Let me number the requirements.

    Required:

    1. Workers are assigned no more than their maximum assignments.
    2. Tasks can only be assigned to workers that match one of the task's interests.
    3. Every task must be assigned to 3 workers.
    4. Every worker must be assigned to at least 1 task.
    

    Optional:

    5. Each worker should be assigned a number of tasks proportional to that worker's maximum assignments
    6. Each worker should be assigned a variety of tasks.
    

    I will assume that the maximum flow is found using the Edmonds-Karp Algorithm.

    Let's first find a graph that meets requirements 1-3.

    Picture the graph as 4 columns of nodes, where edges only go from nodes in a column to nodes in the neighboring column to the right.

    In the first column we have the source node. In the next column we will have nodes for each of the workers. From the source, there is an edge to each worker with capacity equal to that worker's maximum assignments. This will enforce requirement 1.

    In the third column, there is a node for each task. From each worker in the second column there is an edge to each task that that worker is interested in with a capacity of 1 (a worker is interested in a task if the intersection of their interests is non-empty). This will enforce requirement 2. The capacity of 1 will ensure that each worker takes only 1 of the 3 slots for each task.

    In the fourth column we have the sink. There is an edge from each task to the sink with capacity 3. This will enforce requirement 3.

    Now, we find a maximum flow in this graph using the Edmonds-Karp Algorithm. If this maximum flow is less than 3 * (# of tasks) then there is no assignment meeting requirements 1-3. If not, there is such an assignment and we can find it by examining the final augmented graph. In the augmented graph, if there is an edge from a task to a worker with capacity 1, then that worker is assigned to that task.


    Now, we will modify our graph and algorithm to meet the rest of the requirements.

    First, let's meet requirement 4. This will require a small change to the algorithm. Initially, set all the capacities from the source to the workers to 1. Find the max-flow in this graph. If the flow is not equal to the number of workers, then there is no assignment meeting requirement 4. Now, in your final residual graph, for each worker the edge from the source to that worker has capacity 0 and the reverse edge has capacity 1. Change these to that worker's maximum assignments - 1 and 0, respectively. Now continue Edmonds-Karp algorithm as before. Basically what we have done is first find an assignment such that each worker is assigned to exactly one task. Then delete the reverse edge from that task so that the worker will always be assigned to at least one task(though it may not be the one assigned to in the first pass).


    Now let's meet requirement 5. Strictly speaking, this requirement just means that we divide each worker's maximum assignments by sum of all worker's maximum assignments / number of tasks. This will quite possibly not have a satisfying assignment. But that's ok. Initialize our graph with these new maximum assignments. Run Edmonds-Karp. If it finds a flow that saturates the edges from tasks to sink, we are done. Otherwise we can increment the capacities from sink to workers in the residual graph and continue running Edmonds-Karp. Repeat until we saturate the edges into the sink. Don't increment the capacities so much that a worker is assigned too many tasks. Also, technically, the increment for each worker should be proportional to that worker's maximum assignments. These are both easy to do.


    Finally let's meet requirement 6. This one is a bit tricky. First, add a column between workers and tasks and remove all edges from workers to tasks. In this new column, for each worker add a node for each of that workers interests. From each of these new nodes, add an edge to each task with a matching interest with capacity 1. Add an edge from each worker to each of its interest nodes with capacity 1. Now, a flow in this graph would enforce that if a worker is assigned to n tasks, then the intersection of the union of those task's interests with that worker's interests has size at least n. Again, it is possible that there is a satisfying assignment without this assignment, but there is not one with it. We can handle this the same as requirement 5: run Edmonds-Karp to completion, if no satisfying assignment, increment the capacities from workers to their interest nodes and repeat.

    Note that in this modified graph we no longer satisfy requirement 3, as a single worker may be assigned to multiple/all slots of a task if the intersection of their interests has size greater than 1. We can fix that. Add a new column of nodes between the interest nodes and the task nodes and delete the edges between those nodes. For each employee, in the new column insert a node for each task (so each employee has its own node for each task). From these new nodes, to their corresponding task to the right, add an edge with capacity 1. From each worker's interests node to that worker's task nodes, add an edge with capacity 1 from each interest to each task that matches.

    -

    EDIT: Let me try to clarify this a little. Let -(n)-> be an edge with n capacity.

    Previously we had worker-(1)->task for each worker-task pair with a matching interest. Now we have worker-(k)->local interest-(1)->local task-(1)->global task. Now, you can think of a task being matched to a worker-interest pair. The first edge says that for a worker, each of its interests can be matched to k tasks. The second edge says that each of a worker's interests can only be matched once to each job. The third edge says that each task can only be assigned once to each worker. Note that you could push multiple flow from the worker to a local task (equal to the size of the intersection of their interests) but only 1 flow from the worker to the global task node due to the third edge.

    -

    Also note that we can't really mix this incrementing with the one for requirement 5 correctly. However, we can run the whole algorithm once for each capacity {1,2,...,r} for worker->interest edges. We then need a way to rank the assignments. That is, as we relax requirement 5 we can better meet requirement 6 and vice versa. However, there is another approach that I prefer for relaxing these constraints.


    A better approach to requirement relaxation (inspired-by/taken-from templatetypedef)

    When we want to be able to relax multiple requirements (e.g. 5 and 6), we can model it as a min-cost max-flow problem. This may be simpler than the incremental search that I described above.

    For example, for requirement 5, set all the edge costs to 0. We have the initial edge from the source to the worker with the capacity equal to worker's maximum assignments / (sum of all worker's maximum assignments / number of tasks) and with cost 0. Then you can add another edge with the remaining capacity for that worker and cost 1. Another possibility would be to use some sort of progressive cost such that as you add tasks to a worker the cost to add another task to that user goes up. E.g. you could instead split a worker's remaining capacity up into individual edges with costs 1,2,3,4,....

    A similar thing could then be done between the worker nodes and the local-interest nodes for requirement 6. The weighting would need to be balanced to reflect the relative importance of the different requirements.

    This method is also sufficient to enforce requirement 4. Also, the costs for requirement 5 should probably be made such that they are proportional to a worker's maximum assignments. Then assigning 1 extra task to a worker with max 100 would not cost as much as assigning an extra to a worker with max 2.


    Complexity Analysis

    Let n = # of employees, m = # of tasks, k = max interests for a single task/worker, l = # of interests, j = maximum of maximum assignments.

    Requirement 3 implies that n = O(m). Let's also assume that l = O(m) and j = O(m).

    In the smaller graph (before the change for req. 6), the graph has n + m + 2 = O(m) vertices and at most n + m + k*min(n, m) = O(km) edges.

    After the change it has 2 + n + n * l + n * m + m = O(nm) vertices and n + k * n + k * m * n + n * m + m = O(kmn) edges (technically we may need j * n + j * l more nodes and edges so that there are not multiple edges from one node to another, but this wouldn't change the asymptotic bound). Also note that no edge need have capacity > j.

    Using the min-cost max-flow formulation, we can find a solution in O(VEBlogV) where V = # vertices, E = # edges, and B = max capacity on a single edge. In our case this gives O(kjn^2m^2log(nm)).

    0 讨论(0)
  • 2021-02-02 00:38

    Try mapping your task to the stable marriage problem. Tasks become prospective wives `, and your staff become suitors.

    You might want to add some extra algorithm for assigning preferences of each task to the staff, and vice-versa - you could assign some ideal proficiency neccessary for the components of each task, and then allow your staff to rank each task. You could assign a proficiency for each component that each staff member posses and use that to get each tasks preference in staff members.

    Once you have the preferences then run the algorithm, post the results, then allow people to apply in pairs to you to swap assignments - after all this is a people problem and people work better when they have a degree of control.

    0 讨论(0)
  • 2021-02-02 00:39

    For problems where finding a direct solution is difficult it can be a good idea to use an approximation algorithm, an evaulation function and a method to improve the solution. There are a variety of approaches, such as genetic algorithms and simulated annealing.

    The basic idea is to use some sort of simple algorithm (such as a greedy algorithm) to get something that is vaguely usable and make random modifications, keeping those modifications that improve the evaluation score and discarding those that make it worse.

    With genetic algorithms a group of for example 100 random solutions is generated and scored and the best are kept and "bred" to produce a new generation of solutions with characteristics similar to the previous generations, but with some random mutations.

    For simulated annealing the probablility of a slightly worse solutions being accepted is high initially, but decreases over time. This reduces the risk of getting stuck at a local optimium early on.

    0 讨论(0)
提交回复
热议问题