Optimized algorithm to schedule tasks with dependency?

梦毁少年i 2021-01-30 17:50

There are tasks that read from a file, do some processing, and write to a file. These tasks have to be scheduled according to their dependencies. Also, tasks can be run in parallel, so the

5 Answers
  •  暗喜 (OP)
    2021-01-30 18:49

    Given a mapping from items to the items they depend on, a topological sort orders the items so that no item precedes any item it depends upon.
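    Python 3.9 and later also ship a topological sorter in the standard library, `graphlib.TopologicalSorter`. The sketch below assumes Python 3.9+ and the same dependency direction as the example data in this answer (each key depends on the items in its set). `respects_dependencies` is a hypothetical helper that only checks the ordering property just described.

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its set (same data as the answer's example).
deps = {'B': {'A'}, 'C': {'A'}, 'D': {'B'}, 'F': {'E'}}

def respects_dependencies(order, deps):
    """Hypothetical helper: True if no item appears before something it depends on."""
    position = {item: i for i, item in enumerate(order)}
    return all(position[dep] < position[item]
               for item, item_deps in deps.items()
               for dep in item_deps)

# static_order() yields every node, including nodes that appear only as
# dependencies (A and E here), with dependencies first. It raises
# graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(deps).static_order())
print(order)
print(respects_dependencies(order, deps))  # True
```

    The relative order of independent items (for example A and E) is not specified, so only the dependency property should be relied on.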

    The Rosetta Code "Topological sort" task has a Python solution that can also tell you which items are available to be processed in parallel.

    Adapted to your input, the code becomes:

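    from functools import reduce

    data = {  # From: http://stackoverflow.com/questions/18314250/optimized-algorithm-to-schedule-tasks-with-dependency
        # Each key depends on every item in its set
        # (the reverse of how the dependencies are shown in the question).
        'B': {'A'},
        'C': {'A'},
        'D': {'B'},
        'F': {'E'},
    }

    def toposort2(data):
        """Yield space-separated groups of items; each group depends only on earlier groups."""
        # Copy the input so the caller's dict and sets are not mutated.
        data = {item: set(deps) for item, deps in data.items()}
        for item, deps in data.items():
            deps.discard(item)  # Ignore self-dependencies
        # Items that appear only as dependencies get an empty dependency set.
        # The set() initializer keeps reduce() from failing on empty input.
        extra_items_in_deps = reduce(set.union, data.values(), set()) - set(data.keys())
        data.update({item: set() for item in extra_items_in_deps})
        while True:
            # Everything with no remaining dependencies can be processed now.
            ordered = {item for item, deps in data.items() if not deps}
            if not ordered:
                break
            yield ' '.join(sorted(ordered))
            # Drop the processed items and remove them from the remaining dependency sets.
            data = {item: (deps - ordered) for item, deps in data.items()
                    if item not in ordered}
        if data:
            raise ValueError("A cyclic dependency exists amongst %r" % data)

    print('\n'.join(toposort2(data)))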
    

    This produces the following output:

    A E
    B C F
    D
    

    Items on the same line of the output can be processed in any order, or even in parallel, as long as every item on one line is processed before any item on a later line; this is what preserves the dependencies.
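    Processing line by line is safe but conservative: with this data, D depends only on B, yet it also waits for C and F to finish. A minimal sketch of a finer-grained alternative, assuming Python 3.9+ and that each task is an independent function that is safe to run in a thread (threads suit I/O-bound jobs like the file read/process/write tasks in the question). `run_task` and `run_all` are hypothetical names, and `run_task` is only a placeholder for the real work. It combines `graphlib.TopologicalSorter`'s `prepare()`, `get_ready()`, `done()` and `is_active()` methods with a `ThreadPoolExecutor`, so each task starts as soon as its own dependencies have finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Each key depends on every item in its set (same data as the answer's example).
deps = {'B': {'A'}, 'C': {'A'}, 'D': {'B'}, 'F': {'E'}}

def run_task(name):
    """Hypothetical placeholder: read the input file, process it, write the output file."""
    return name

def run_all(deps, max_workers=4):
    """Run every task, starting each one once all of its dependencies are done.

    Returns the names in the order the tasks finished.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # Validates the graph; raises CycleError on a cycle.
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> task name
        while sorter.is_active():
            # Submit every task whose dependencies have all been marked done.
            for name in sorter.get_ready():
                running[pool.submit(run_task, name)] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # Re-raise any exception from the task.
                sorter.done(name)  # May make dependent tasks ready.
                finished.append(name)
    return finished

print(run_all(deps))
```

    Because a task is only submitted after all of its dependencies have been marked done, the returned finishing order is always a valid topological order; the order among independent tasks can vary from run to run.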
