There are tasks that read from a file, do some processing and write to a file. These tasks are to be scheduled based on the dependency. Also tasks can be run in parallel, so the
Given a mapping between items, and items they depend on, a topological sort orders items so that no item precedes an item it depends upon.
This Rosetta code task has a solution in Python which can tell you which items are available to be processed in parallel.
Given your input the code becomes:
try:
from functools import reduce
except:
pass
data = { # From: http://stackoverflow.com/questions/18314250/optimized-algorithm-to-schedule-tasks-with-dependency
# This <- This (Reverse of how shown in question)
'B': set(['A']),
'C': set(['A']),
'D': set(['B']),
'F': set(['E']),
}
def toposort2(data):
for k, v in data.items():
v.discard(k) # Ignore self dependencies
extra_items_in_deps = reduce(set.union, data.values()) - set(data.keys())
data.update({item:set() for item in extra_items_in_deps})
while True:
ordered = set(item for item,dep in data.items() if not dep)
if not ordered:
break
yield ' '.join(sorted(ordered))
data = {item: (dep - ordered) for item,dep in data.items()
if item not in ordered}
assert not data, "A cyclic dependency exists amongst %r" % data
print ('\n'.join( toposort2(data) ))
Which then generates this output:
A E
B C F
D
Items on one line of the output could be processed in any sub-order or, indeed, in parallel; just so long as all items of a higher line are processed before items of following lines to preserve the dependencies.