I have to deal with a directory of about 2 million XML files to be processed.
I've already solved the processing by distributing the work between machines and threads using …
Why do you store 2 million files in the same directory in the first place? I can imagine that alone slows down access terribly at the OS level.
I would definitely divide them into subdirectories (e.g. by date/time of creation) before processing begins. But if that is not possible for some reason, it could be done during processing: e.g. move 1000 files queued for Process1 into Directory1, another 1000 for Process2 into Directory2, and so on. Each process/thread then sees only the limited number of files portioned out to it, as in the sketch below.
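A minimal sketch of that batching step in Python, assuming a flat source directory of `.xml` files; the names `source_dir`, `dest_root`, and `batch_size` are illustrative, not from the original post:

```python
import os
import shutil

def partition(source_dir: str, dest_root: str, batch_size: int = 1000) -> None:
    """Move XML files from one flat directory into numbered batch subdirectories."""
    # Collect the names first so we are not mutating the directory while
    # scanning it; ~2 million short strings fit comfortably in memory.
    with os.scandir(source_dir) as it:
        names = sorted(e.name for e in it
                       if e.is_file() and e.name.endswith(".xml"))

    for i, name in enumerate(names):
        # Start a new batch directory every batch_size files.
        if i % batch_size == 0:
            batch_dir = os.path.join(dest_root, f"batch_{i // batch_size:05d}")
            os.makedirs(batch_dir, exist_ok=True)
        shutil.move(os.path.join(source_dir, name),
                    os.path.join(batch_dir, name))

if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    partition("/data/xml", "/data/xml_batches", batch_size=1000)
```

Each worker then only ever lists its own `batch_NNNNN` directory, which keeps per-directory entry counts small. Note that `shutil.move` is a cheap rename when source and destination are on the same filesystem, but degenerates into copy-and-delete across volumes, so keep `dest_root` on the same volume as the source.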