Sort a file with huge volume of data given memory constraint

暖寄归人 2020-11-28 21:47

Points:

  • We process thousands of flat files a day, concurrently.
  • Memory is a major constraint.
  • We use one thread per file.
12 Answers
  •  情深已故
    2020-11-28 22:44

    You can do it with only two temporary files, a source and a destination, and as little memory as you like. On the first pass the source is the original file; on the last pass the destination is the result file.

    On each pass:

    • read a chunk of data, half the size of the buffer, from the source file into a sliding buffer;
    • sort the whole buffer;
    • write the first half of the buffer to the destination file;
    • shift the second half of the buffer to the beginning and repeat.

    Keep a boolean flag that says whether you had to move any records in the current pass. If the flag remains false, your file is sorted. If it is set, repeat the process using the destination file as the new source.

    Maximum number of passes: 2 × (file size) / (buffer size)
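
    The passes above can be sketched as follows. This is a minimal illustration of the sliding-buffer scheme, not production code; the function names, the choice of one integer per line as the record format, and the tiny buffer size are all assumptions for the example:

    ```python
    def sliding_buffer_pass(src_path, dst_path, buf_size):
        """One pass: slide a buffer of buf_size records over src,
        sorting each window and writing out its first half.
        Returns True if any record changed position (another pass is needed)."""
        half = buf_size // 2
        moved = False
        buf = []
        with open(src_path) as src, open(dst_path, "w") as dst:
            for line in src:
                buf.append(int(line))          # assumed record format: one int per line
                if len(buf) == buf_size:
                    before = list(buf)
                    buf.sort()
                    if buf != before:
                        moved = True
                    for x in buf[:half]:       # flush the first (smallest) half
                        dst.write(f"{x}\n")
                    buf = buf[half:]           # slide: keep the second half
            # end of file: sort and flush whatever remains in the buffer
            before = list(buf)
            buf.sort()
            if buf != before:
                moved = True
            for x in buf:
                dst.write(f"{x}\n")
        return moved

    def external_sort(path, buf_size=4):
        """Alternate source and destination files until a pass moves nothing.
        Returns the path of the file holding the sorted data."""
        src, dst = path, path + ".tmp"
        while sliding_buffer_pass(src, dst, buf_size):
            src, dst = dst, src
        return src
    ```

    Note this is a block-wise bubble sort: each pass carries large records at most half a buffer toward the end of the file, which is where the 2 × (file size) / (buffer size) bound on the number of passes comes from.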
