Sort a file with a huge volume of data under a memory constraint

暖寄归人 2020-11-28 21:47

Points:

  • We process thousands of flat files a day, concurrently.
  • Memory is a major constraint.
  • We use one thread per file being processed.
12 answers
  •  庸人自扰
    2020-11-28 22:40

    Here is a way to do it without heavy sorting inside Java and without using a DB. Assumptions: you have 1 TB of disk space, and the files contain (or their lines start with) unique numbers, but are unsorted.

    Split the input into N smaller files.

    Read those N files one by one, and create one file for each line/number.

    Name each file after its number. While naming, keep a counter updated with the smallest number seen so far.

    At this point the root folder of files can be sorted by name. Either pause your program and run an OS command to sort the files by name, or do it programmatically.

    Now that you have a folder of files sorted by name, start from the counter and take the files one by one, appending each number to your OUTPUT file. Close the OUTPUT file when finished.

    When you are done, you will have one large file with sorted numbers.
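
The steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not a tested production approach: it needs Java 11+, every line starts with a unique non-negative number that fits in a `long`, and the class name `FilePerKeySort`, the methods `scatter` and `gather`, and the 19-digit zero padding are all hypothetical names and choices made for this example. The zero padding is what makes sorting by file name equal to sorting numerically. Instead of the counter described above, the sketch simply sorts the directory listing, which holds the file names (not the file contents) in memory, and one file per line can put heavy pressure on the filesystem.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilePerKeySort {

    // Assumption: keys are non-negative longs, so 19 digits is always enough.
    private static final int WIDTH = 19;

    /** Parses the run of digits at the start of a line; throws NumberFormatException if there is none. */
    public static long leadingNumber(String line) {
        int end = 0;
        while (end < line.length() && Character.isDigit(line.charAt(end))) {
            end++;
        }
        return Long.parseLong(line.substring(0, end));
    }

    /** Streams the input one line at a time and writes one small file per line, named by its padded number. */
    public static void scatter(Path input, Path workDir) throws IOException {
        Files.createDirectories(workDir);
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // skip empty lines
                }
                String name = String.format("%0" + WIDTH + "d", leadingNumber(line));
                // A duplicate number would overwrite the earlier file (uniqueness is assumed).
                Files.writeString(workDir.resolve(name), line, StandardCharsets.UTF_8);
            }
        }
    }

    /** Lists the small files, sorts them by name, and appends their contents to the output in that order. */
    public static void gather(Path workDir, Path output) throws IOException {
        List<Path> sorted;
        try (Stream<Path> files = Files.list(workDir)) {
            sorted = files.sorted().collect(Collectors.toList()); // file names only, not contents
        }
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path file : sorted) {
                writer.write(Files.readString(file, StandardCharsets.UTF_8));
                writer.newLine();
            }
        }
    }
}
```

A caller would run `FilePerKeySort.scatter(input, workDir)` followed by `FilePerKeySort.gather(workDir, output)`, with `workDir` pointing at an empty scratch directory on the disk that has the spare space.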
