Sort a file with a huge volume of data given a memory constraint

暖寄归人 2020-11-28 21:47

Points:

  • We process thousands of flat files in a day, concurrently.
  • Memory constraint is a major issue.
  • We use a thread for each file being processed.
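
The classic technique for these constraints is an external merge sort: read the file in memory-sized chunks, sort each chunk, spill it to a temporary file, then stream a k-way merge of the sorted runs. A minimal Python sketch (the function name `external_sort` and the `max_lines` chunk size are illustrative choices, and it assumes every input line ends with a newline):

```python
import heapq
import itertools
import tempfile

def external_sort(input_path, output_path, max_lines=100_000):
    """Sort a text file line-by-line using bounded memory."""
    chunk_files = []
    with open(input_path) as src:
        while True:
            # Read at most max_lines lines into memory and sort them.
            chunk = list(itertools.islice(src, max_lines))
            if not chunk:
                break
            chunk.sort()
            # Spill the sorted run to a temporary file, rewound for reading.
            tmp = tempfile.TemporaryFile("w+")
            tmp.writelines(chunk)
            tmp.seek(0)
            chunk_files.append(tmp)
    with open(output_path, "w") as dst:
        # heapq.merge streams the sorted runs lazily; only one line
        # per run is held in memory at any time.
        dst.writelines(heapq.merge(*chunk_files))
    for f in chunk_files:
        f.close()
```

Tuning `max_lines` (or a byte budget) per thread keeps total memory bounded even when many files are sorted concurrently.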
12 answers
  •  失恋的感觉
    2020-11-28 22:17

    In spite of your restriction, I would use the embedded database SQLite3. Like you, I work weekly with 10-15 million lines of flat files, and it is very, very fast to import and generate sorted data, and you only need a small, free-of-charge executable (sqlite3.exe). For example, once you download the .exe file, you can do this in a command prompt:

    C:> sqlite3.exe dbLines.db
    sqlite> create table tabLines(line varchar(5000));
    sqlite> create index idx1 on tabLines(line);
    sqlite> .separator '\r\n'
    sqlite> .import 'FileToImport' tabLines
    

    then:

    sqlite> select * from tabLines order by line;

    or save the output to a file:

    sqlite> .output out.txt
    sqlite> select * from tabLines order by line;
    sqlite> .output stdout

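
The same approach can be scripted with Python's built-in sqlite3 module, which avoids shelling out to the CLI. A minimal sketch (the function name `sort_via_sqlite` and all paths are illustrative; the table mirrors the `tabLines` schema from the answer above):

```python
import sqlite3

def sort_via_sqlite(input_path, output_path, db_path):
    """Import a flat file into SQLite, then stream it back sorted."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS tabLines(line TEXT)")
    with open(input_path) as src:
        # executemany consumes the generator row by row, so the whole
        # file is never held in memory at once.
        con.executemany(
            "INSERT INTO tabLines(line) VALUES (?)",
            ((line.rstrip("\n"),) for line in src),
        )
    con.commit()
    with open(output_path, "w") as dst:
        # The cursor streams rows; SQLite does the sorting on disk,
        # so Python memory stays bounded.
        for (line,) in con.execute("SELECT line FROM tabLines ORDER BY line"):
            dst.write(line + "\n")
    con.close()
```

SQLite spills large sorts to temporary storage itself, which is why this holds up well under a memory cap.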
