How to edit a 300 GB text file (genomics data)?

旧巷少年郎 2020-12-18 06:34

I have a 300 GB text file that contains genomics data with over 250k records. There are some records with bad data and our genomics program 'Popoolution' allows us to comm

4 answers
  •  误落风尘
    2020-12-18 07:24

    If, for whatever reason, a person has to mark these records manually in a text editor, you should probably use split to break the file up into manageable pieces.

    split -a4 -d -l100000 hugefile.txt part.
    

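    This will split the file up into pieces with 100000 lines each. Here -l100000 sets the number of lines per piece, -d selects numeric suffixes, and -a4 makes the suffixes four digits wide, so the files will be named part.0000, part.0001, etc. Because the suffixes are zero-padded, the shell expands part.* in the original order. After all the files have been edited, you can combine them back together with cat: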

    cat part.* > new_hugefile.txt
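
Before anyone starts editing the pieces, it may be worth checking that the split and cat round trip reproduces the input byte for byte. The following is a minimal sketch of that check on a small generated sample, not on the real data. The names sample.txt and rebuilt.txt, the temporary directory, and the 10-line piece size are illustrative assumptions, and it assumes GNU coreutils split (for the -d option).

```shell
#!/bin/sh
# Round-trip check on a small sample.
# sample.txt, rebuilt.txt and the 10-line piece size are hypothetical
# stand-ins for hugefile.txt, new_hugefile.txt and -l100000.
set -eu

# Work in a scratch directory so no existing part.* files get mixed in.
workdir=$(mktemp -d)
cd "$workdir"

# Build a 25-line sample file.
seq 1 25 > sample.txt

# Same options as the command above, but with 10 lines per piece.
split -a4 -d -l10 sample.txt part.

# Zero-padded numeric suffixes keep the glob expansion in the original order.
cat part.* > rebuilt.txt

# List the pieces, then compare the rebuilt file with the original.
ls part.*
if cmp -s sample.txt rebuilt.txt; then
    echo "round trip OK"
else
    echo "round trip FAILED"
fi
```

On the real file the same cmp comparison would have to read both 300 GB files in full, so it is probably only worth running once, right after the split and before any piece has been edited.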
    
