How to edit a 300 GB text file (genomics data)?

旧巷少年郎 2020-12-18 06:34

I have a 300 GB text file that contains genomics data with over 250k records. There are some records with bad data and our genomics program 'Popoolution' allows us to comm

4 Answers

误落风尘 (original poster)

2020-12-18 07:24
If a person has to mark these records manually in a text editor, for whatever reason, you should probably use `split` to break the file into manageable pieces.
```
split -a4 -d -l100000 hugefile.txt part.
```
This splits the file into pieces of 100000 lines each (`-l100000`) with four-digit numeric suffixes (`-a4 -d`), so the files are named part.0000, part.0001, etc. Then, after all the files have been edited, you can combine them back together with `cat`:
```
cat part.* > new_hugefile.txt
```
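Before running this on the real 300 GB file, it may be worth checking the round trip on a small throwaway file. The following is a minimal sketch, assuming a `split` that supports `-d` (as the command above already does); the names `sample.txt`, `chunk.` and `rejoined.txt` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Round-trip check on a small throwaway file before touching the real data.
# sample.txt, chunk.* and rejoined.txt are hypothetical names for this sketch.
set -eu

# Work in a scratch directory so nothing else is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a 25-line sample file.
seq 1 25 > sample.txt

# Split into pieces of 10 lines each: chunk.0000, chunk.0001, chunk.0002.
split -a4 -d -l10 sample.txt chunk.

# Rejoin. The shell expands chunk.* in sorted order, which matches the
# numeric order because the suffixes are zero-padded to a fixed width.
cat chunk.* > rejoined.txt

# cmp exits non-zero if the two files differ at any byte.
cmp sample.txt rejoined.txt && echo "round trip OK"
```

If the pieces are rejoined unedited, `cmp` should report no difference. After real edits the files will differ by design, so this check is only useful as a dry run of the split and join steps themselves.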