`uniq` without sorting an immense text file?

我在风中等你 2020-12-18 07:01

I have a stupidly large text file (i.e. 40 gigabytes as of today) that I would like to filter for unique lines without sorting the file.

The file ha

6 Answers

眼角桃花 (OP)

2020-12-18 07:17

If there's a lot of duplication, one possibility is to split the file into manageable pieces with split(1) and run something conventional like sort/uniq on each piece to produce a summary of its unique lines. With heavy duplication, each summary will be shorter than the piece it came from. You can then compare the summaries against each other to arrive at the unique lines for the whole file.
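As a rough illustration of that idea, here is a minimal Python sketch rather than the exact split/sort/uniq pipeline. It assumes UTF-8 text, and the function name `unique_lines_chunked` and the parameter `lines_per_chunk` are made up for this example. Each piece is reduced to a sorted summary of its unique lines, and the summaries are then merged so that duplicates across pieces end up adjacent and can be dropped. Note that the result comes out in sorted order, not in the original line order.

```python
import heapq
import itertools
import os
import tempfile


def unique_lines_chunked(src_path, dst_path, lines_per_chunk=1_000_000):
    """Write the distinct lines of src_path to dst_path, in sorted order.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    chunk_paths = []
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            while True:
                # Read one manageable piece, similar to what split(1) produces.
                chunk = list(itertools.islice(src, lines_per_chunk))
                if not chunk:
                    break
                # Per-piece summary: unique lines, sorted (like sort | uniq).
                # Newlines are stripped so a final unterminated line still matches.
                summary = sorted({line.rstrip("\n") for line in chunk})
                fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
                with os.fdopen(fd, "w", encoding="utf-8") as out:
                    out.writelines(line + "\n" for line in summary)
                chunk_paths.append(path)

        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            # Compare stripped lines so the merge order matches the sort above.
            streams = [(line.rstrip("\n") for line in f) for f in files]
            with open(dst_path, "w", encoding="utf-8") as dst:
                previous = None
                # heapq.merge lazily merges the already-sorted summaries,
                # so equal lines from different pieces arrive back to back.
                for line in heapq.merge(*streams):
                    if line != previous:
                        dst.write(line + "\n")
                        previous = line
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary per-piece summaries.
        for p in chunk_paths:
            os.remove(p)
```

A call such as `unique_lines_chunked("big.txt", "unique.txt")` (both file names are placeholders) keeps one open file per piece during the merge, so for a very large input a bigger `lines_per_chunk` may be needed to stay under the operating system's open-file limit.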
