I have a stupidly large text file (i.e. 40 gigabytes as of today) that I would like to filter for unique lines without sorting the file.
The file ha
If there's a lot of duplication, one possibility is to split the file using split(1) into manageable pieces and using something conventional like sort/uniq to make a summary of unique lines. This will be shorter than the actual piece itself. After this, you can compare the pieces to arrive at an actual summary.