Quickly find differences between two large text files

野性不改 2021-01-02 06:41

I have two 3 GB text files, each with around 80 million lines. They share 99.9% of their lines (file A has 60,000 unique lines, file B has 80,000 unique lines).

5 answers
  •  误落风尘
    2021-01-02 07:28

    If I understand correctly, you want the lines of these files without duplicates. This does the job:

    uniqA = set(open('fileA', 'r'))
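
A minimal sketch of how the set-based idea could be carried through for both files. The helper name `unique_lines` and the second file name `fileB` are assumptions for illustration and do not come from the answer. Note that this holds every distinct line of both files in memory, so with 3 GB inputs it may need many gigabytes of RAM.

```python
def unique_lines(path_a, path_b):
    """Return (only_in_a, only_in_b): lines found in just one of the two files.

    Hypothetical helper; line order and repeat counts are not preserved.
    """
    # Strip the trailing newline so a final line without one still compares equal.
    with open(path_a, 'r') as fa:
        lines_a = {line.rstrip('\n') for line in fa}
    with open(path_b, 'r') as fb:
        lines_b = {line.rstrip('\n') for line in fb}
    # Set difference keeps the lines that are absent from the other file.
    return lines_a - lines_b, lines_b - lines_a
```

It could be called as `only_a, only_b = unique_lines('fileA', 'fileB')`. Because sets are unordered, sort the results or look the lines up again if the original line positions matter.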
    
