Search a Large Text File for Thousands of Strings

无人共我 2021-01-15 03:31

I have a large text file that is 20 GB in size. The file contains lines of text that are relatively short (40 to 60 characters per line). The file is unsorted.

I hav

3 answers
  •  醉话见心
    2021-01-15 04:15

    The problem you describe looks more like a problem with the chosen algorithm than with the choice of technology. 20,000 full scans of a 20 GB file in 4 days doesn't sound too unreasonable, but your target should be a single scan of the 20 GB file and a single scan of the 20K words.

    Have you considered looking at some string matching algorithms? Aho–Corasick comes to mind.
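
To make the single-scan idea concrete, here is a minimal sketch of Aho–Corasick in pure Python. It is an illustration, not a tuned implementation: the function names `build_automaton` and `find_matches` are made up for this example, it matches exact substrings, and it assumes the search strings fit comfortably in memory. The automaton is built once from the search strings, and then every line of the big file is read exactly once.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie over the patterns, then add failure links (hypothetical helper)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix that is in the trie
    out = [[]]    # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every non-empty pattern into the trie.
    for pat in patterns:
        if not pat:
            continue
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)

    # Phase 2: breadth-first pass to compute failure links.
    # Depth-1 states keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(lines, automaton):
    """Yield (line_number, pattern) for every pattern occurrence, reading each line once."""
    goto, fail, out = automaton
    for lineno, line in enumerate(lines, start=1):
        state = 0  # restart at the root so matches never span two lines
        for ch in line:
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            for pat in out[state]:
                yield lineno, pat
```

Because a Python file object is itself an iterator over lines, the 20 GB file can be streamed through `find_matches` without loading it into memory, for example `with open(path) as f: for lineno, pat in find_matches(f, automaton): ...`, where `path` is a placeholder for your file. Pure Python will likely be slow at this scale, so treat this as a description of the algorithm rather than a production solution.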
