Search Large Text File for Thousands of Strings


无人共我 2021-01-15 03:31

I have a large text file that is 20 GB in size. The file contains lines of text that are relatively short (40 to 60 characters per line). The file is unsorted.

I hav

3 answers

醉话见心 (OP)

2021-01-15 04:15

The problem you describe looks like a problem with the chosen algorithm rather than with the technology. 20,000 full scans of a 20 GB file in 4 days doesn't sound unreasonable, but your target should be a single scan of the 20 GB file plus a single pass over the 20K search strings.

Have you considered looking at some string matching algorithms? Aho–Corasick comes to mind.
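
To make the single-scan idea concrete, here is a minimal pure-Python sketch of Aho–Corasick. It is only an illustration, not a tuned implementation: the function names (`build_automaton`, `find_matches`, `scan_file`) are made up for this example, and it assumes the search strings fit in memory and the file can be decoded as UTF-8. The automaton is built once from all search strings, then every line of the big file is fed through it exactly once.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (trie + failure links) from patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognised when a state is reached

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        if not pat:
            continue  # skip empty strings, they would match everywhere
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return the set of patterns that occur anywhere in text."""
    goto, fail, out = automaton
    state = 0
    found = set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found.update(out[state])
    return found


def scan_file(path, patterns):
    """Scan the file once; map each matched pattern to its line numbers."""
    automaton = build_automaton(patterns)
    hits = {}
    # Assumption: UTF-8 text; undecodable bytes are replaced, not fatal.
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            for pat in find_matches(line, automaton):
                hits.setdefault(pat, []).append(lineno)
    return hits
```

Calling `scan_file("big.txt", search_strings)` (with a hypothetical file name) reads the file a single time regardless of how many search strings there are. The work per line depends on the line length and the number of matches, not on the number of patterns. A pure-Python loop will still be slow on 20 GB, so a compiled implementation of the same algorithm would probably be the better choice in practice.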
