发表新帖

发表新帖

Parsing one terabyte of text and efficiently counting the number of occurrences of each word

后端未结

关注

 16  583

野趣味 2020-11-30 17:21

Recently I came across an interview question to create a algorithm in any language which should do the following

Read 1 terabyte of content
Make a co

16条回答

庸人自扰 (楼主)

2020-11-30 17:54

Try to think of special data structure to approach this kind of problems. In this case special kind of tree like trie to store strings in specific way, very efficient. Or second way to build your own solution like counting words. I guess this TB of data would be in English then we do have around 600,000 words in general so it'll be possible to store only those words and counting which strings would be repeated + this solution will need regex to eliminate some special characters. First solution will be faster, I'm pretty sure.

http://en.wikipedia.org/wiki/Trie

here is implementation of tire in java
http://algs4.cs.princeton.edu/52trie/TrieST.java.html

0 讨论(0)

查看其它16个回答
发布评论:

提交评论
- 加载中...

热议问题