Given a file, find the ten most frequently occurring words as efficiently as possible

予麋鹿 2020-12-12 13:26

This is apparently an interview question (found it in a collection of interview questions), but even if it's not, it's pretty cool.

We are told to do this efficien

15 answers

无人及你 (OP)

2020-12-12 14:11
A complete solution would be something like this:
1. Do an external sort: O(N log N)
2. Count the word frequencies in the sorted file: O(N)
3. (An alternative would be to use a Trie, as @Summer_More_More_Tea suggests, to count the frequencies, if you can afford that amount of memory): O(k*N) // for the first two steps
4. Use a min-heap:
  - Put the first n elements on the heap
  - For every word left, add it to the heap and delete the new min from the heap
  - In the end the heap will contain the n most common words: O(|words|*log(n))
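The min-heap step can be sketched roughly as below. This is only an illustration, not the exact method from the answer: it assumes the words fit in memory and counts them with `collections.Counter` (a hash map) instead of the external sort or Trie. The function name `top_n_words` is made up for the example.

```python
import heapq
from collections import Counter


def top_n_words(words, n=10):
    """Return the n most frequent words as (count, word) pairs, most frequent first."""
    # Assumption: in-memory hash-map counting stands in for the sort/Trie step.
    counts = Counter(words)

    heap = []  # min-heap of (count, word); heap[0] is the weakest candidate
    for word, count in counts.items():
        if len(heap) < n:
            # Put the first n elements on the heap.
            heapq.heappush(heap, (count, word))
        else:
            # Add the new element, then delete the new minimum,
            # so the heap never holds more than n entries.
            heapq.heappushpop(heap, (count, word))

    # The heap holds the n most common words; sort them for display.
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    sample = "a b a c b a d".split()
    print(top_n_words(sample, 2))
```

Since the heap never grows past n entries, each heap operation costs O(log n), which is where the O(|words|*log(n)) bound comes from.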
With the Trie, the cost would be O(k*N), where k is the word length, because the total number of words is generally larger than the size of the vocabulary. Finally, since k is small for most Western languages, you could assume linear complexity.
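For the Trie variant, a minimal sketch of the counting step could look like this. The names `TrieNode`, `count_with_trie` and `iter_counts` are hypothetical, and the node layout (a dict of children plus a counter) is just one possible choice. Each inserted word walks one node per character, which is the O(k) cost per word mentioned above.

```python
class TrieNode:
    """One Trie node: child links keyed by character, plus a word counter."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # number of words that end exactly at this node


def count_with_trie(words):
    """Insert every word into a Trie, counting occurrences; returns the root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:  # one step per character: O(k) per word
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.count += 1
    return root


def iter_counts(root):
    """Yield (word, count) for every word stored in the Trie."""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if node.count:
            yield prefix, node.count
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
```

The (word, count) pairs from `iter_counts` can then be fed to the min-heap step in place of the sorted counts.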