This is apparently an interview question (I found it in a collection of interview questions), but even if it's not, it's pretty cool.
We are told to do this efficiently.
You could make a time/space tradeoff and go O(n^2) for time and O(1) for (memory) space: for each word, count its occurrences with a linear pass over the data. If the count is above the lowest of the top 10 found so far, keep the word and its count; otherwise ignore it. Since the top-10 list has constant size, the extra space stays O(1).
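A minimal sketch of this tradeoff (the function name and `k` parameter are illustrative, not from the question):

```python
def top_k_constant_space(words, k=10):
    # O(n^2) time, O(1) extra space: re-scan the whole list for each
    # distinct word, keeping only a constant-size top-k list.
    top = []  # at most k (count, word) pairs
    for i, word in enumerate(words):
        # Count each distinct word only at its first occurrence.
        if any(words[j] == word for j in range(i)):
            continue
        count = sum(1 for w in words if w == word)
        top.append((count, word))
        top.sort(reverse=True)
        del top[k:]  # discard everything below the top k
    return top
```

Each of the n words triggers at most one full O(n) counting scan, and the only storage beyond the input is the k-entry list.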
Alternatively, cycle through the list of words in a single O(n) pass, storing each in a dictionary (using Python) with the number of times it occurs as the value, then pick the 10 largest counts.
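In Python the histogram pass is essentially free via `collections.Counter`; a possible sketch (the function name is mine):

```python
from collections import Counter
import heapq

def top_k_words(text, k=10):
    # One linear pass builds the histogram: O(n) time, O(d) space
    # for d distinct words.
    counts = Counter(text.split())
    # heapq.nlargest is O(d log k) -- cheaper than fully sorting
    # the histogram when k is small.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
```

`Counter.most_common(k)` would do the same job; `nlargest` just makes the selection step explicit.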
If the word list will not fit in memory, you can split the file until it does. Generate a histogram of each part (either sequentially or in parallel) and merge the results. The details may be a bit fiddly if you want guaranteed correctness for all inputs (merging partial top-10 lists alone is not safe; you need to merge the full histograms), but this should not compromise the O(n) effort, or the O(n/k) time for k parallel tasks.
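The merge step can be sketched like this, assuming the file has already been split into chunks that each fit in memory (chunk histograms could also be built in parallel, e.g. with `multiprocessing`; this sequential version just shows the merge):

```python
from collections import Counter

def top_k_chunked(chunks, k=10):
    # Merge full per-chunk histograms, not per-chunk top-k lists:
    # a word just below the cutoff in every chunk can still be in
    # the global top k.
    total = Counter()
    for chunk in chunks:
        total.update(Counter(chunk.split()))
    return total.most_common(k)
```

The merged `Counter` still needs room for all distinct words; if even that is too large, the usual next step is to partition by hash of the word so each partition's histogram fits.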