Given a file, find the ten most frequently occurring words as efficiently as possible

予麋鹿 2020-12-12 13:26

This is apparently an interview question (I found it in a collection of interview questions), but even if it's not, it's pretty cool.

We are told to do this efficien

15 answers
  •  情深已故
    2020-12-12 14:17

    If the word list will not fit in memory, you can split the file until each part does. Generate a histogram of each part (either sequentially or in parallel), then merge the results. The details of the merge can be a bit fiddly if you want guaranteed correctness for all inputs, but they should not compromise the O(n) total work, or the O(n/k) time for k parallel tasks.
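    A minimal sketch of the split/histogram/merge idea follows. One plainly swapped-in choice: instead of splitting the file by position, it partitions by a hash of each word (`zlib.crc32`). Because every occurrence of a word lands in the same part, each part's top 10 can be merged exactly, which is one common way to handle the "fiddly" correctness issue. The tokenizer regex, the number of parts, and all function names (`tokenize`, `partition`, `top_k`, `most_frequent`) are hypothetical. The parts are in-memory lists here, whereas a real out-of-core version would write them to temporary files.

```python
import heapq
import re
import zlib
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a "word" is a run of ASCII letters/apostrophes, case-folded.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(line: str) -> List[str]:
    return WORD_RE.findall(line.lower())


def partition(lines: Iterable[str], num_parts: int) -> List[List[str]]:
    # Route every occurrence of a word to the same part by hashing the word,
    # so no word's count is split across parts. (Sketch only: a real
    # out-of-core run would append to one temporary file per part.)
    parts: List[List[str]] = [[] for _ in range(num_parts)]
    for line in lines:
        for w in tokenize(line):
            parts[zlib.crc32(w.encode()) % num_parts].append(w)
    return parts


def top_k(items: Iterable[Tuple[str, int]], k: int) -> List[Tuple[str, int]]:
    # Highest count first; ties broken alphabetically for deterministic output.
    return heapq.nsmallest(k, items, key=lambda wc: (-wc[1], wc[0]))


def most_frequent(lines: Iterable[str], k: int = 10,
                  num_parts: int = 4) -> List[Tuple[str, int]]:
    candidates: List[Tuple[str, int]] = []
    for part in partition(lines, num_parts):
        # Each part's histogram is independent, so this loop could run
        # in parallel (one task per part).
        candidates.extend(top_k(Counter(part).items(), k))
    # Each word lives in exactly one part, so the global top k must appear
    # among the per-part top-k lists: at most num_parts * k candidates.
    return top_k(candidates, k)
```

    For example, `most_frequent(["the cat and the hat", "the cat sat", "a hat"], k=3, num_parts=2)` returns `[('the', 3), ('cat', 2), ('hat', 2)]` regardless of how the words hash into parts. Counting stays O(n) overall, and only the final merge of `num_parts * k` candidates is sequential.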
