Given a file, find the ten most frequently occurring words as efficiently as possible

予麋鹿 2020-12-12 13:26

This is apparently an interview question (found it in a collection of interview questions), but even if it's not, it's pretty cool.

We are told to do this efficien

15 answers
  • 2020-12-12 14:14

    You could make a time/space tradeoff and go O(n^2) for time and O(1) for (memory) space. On a linear pass over the data, each time you encounter a word, rescan the data to count how many times it occurs. If that count beats the lowest count in the top 10 found so far, keep the word and its count; otherwise ignore it.
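
A minimal Python sketch of this tradeoff. It assumes `words` is a sequence that can be scanned repeatedly (a list here; for a real file you would re-read the file on each rescan), and the function name `top_k_constant_space` is made up for illustration. Ties are broken arbitrarily.

```python
def top_k_constant_space(words, k=10):
    """Return up to k (word, count) pairs, most frequent first.

    Uses O(n^2) time and, for a fixed k, O(1) extra space.
    `words` must be re-iterable (e.g. a list), not a one-shot generator.
    """
    top = []  # never holds more than k entries
    for word in words:
        if any(w == word for w, _ in top):
            continue  # already recorded with its full count
        # O(n) rescan to count this word across the whole input.
        count = sum(1 for other in words if other == word)
        if len(top) < k:
            top.append((word, count))
        else:
            # Replace the weakest entry only if this word beats it.
            weakest = min(range(k), key=lambda i: top[i][1])
            if count > top[weakest][1]:
                top[weakest] = (word, count)
    return sorted(top, key=lambda pair: -pair[1])
```

A word that is rejected or evicted is recounted when it appears again, but it can never re-enter the list, because the lowest count in the list only ever goes up.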

  • 2020-12-12 14:14

    Cycle through the words and store each one in a dictionary (in Python), with the number of times it occurs as the value.
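
One way this could look in Python, using the standard library's `collections.Counter` as the dictionary. The tokenization (lowercasing and splitting on `\w+`) is an assumption, since the question does not define what counts as a word, and `top_words` is a name chosen for this sketch.

```python
import re
from collections import Counter


def top_words(lines, k=10):
    """Count words across an iterable of lines; return the k most common."""
    counts = Counter()
    for line in lines:
        # Assumed tokenization: case-insensitive runs of word characters.
        counts.update(re.findall(r"\w+", line.lower()))
    return counts.most_common(k)
```

Because a file object yields lines lazily, `top_words(open(path))` would process the file in a single pass without loading it whole; only the distinct words and their counts stay in memory.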

  • 2020-12-12 14:17

    If the word list will not fit in memory, you can split the file into parts until each part does. Generate a histogram of each part (either sequentially or in parallel), and merge the results. The details of the merge may be a bit fiddly if you want guaranteed correctness for all inputs, but they should not compromise the O(n) total effort, or the O(n/k) time for k tasks.
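
A rough Python sketch of the split-and-merge idea, run sequentially. It assumes the merged histogram (the set of distinct words) fits in memory even when the full word list does not. It merges complete per-part histograms rather than per-part top-10 lists, because a word can be globally frequent without making the top 10 of any single part. The helper names `chunked` and `top_words_chunked` are hypothetical.

```python
from collections import Counter
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def top_words_chunked(words, chunk_size, k=10):
    """Histogram each chunk separately, then merge into a running total."""
    total = Counter()
    for chunk in chunked(words, chunk_size):
        part = Counter(chunk)  # histogram of one part
        total.update(part)     # Counter.update adds counts together
    return total.most_common(k)
```

In a parallel version, each `Counter(chunk)` could be computed by a separate task, with only the merge step done in one place.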
