Find word with maximum number of occurrences

轮回少年 2021-01-07 02:49

What is the most efficient way (algorithm) to find the word that has the maximum number of occurrences in a document?

2 Answers
  • 2021-01-07 03:44
    1. Scan the document once, keeping a count of how many times you have seen every unique word (perhaps using a hashtable or a tree to do this).
    2. While performing step 1, keep track of the word that has the highest count of all words seen so far.
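The two steps above can be sketched in Python roughly as follows. This is only a minimal sketch: the function name `most_frequent_word` and the tokenization rule (lowercase, runs of letters, digits and apostrophes) are assumptions made for illustration, since the question does not say what counts as a word.

```python
import re


def most_frequent_word(text):
    """Return the most frequent word in text, or None if there are no words."""
    counts = {}        # word -> number of times seen so far
    best_word = None   # word with the highest count seen so far
    best_count = 0

    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
        # Step 2: update the running maximum while scanning.
        if counts[word] > best_count:
            best_word, best_count = word, counts[word]

    return best_word
```

Because the running maximum is updated inside the same loop, no second pass over the table is needed. On a tie, this sketch returns the word that reached the highest count first.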
  • 2021-01-07 03:45

    Finding the word that occurs most often in a document can be done in O(n) with a simple hash-based histogram:

    histogram <- new map<String,int>
    for each word in document:
        if word in histogram:
            histogram[word] <- histogram[word] + 1
        else:
            histogram[word] <- 1
    max <- 0
    maxWord <- ""
    for each word in histogram:
        if histogram[word] > max:
            max <- histogram[word]
            maxWord <- word
    return maxWord
    

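    This is an O(n) solution, where n is the number of words in the document, assuming hash-table operations take O(1) expected time. Since every word must be read at least once, the problem is Ω(n), so this is asymptotically optimal.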
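For reference, the histogram pseudocode above maps fairly directly onto Python's standard `collections.Counter`. This is a hedged sketch rather than a definitive implementation: the function name `most_frequent_word_histogram` and the regex-based tokenization are assumptions, not part of the original answer.

```python
import re
from collections import Counter


def most_frequent_word_histogram(text):
    """Return (word, count) for the most frequent word, or None if empty."""
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())

    # First pass: build the histogram (word -> occurrence count).
    histogram = Counter(words)
    if not histogram:
        return None

    # Second pass: pick the entry with the largest count.
    word, count = histogram.most_common(1)[0]
    return word, count
```

Both passes are linear, one over the words and one over the distinct words, so the total expected running time stays O(n).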
