Parsing one terabyte of text and efficiently counting the number of occurrences of each word

野趣味 2020-11-30 17:21

Recently I came across an interview question: create an algorithm, in any language, that does the following

  1. Read 1 terabyte of content
  2. Make a count for each recurring word in that content
  3. Display the 10 most frequent words

16 Answers
  •  愿得一人
     2020-11-30 18:09

    As a quick general algorithm, I would do the following.

    Create a map whose keys are the words (the actual strings) and whose values are the counts for each word.
    
    for each word in content:
        if word is already a key in the map:
            increment the count associated with that key
        else:
            add a new key/value pair with the word as the key and a count of one
    done
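
    A minimal Python sketch of this counting step, assuming the input can be streamed line by line; the path content.txt and the helper count_words are placeholders I introduce for illustration, not part of the original answer:

        from collections import defaultdict

        def count_words(lines):
            """Count occurrences of each whitespace-separated word."""
            counts = defaultdict(int)      # word -> occurrence count
            for line in lines:
                for word in line.split():
                    counts[word] += 1      # missing keys start at 0, covering both branches above
            return counts

        # Stream the file line by line so the 1 TB input never has to fit in memory.
        # "content.txt" is a hypothetical path standing in for the real input.
        with open("content.txt", encoding="utf-8", errors="replace") as f:
            word_counts = count_words(f)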
    

    Then you could scan the map for the ten entries with the largest counts:

    
    create an array of size 10 holding (word, count) pairs
    
    for each entry in the map:
        if the entry's count is larger than the smallest count in the array:
            replace the pair with the smallest count by the current entry
    
    print all pairs in the array
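
    A matching Python sketch for this selection step, using heapq.nlargest in place of the hand-rolled size-10 array; word_counts below is a small stand-in for the map built in the previous sketch:

        import heapq

        # Stand-in for the word -> count map built in the counting step above.
        word_counts = {"the": 42, "cat": 7, "sat": 3}

        # Keep only the 10 entries with the largest counts; nlargest holds at most
        # 10 candidates at a time, mirroring the fixed-size array in the pseudocode.
        top_ten = heapq.nlargest(10, word_counts.items(), key=lambda pair: pair[1])

        for word, count in top_ten:
            print(f"{word}: {count}")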
    
