Parsing one terabyte of text and efficiently counting the number of occurrences of each word

野趣味 2020-11-30 17:21

Recently I came across an interview question: write an algorithm, in any language, that does the following:

  1. Read 1 terabyte of content
  2. Make a co
16 answers
  •  感情败类
    2020-11-30 18:11

    Well, personally, I'd split the file into chunks of, say, 128 MB each and keep two chunks in memory at all times while scanning. Every word I find is added to a hash table, along with a list of lists indexed by count. At the end, I'd iterate over the list of lists to find the top 10...
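    A minimal Python sketch of the chunk-and-count idea in this answer, under assumptions that are not in the original: a "word" is a maximal run of ASCII letters and digits compared case-insensitively, the counts for all distinct words fit in a single in-memory hash map, and the names `count_words`, `top_k` and `CHUNK_SIZE` are hypothetical. Instead of keeping two whole chunks in memory, this version handles words that straddle a chunk boundary by carrying only the unfinished tail into the next chunk. The "list of lists by count" becomes a sparse dict from count to words, because the most frequent word in 1 TB of text could have a count in the billions, and a dense list indexed by count would be impractical at that size.

```python
import io
import re
import string
from collections import Counter
from typing import BinaryIO, Dict, List, Tuple

# Assumption: a "word" is a maximal run of ASCII letters/digits, case-insensitive.
WORD_RE = re.compile(rb"[A-Za-z0-9]+")
ALNUM = (string.ascii_letters + string.digits).encode("ascii")

CHUNK_SIZE = 128 * 1024 * 1024  # 128 MB per read, as suggested in the answer


def count_words(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Counter:
    """Count word occurrences while reading the stream one chunk at a time."""
    counts: Counter = Counter()
    carry = b""  # unfinished word left over from the end of the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        # Hold back any trailing word characters: that word may continue in
        # the next chunk, so it must not be counted yet.
        head = data.rstrip(ALNUM)
        carry = data[len(head):]
        counts.update(w.lower() for w in WORD_RE.findall(head))
    if carry:  # at EOF the carried tail is a complete word
        counts[carry.lower()] += 1
    return counts


def top_k(counts: Dict[bytes, int], k: int = 10) -> List[Tuple[bytes, int]]:
    """Return the k most frequent words by bucketing words by their count."""
    buckets: Dict[int, List[bytes]] = {}
    for word, n in counts.items():
        buckets.setdefault(n, []).append(word)
    result: List[Tuple[bytes, int]] = []
    # Walk buckets from the highest count down; ties are broken alphabetically.
    for n in sorted(buckets, reverse=True):
        for word in sorted(buckets[n]):
            result.append((word, n))
            if len(result) == k:
                return result
    return result


if __name__ == "__main__":
    sample = io.BytesIO(b"the cat and the hat; The end")
    # Tiny chunks force words to be split across reads.
    print(top_k(count_words(sample, chunk_size=4), 3))
    # prints [(b'the', 3), (b'and', 1), (b'cat', 1)]
```

    On a real file, open it in binary mode and pass the handle in, e.g. `with open(path, "rb") as f: print(top_k(count_words(f)))`. If the vocabulary is too large for one hash table, this sketch would need to be extended, for example by partitioning words by hash into separate passes.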
