Parsing one terabyte of text and efficiently counting the number of occurrences of each word

野趣味 2020-11-30 17:21

Recently I came across an interview question asking for an algorithm, in any language, that does the following:

  1. Read 1 terabyte of content
  2. Make a co
16 answers
  •  时光说笑
    2020-11-30 17:48

    Three things are worth noting here.

    Specifically: the file is too large to hold in memory, the word list is (potentially) too large to hold in memory, and a single word's count can be too large for a 32-bit int.
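
    To make the 32-bit point concrete, here is a rough worst-case bound worked out in Python. The figures are assumptions for illustration only: 1 TB is taken as 10^12 bytes, and the shortest possible token is taken as a one-letter word followed by one separator byte.

```python
TOTAL_BYTES = 10**12        # assumption: 1 TB counted as 10^12 bytes
MIN_BYTES_PER_WORD = 2      # assumption: one-letter word plus one separator byte

# Worst case: the whole file is the same one-letter word repeated.
max_single_count = TOTAL_BYTES // MIN_BYTES_PER_WORD

INT32_MAX = 2**31 - 1       # 2,147,483,647
UINT32_MAX = 2**32 - 1      # 4,294,967,295
INT64_MAX = 2**63 - 1

overflows_int32 = max_single_count > INT32_MAX
overflows_uint32 = max_single_count > UINT32_MAX
fits_int64 = max_single_count <= INT64_MAX
```

    Under these assumptions the bound is 500 billion occurrences, which exceeds both signed and unsigned 32-bit ranges but fits comfortably in a 64-bit counter.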

    Once you get past those caveats, it should be straightforward. The real game is managing the potentially large word list.
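
    One common way to manage a word list that does not fit in memory is to partition by hash: stream the input in fixed-size chunks, write each word to one of N bucket files chosen by a hash of the word, then count each bucket on its own. This is only a minimal sketch of that idea, not necessarily what the interviewer had in mind. The function names, the bucket count, the chunk size, and the definition of a "word" (ASCII letters, digits, and apostrophes, lowercased) are all assumptions.

```python
import os
import re
import tempfile
import zlib
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
WORD_RE = re.compile(rb"[A-Za-z0-9']+")
TRAILING_WORD_RE = re.compile(rb"[A-Za-z0-9']+\Z")


def iter_words(path, chunk_size=1 << 20):
    """Yield lowercased words, reading only chunk_size bytes at a time."""
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            # Hold back a trailing partial word; it may continue in the next chunk.
            m = TRAILING_WORD_RE.search(data)
            if m:
                tail, data = data[m.start():], data[:m.start()]
            else:
                tail = b""
            for word in WORD_RE.findall(data):
                yield word.lower()
    if tail:
        yield tail.lower()


def partition(path, workdir, num_buckets, chunk_size):
    """Write every word to the bucket file selected by its CRC32 hash."""
    bucket_paths = [os.path.join(workdir, f"bucket_{i}.txt") for i in range(num_buckets)]
    buckets = [open(p, "wb") for p in bucket_paths]
    try:
        for word in iter_words(path, chunk_size):
            buckets[zlib.crc32(word) % num_buckets].write(word + b"\n")
    finally:
        for b in buckets:
            b.close()
    return bucket_paths


def count_words(path, num_buckets=64, chunk_size=1 << 20):
    """Yield (word, count) pairs, holding one bucket's vocabulary in memory at a time."""
    with tempfile.TemporaryDirectory() as workdir:
        for bucket_path in partition(path, workdir, num_buckets, chunk_size):
            counts = Counter()  # Python ints do not overflow at 32 bits
            with open(bucket_path, "rb") as f:
                for line in f:
                    counts[line.rstrip(b"\n")] += 1
            yield from counts.items()
```

    Because the same word always hashes to the same bucket, each bucket's counts are final and never need merging. The sketch trades memory for disk: it rewrites roughly the whole input into bucket files, so a practical version would likely pre-aggregate counts in memory and spill only when a size limit is reached.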

    If it makes it any easier (to keep your head from spinning), picture it this way:

    "You're running a Z-80 8-bit machine with 65K of RAM, and you have a 1MB file..."

    Same exact problem.
