Parsing one terabyte of text and efficiently counting the number of occurrences of each word

野趣味 2020-11-30 17:21

Recently I came across an interview question: create an algorithm, in any language, that does the following:

  1. Read 1 terabyte of content
  2. Make a count of the occurrences of each word
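A single-machine baseline for step 2 is a streaming count that never loads the whole input into memory at once; this is a minimal sketch, not tuned for a terabyte:

```python
from collections import Counter

def count_words(lines):
    """Stream lines one at a time and tally word occurrences."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Small in-memory sample; a real run would iterate over a file object.
sample = ["the quick brown fox", "the lazy dog", "the fox"]
print(count_words(sample).most_common(2))  # -> [('the', 3), ('fox', 2)]
```

At terabyte scale the bottleneck shifts to I/O and to the size of the vocabulary, which is what motivates the distributed answers below the question.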
16 Answers
  •  生来不讨喜
    2020-11-30 17:57

    Storm is the technology to look at. It separates the role of data input (spouts) from processing (bolts). Chapter 2 of the Storm book solves your exact problem and describes the system architecture very well - http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010

    Storm is more real-time processing, as opposed to batch processing with Hadoop. If your data is pre-existing, you can distribute the load across different spouts and spread it out to different bolts for processing.

    This approach will also support data growing beyond terabytes, since the data is analysed in real time as it arrives.
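    The spout/bolt split described above can be sketched in plain Python without Storm (the names `spout`, `route`, and `NUM_BOLTS` are illustrative, not Storm's API): a spout emits lines as they arrive, and each word is routed to a bolt by hash, so all counts for a given word land on the same partition and can be merged cheaply.

    ```python
    from collections import Counter

    NUM_BOLTS = 4  # illustrative partition count, not a Storm setting

    def spout(lines):
        """Data input: emit lines one at a time, as they arrive."""
        yield from lines

    def route(word, num_bolts=NUM_BOLTS):
        """Hash-partition so a given word always reaches the same bolt."""
        return hash(word) % num_bolts

    def run(lines):
        bolts = [Counter() for _ in range(NUM_BOLTS)]
        for line in spout(lines):
            for word in line.split():
                bolts[route(word)][word] += 1
        # Global totals are the disjoint union of the per-bolt counters.
        total = Counter()
        for bolt in bolts:
            total.update(bolt)
        return total

    print(run(["storm counts words", "storm streams words"])["storm"])  # -> 2
    ```

    In real Storm the routing step corresponds to a fields grouping on the word, which is what keeps the per-bolt counters disjoint and mergeable.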
