Recently I came across an interview question to create a algorithm in any language which should do the following
Storm is the technogy to look at. It separates the role of data input (spouts ) from processors (bolts). The chapter 2 of the storm book solves your exact problem and describes the system architecture very well - http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
Storm is more real time processing as opposed to batch processing with Hadoop. If your data is per existing then you can distribute loads to different spouts and spread them for processing to different bolts .
This algorithm also will enable support for data growing beyond terabytes as the date will be analysed as it arrives in real time.