Streaming data and Hadoop? (not Hadoop Streaming)


别跟我提以往 2021-01-30 11:55

I'd like to analyze a continuous stream of data (accessed over HTTP) using a MapReduce approach, so I've been looking into Apache Hadoop. Unfortunately, it appears that Hadoop

10 Answers

没有蜡笔的小新 (OP)

2021-01-30 12:35

Your use case sounds similar to the issue of writing a web crawler using Hadoop - the data streams back (slowly) from sockets opened to fetch remote pages via HTTP.

If so, then see "Why fetching web pages doesn't map well to map-reduce". You might also want to check out the FetcherBuffer class in Bixo, which implements a threaded approach in a reducer (via Cascading) to solve this type of problem.
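To make the idea of a threaded reducer concrete, here is a minimal sketch in plain Python. It is not Bixo's FetcherBuffer code and does not use Hadoop or Cascading; the names `threaded_reduce`, `http_fetch`, and `max_workers` are hypothetical and chosen only for illustration. The assumption is that the map phase groups URLs under a key (for example, by host) and the reduce step then fetches each group with a small thread pool, so that one slow socket does not stall the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Tuple
from urllib.request import urlopen


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Blocking fetch of a single URL using only the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def threaded_reduce(
    key: str,
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
    max_workers: int = 8,
) -> Iterator[Tuple[str, str, object]]:
    """Reduce step for one key: fetch every URL in the group concurrently.

    Yields (key, url, payload_or_exception) in completion order, so results
    from fast responses are emitted without waiting for slow ones.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending future back to the URL it is fetching.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                yield key, url, future.result()
            except Exception as exc:
                # Report the failure as a value instead of aborting the group.
                yield key, url, exc
```

Passing `fetch` as a parameter keeps the sketch testable without network access: a stub such as `lambda url: url.encode()` can stand in for `http_fetch`. In a real job the pool size would have to be balanced against per-host politeness limits and the task timeout of the framework.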
