Streaming data and Hadoop? (not Hadoop Streaming)

别跟我提以往 2021-01-30 11:55

I'd like to analyze a continuous stream of data (accessed over HTTP) using a MapReduce approach, so I've been looking into Apache Hadoop. Unfortunately, it appears that Hadoop

10 Answers
  •  没有蜡笔的小新
    2021-01-30 12:35

    Your use case sounds similar to writing a web crawler with Hadoop: the data streams back (slowly) from sockets opened to fetch remote pages over HTTP.

    If so, see "Why fetching web pages doesn't map well to map-reduce". You might also want to check out the FetcherBuffer class in Bixo, which implements a threaded approach in a reducer (via Cascading) to solve this type of problem.
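
To make the threaded idea above a bit more concrete, here is a minimal Python sketch. It is not Bixo's FetcherBuffer and does not use Hadoop or Cascading. It only illustrates the general pattern of a reduce-style function handing its slow HTTP fetches to a thread pool, so one slow socket does not stall the rest. The names `fetch_all`, `http_get`, and `max_workers` are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def http_get(url, timeout=10.0):
    """Hypothetical default fetcher: return the response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=http_get, max_workers=8):
    """Fetch every URL on a thread pool, as a reducer might for its key group.

    Returns a dict mapping url -> ("ok", body) or ("error", message).
    A failed fetch is recorded instead of aborting the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all fetches up front; slow responses overlap instead of queueing.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = ("ok", future.result())
            except Exception as exc:  # record the failure and keep going
                results[url] = ("error", str(exc))
    return results
```

Passing the fetcher in as a parameter keeps the sketch testable without network access. In a real job the pool size would have to be balanced against per-host politeness limits and the task timeout of the framework, neither of which this sketch handles.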
