An interesting question was asked of me in an interview about web mining: is it possible to crawl websites using Apache Spark?
Spark adds essentially no value to this task.
Sure, you can do distributed crawling, but good crawling tools (such as Apache Nutch) already support this out of the box. The data structures Spark provides, such as RDDs, are pretty much useless here; and if all you want is to launch crawl jobs, you could use YARN, Mesos, etc. directly with less overhead.
Sure, you could do this on Spark, just like you could write a word processor on Spark, since it is Turing complete... but it doesn't get any easier.
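To make the point concrete, here is a minimal sketch of what "crawling" on Spark looks like: parallelize a seed list and fetch each URL in a map. The seed URLs and local-mode setup are illustrative assumptions, not a recommended design. Notice that Spark contributes nothing beyond distributing the fetch calls; everything that makes crawling hard (frontier management, politeness, deduplication) is left entirely to you.

```scala
import org.apache.spark.sql.SparkSession
import scala.io.Source

// Minimal sketch: "crawling" a fixed URL list with Spark.
// Spark only distributes the fetch calls across the cluster;
// the RDD abstraction adds nothing that a real crawler needs.
object SparkCrawlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("crawl-sketch")
      .master("local[*]") // assumption: local mode, for demonstration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical seed list; a real crawler maintains a growing frontier,
    // which an immutable RDD does not model well.
    val seeds = Seq("https://example.org/", "https://example.com/")

    val pages = sc.parallelize(seeds)
      .map { url =>
        // Fetch each page; swallow failures for brevity.
        val body =
          try Source.fromURL(url).mkString
          catch { case _: Exception => "" }
        (url, body.length)
      }

    pages.collect().foreach { case (url, len) =>
      println(s"$url -> $len chars")
    }
    spark.stop()
  }
}
```

Even this toy version shows the mismatch: each new batch of discovered links would require building a fresh RDD and rerunning the job, whereas a dedicated crawler simply feeds its queue.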