Distributed Web crawling using Apache Spark - Is it Possible?

忘掉有多难 2020-12-24 15:31

I was asked an interesting question in an interview about web mining: is it possible to crawl websites using Apache Spark?

I

5 answers
  •  南方客 (original poster)
    2020-12-24 15:38

    There is a project called SpookyStuff, which describes itself as:

    "Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark"

    Hope it helps!
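
To make the "is it possible?" part concrete, here is a minimal sketch of a level-by-level crawl on plain PySpark. It is not how SpookyStuff works internally; the names `extract_links`, `fetch`, `crawl`, and the `max_depth` / `num_partitions` parameters are hypothetical and chosen only for illustration. It assumes `pyspark` is installed, and it ignores robots.txt, rate limiting, and retries, which a real crawler would need.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import urllib.request


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def fetch(url, timeout=10):
    """Download one page; return "" on failure so one bad URL does not fail the job."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def crawl(seed_urls, max_depth=2, num_partitions=8):
    """Breadth-first crawl: each level's URLs are fetched in parallel on the executors."""
    from pyspark.sql import SparkSession  # assumption: pyspark is available

    spark = SparkSession.builder.appName("crawl-sketch").getOrCreate()
    sc = spark.sparkContext
    visited = set()
    frontier = set(seed_urls)
    for _ in range(max_depth):
        if not frontier:
            break
        visited |= frontier
        discovered = (
            sc.parallelize(sorted(frontier), num_partitions)
            .flatMap(lambda url: extract_links(url, fetch(url)))
            .distinct()
            .collect()  # the frontier is kept on the driver, which limits scale
        )
        frontier = set(discovered) - visited
    spark.stop()
    return visited
```

Calling `crawl(["https://example.com/"])` would return the set of URLs that were fetched. Keeping the frontier on the driver is the simplest choice for a sketch; a larger crawl would probably keep it in an RDD or an external store instead.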
