I was asked an interesting question about web mining in an interview: is it possible to crawl websites using Apache Spark?
There is a project called SpookyStuff, which is a
scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark.
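To give a rough idea of why Spark fits this kind of job, here is a minimal sketch in plain PySpark. It is not SpookyStuff's API; the names `fetch`, `fetch_partition` and `crawl_with_spark` are made up for illustration, and it assumes pyspark is installed if you call the last function. The idea is simply to split a list of URLs into partitions and let each executor download its share.

```python
from typing import Callable, Iterable, Iterator, Tuple
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its body as text (standard library only)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_partition(
    urls: Iterable[str],
    fetcher: Callable[[str], str] = fetch,
) -> Iterator[Tuple[str, int]]:
    """Fetch every URL in one partition, yielding (url, body length).

    A failed download yields a length of -1, so a single bad URL
    does not fail the whole task.
    """
    for url in urls:
        try:
            yield url, len(fetcher(url))
        except Exception:
            yield url, -1


def crawl_with_spark(urls, num_partitions=4):
    """Distribute the fetches over Spark (hypothetical helper, needs pyspark)."""
    # Imported lazily so the functions above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("crawl-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize(list(urls), num_partitions)
    # mapPartitions hands each executor an iterator over its slice of URLs.
    return rdd.mapPartitions(fetch_partition).collect()
```

Calling `crawl_with_spark(["https://example.com"])` should return a list of `(url, length)` pairs. A real crawler would also need link extraction, politeness delays and robots.txt handling, which this sketch leaves out.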
Hope it helps!