I am interested in doing web crawling, and I was looking at Solr.
Does Solr do web crawling, or what are the steps to do web crawling?
Solr does not, in and of itself, have a web crawling feature.
Nutch is the de facto crawler (and then some) for Solr.
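To make the division of labour concrete: the crawler fetches and parses pages, and Solr only indexes the documents it is sent. The following is a minimal Python sketch of that hand-off, assuming a Solr 4 or later core whose /update handler accepts a JSON array of documents (older releases used a separate /update/json handler). The function name build_solr_update and the field names id, url, title and content are hypothetical; they have to match whatever your Solr schema actually defines.

```python
import json
import urllib.request


def build_solr_update(solr_url, pages):
    """Build (but do not send) a JSON update request for a Solr core.

    solr_url: base URL of the Solr core, e.g. "http://localhost:8983/solr/"
    pages: list of (url, title, text) tuples produced by some crawler
    """
    docs = [
        # Hypothetical field names; they must exist in your Solr schema.
        {"id": url, "url": url, "title": title, "content": text}
        for url, title, text in pages
    ]
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_solr_update(
    "http://localhost:8983/solr/",
    [("http://example.com/", "Example", "hello world")],
)
# Against a running Solr instance you would send it with:
# urllib.request.urlopen(req)
```

Nutch does this indexing step for you when you pass it the Solr URL, so a sketch like this is only useful for seeing what the crawler hands to Solr.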
Yes, I agree with the other posts here: use Apache Nutch.
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
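Roughly, -depth is the number of crawl rounds (how many links away from the seed URLs in the urls directory the crawl goes), and -topN caps how many pages are fetched in each round. The following Python sketch simulates that loop over a small, hypothetical in-memory link graph instead of the network. It is a simplification: real Nutch ranks pending URLs by score, whereas this sketch just takes them in discovery order.

```python
def crawl(seeds, links, depth, top_n):
    """Simulate Nutch-style crawl rounds over an in-memory link graph.

    seeds: start URLs (what you would put in the urls/ seed directory)
    links: dict mapping a URL to its outlinks (stands in for fetch + parse)
    depth: number of generate/fetch/update rounds
    top_n: maximum number of URLs fetched per round
    Returns the URLs in the order they were fetched.
    """
    crawl_db = list(seeds)  # known but not yet fetched URLs, in discovery order
    known = set(seeds)
    fetched = []
    for _ in range(depth):
        # generate: select at most top_n pending URLs (FIFO here, not score-based)
        batch, crawl_db = crawl_db[:top_n], crawl_db[top_n:]
        if not batch:
            break
        for url in batch:
            # fetch + parse: look up the page's outlinks in the graph
            fetched.append(url)
            for out in links.get(url, []):
                # update: record newly discovered URLs for later rounds
                if out not in known:
                    known.add(out)
                    crawl_db.append(out)
    return fetched


# Hypothetical site: page "a" links to "b" and "c", and so on.
example_links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
}
```

With depth=3 and top_n=5, crawl(["a"], example_links, 3, 5) fetches a, b, c, d and e; page f is discovered in the third round but never fetched because no rounds remain. Lowering top_n to 1 stops after a, b and c, which shows how -topN limits the breadth of each round.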
Note, though, that your Solr version has to match the corresponding version of Nutch, because older versions of Solr store the indices in a different format.
The Nutch tutorial: http://wiki.apache.org/nutch/NutchTutorial