Does solr do web crawling?

Happy的楠姐 2020-12-08 08:09

I am interested in web crawling and was looking at Solr.

Does Solr do web crawling? If not, what are the steps to crawl the web?

8 answers
  •  轻奢々
    2020-12-08 08:45

    Solr 5+ does in fact now do basic web crawling: its bin/post tool can follow links recursively from a start URL. http://lucene.apache.org/solr/

    Older Solr versions do not crawl the web on their own. Historically, Solr is a search server that provides full-text search capabilities, built on top of Lucene.

    If you need to crawl web pages with a separate project, you have a number of options, including:

    • Nutch - http://lucene.apache.org/nutch/
    • Websphinx - http://www.cs.cmu.edu/~rcm/websphinx/
    • JSpider - http://j-spider.sourceforge.net/
    • Heritrix - http://crawler.archive.org/
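    As a rough illustration of the core loop that the crawlers listed above perform (fetch a page, extract its links, queue the ones not yet seen), here is a minimal breadth-first sketch using only the Python standard library. It is not how Nutch, Heritrix or the others are implemented; the names crawl, fetch_url and LinkExtractor are hypothetical, and the sketch skips robots.txt handling, politeness delays and charset detection, all of which a real crawler needs.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects <a href> targets and non-empty text chunks from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Note: this also picks up <script>/<style> contents; acceptable for a sketch.
        if data.strip():
            self.text_parts.append(data.strip())


def fetch_url(url: str) -> Optional[str]:
    """Hypothetical fetcher: returns HTML text, or None on errors or non-HTML responses.

    Assumes UTF-8; a real crawler would detect the page's charset.
    """
    try:
        with urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return None


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]] = fetch_url,
          max_pages: int = 10) -> Dict[str, str]:
    """Breadth-first crawl from start_url; returns {url: extracted text}.

    No robots.txt checks or politeness delays are done here.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments so one page isn't queued twice.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

    Usage: pages = crawl("http://example.com/", max_pages=5). Passing the fetcher in as a parameter keeps the crawl loop separate from network I/O, so it can be exercised against an in-memory dict of pages instead of the live web.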

    If you want to use the search facilities provided by Lucene or Solr, you'll need to build indexes from the crawl results.
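    One way to build a Solr index from crawl results is to POST them as JSON to Solr's update handler. The sketch below assumes a Solr instance at http://localhost:8983/solr, a collection named crawl, and the field names id, url and content; all of these are placeholders, and the fields must exist in (or be auto-created by) that collection's schema. The input is any {url: text} mapping produced by a crawler.

```python
import json
from typing import Dict, List
from urllib.request import Request, urlopen


def build_solr_docs(pages: Dict[str, str]) -> List[dict]:
    """Turn {url: text} crawl results into Solr documents.

    The field names 'id', 'url' and 'content' are assumptions; adjust them
    to match the target collection's schema.
    """
    return [{"id": url, "url": url, "content": text}
            for url, text in sorted(pages.items())]


def post_to_solr(docs: List[dict],
                 solr_base: str = "http://localhost:8983/solr",
                 collection: str = "crawl") -> int:
    """POST docs to the collection's JSON update handler, commit, return HTTP status.

    solr_base and collection are placeholder values.
    """
    body = json.dumps(docs).encode("utf-8")
    req = Request(
        f"{solr_base}/{collection}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=30) as resp:
        return resp.status
```

    Using the URL as the document id means re-crawling a page overwrites its earlier version in the index instead of adding a duplicate. Committing on every request is simple but slow for large crawls; batching documents and committing less often is the usual alternative.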

    See also:

    Lucene crawler (it needs to build a Lucene index)
