Run Nutch 2.3.1 on Hadoop 2

Posted by 雨燕双飞 on 2019-12-11 05:03:29

Question


I want to run Nutch 2.3.1 to crawl data on Hadoop 2. My Hadoop 2 cluster has three nodes:

  • crawler1:master
  • crawler2:slave
  • crawler3:slave

I deployed Nutch 2.3.1 to crawler1 and ran it with the following command: /usr/local/nutch/deploy/bin/crawl hdfs://xxx.xxx.xxx.xxx/urls/seed.txt test 5

It works and crawls data, but the crawl job appears to run only on crawler1; the other nodes do not do any work for Nutch.

My questions are:

  1. Do I need to deploy Nutch to crawler2 and crawler3?
  2. Do I need to run the crawl command on all three nodes?
  3. If my steps are wrong, what are the correct steps?

Sorry for my poor English, I really appreciate any help you can provide.

Source: https://stackoverflow.com/questions/39485798/run-nutch2-3-1-on-hadoop2
