Question
I have tested running Nutch in server mode locally by starting it with the bin/nutch startserver command. Can I also start Nutch in server mode on top of a Hadoop cluster (in a distributed environment) and submit crawl requests to the server through the Nutch REST API? Please help.
Answer 1:
After further research, I got the Nutch server working in distributed mode.
Steps:
- Assuming Hadoop is already configured on all slave nodes, set up Nutch on every node. This tutorial can help: http://wiki.apache.org/nutch/NutchHadoopTutorial
- On your NameNode, run:
cd $NUTCH_HOME/runtime/deploy
bin/nutch startserver -port <port> -host <host>
Note: the port and host options are optional.
- You can then submit requests to Nutch over REST. The requests you submit will be accepted by the Nutch server started in the previous step.
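
As a rough illustration of submitting a request, here is a minimal sketch using curl. It assumes the server listens on localhost:8081 (which I believe is the default port when -port is omitted) and that your Nutch version exposes the /admin and /job/create REST paths as described in the Nutch 1.x REST API documentation. The helper names nutch_url, job_body and submit_job are hypothetical, and the contents of "args" differ between Nutch versions, so check the REST API docs for your release.

```shell
#!/usr/bin/env bash
# Assumed values: change these to the -host and -port you passed to startserver.
NUTCH_HOST="${NUTCH_HOST:-localhost}"
NUTCH_PORT="${NUTCH_PORT:-8081}"

# Build the full URL for a REST path on the Nutch server.
nutch_url() {
  printf 'http://%s:%s%s' "$NUTCH_HOST" "$NUTCH_PORT" "$1"
}

# Build a JSON job description.
# $1 = job type (for example INJECT), $2 = crawl id, $3 = args as a raw JSON object.
# The field names are assumed from the Nutch 1.x REST API docs.
job_body() {
  printf '{"type":"%s","confId":"default","crawlId":"%s","args":%s}' "$1" "$2" "$3"
}

# POST the job to the server. Needs curl and a running Nutch server.
submit_job() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(job_body "$1" "$2" "$3")" "$(nutch_url /job/create)"
}
```

After sourcing the snippet, curl -s "$(nutch_url /admin)" should show whether the server is reachable, and submit_job INJECT crawl01 '{}' would post a job, where crawl01 is a made-up crawl id and the empty args object must be replaced with the arguments your Nutch version expects.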
Happy crawling :)
Source: https://stackoverflow.com/questions/39761712/how-to-run-nutch-server-on-distributed-environment