Question
I'm trying to deploy Nutch 2.1 on Ubuntu 12.04 by following that tutorial. Everything goes well until I try to inject URLs into the database. When I run bin/nutch inject urls, I get:
InjectorJob: starting
InjectorJob: urlDir: urls
and it stays there (for hours) until I cancel the execution. urls is a directory that contains a file with URLs. I added the proxy and port details to nutch-site.xml as suggested here, but that doesn't solve it. I also tried Apache Nutch 2.2.1 and the issue persists.
If you know how to fix this issue, please help me!
Thanks in advance.
Answer 1:
By default, Ubuntu maps the machine's hostname to the loopback address 127.0.1.1 in /etc/hosts. HBase (according to this page) requires your loopback IP address to be 127.0.0.1.
By default, the Ubuntu /etc/hosts file contains (with myComputerName being your computer name):
127.0.0.1 localhost
127.0.1.1 myComputerName
Use sudo gedit /etc/hosts to update your hosts file as follows:
127.0.0.1 localhost
127.0.0.1 myComputerName
Reboot Ubuntu. Nutch should no longer have trouble injecting urls into HBase.
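To check whether a hosts file still has the problematic mapping before or after editing it, a small script can help. The following is only a sketch: the function name find_nonstandard_loopback and the SAMPLE text are made up for illustration, and it simply flags any IPv4 loopback entry other than 127.0.0.1. It does not verify anything about HBase or Nutch themselves.

```python
import ipaddress


def find_nonstandard_loopback(hosts_text):
    """Return (ip, hostnames) pairs for IPv4 loopback entries other than 127.0.0.1.

    Hypothetical helper for illustration; not part of Nutch or HBase.
    """
    flagged = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if addr.version == 4 and addr.is_loopback and ip != "127.0.0.1":
            flagged.append((ip, names))
    return flagged


# Sample content mirroring the default Ubuntu hosts file shown above.
SAMPLE = "127.0.0.1 localhost\n127.0.1.1 myComputerName\n"

if __name__ == "__main__":
    # Reading /etc/hosts assumes a Unix-like system.
    with open("/etc/hosts") as f:
        for ip, names in find_nonstandard_loopback(f.read()):
            print(f"{ip} -> {' '.join(names)} (consider changing to 127.0.0.1)")
```

If the script prints nothing when run against /etc/hosts, no hostname is mapped to a loopback address other than 127.0.0.1.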
Source: https://stackoverflow.com/questions/23050000/nutch-2-1-urls-injection-takes-forever