Nutch 2.1 urls injection takes forever

Submitted by 半城伤御伤魂 on 2019-12-23 15:01:04

Question


I'm trying to deploy Nutch 2.1 on Ubuntu 12.04 by following that tutorial. Everything goes well until I try to inject URLs into the database. When I run bin/nutch inject urls, I get:

    InjectorJob: starting
    InjectorJob: urlDir: urls

and it stays there (for hours) until I cancel the execution. urls is a directory that contains a file with URLs. I added the proxy and port details to nutch-site.xml as suggested here, but that doesn't solve it. I also tried Apache Nutch 2.2.1 and the issue persists.

If you know how to fix this issue, please help me!

Thanks in advance.


Answer 1:


By default, Ubuntu maps your computer's hostname to 127.0.1.1 in /etc/hosts. HBase (according to this page) requires the loopback IP address to be 127.0.0.1.

The Ubuntu /etc/hosts file by default contains (with myComputerName being your computer name):

    127.0.0.1   localhost
    127.0.1.1   myComputerName

Use sudo gedit /etc/hosts to update your hosts file as follows:

    127.0.0.1   localhost
    127.0.0.1   myComputerName

Reboot Ubuntu. Nutch should no longer have trouble injecting URLs into HBase.
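If you want to check the file before editing it by hand, the following is a minimal sketch in Python. It is not part of Nutch or HBase; the function names and the SAMPLE string are made up for illustration. It lists the hostnames mapped to 127.0.1.1 and shows what the file would look like with those entries moved to 127.0.0.1. It only works on text and never writes to /etc/hosts.

```python
import socket


def hostnames_on_address(hosts_text, address="127.0.1.1"):
    """Return the hostnames that hosts_text maps to the given address."""
    names = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split the entry on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == address:
            names.extend(fields[1:])
    return names


def remap_to_loopback(hosts_text, old="127.0.1.1", new="127.0.0.1"):
    """Return hosts_text with every entry on `old` moved to `new`."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == old:
            # Only the address at the start of the entry is replaced.
            line = line.replace(old, new, 1)
        out.append(line)
    return "\n".join(out)


def resolved_address():
    """Return the address this machine's own hostname resolves to."""
    return socket.gethostbyname(socket.gethostname())


# Hypothetical sample mirroring the default Ubuntu hosts file shown above.
SAMPLE = "127.0.0.1   localhost\n127.0.1.1   myComputerName\n"
```

To use it on a real machine, pass the contents of /etc/hosts to hostnames_on_address; an empty list means no hostname is mapped to 127.0.1.1. After editing the file, resolved_address() should return 127.0.0.1 if the change took effect.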



Source: https://stackoverflow.com/questions/23050000/nutch-2-1-urls-injection-takes-forever
