Question
I am using Apache Nutch 1.14 on Windows 10 with Java 1.8. I followed the steps described at https://wiki.apache.org/nutch/NutchTutorial.
When I try to inject the URLs into the crawldb by running this command in Cygwin: bin/nutch inject crawl/crawldb urls
I get the following error: Injector: java.io.IOException: (null) entry in command string: null chmod 0644 E:\apache-nutch-1.4\runtime\local\crawl\crawldb.locked at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
I checked the logs and found this:
2018-01-18 10:55:26,785 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
I have searched for this error on several pages, but none of them helped.
Answer 1:
- Create a new directory in Windows, e.g. c:\winutil.
- Inside winutil, create a bin directory.
- Open https://minhaskamal.github.io/DownGit/#/home
- Paste https://github.com/steveloughran/winutils/tree/master/hadoop-2.8.1 into that site and download the zip of the hadoop-2.8.1 winutils folder.
- Extract the zip contents into c:\winutil\bin, so that winutils.exe ends up at c:\winutil\bin\winutils.exe.
- Add a HADOOP_HOME system environment variable and point it to c:\winutil.
- Open a new Cygwin window so it picks up the variable, then re-run your crawl command.
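
The "null\bin\winutils.exe" in the log suggests Hadoop had no home directory to look in; on Windows it reads the hadoop.home.dir system property or the HADOOP_HOME environment variable and expects bin\winutils.exe underneath it. As a rough sanity check before re-running Nutch, a small script along these lines could confirm that the variable is visible and that the binary is where Hadoop will look. This is only a sketch: find_winutils is a hypothetical helper name, not part of Nutch or Hadoop.

```python
import os
from pathlib import Path
from typing import Optional


def find_winutils(hadoop_home: Optional[str]) -> Optional[Path]:
    """Return the path to bin/winutils.exe under hadoop_home, or None if it is absent.

    Hypothetical helper for illustration; it mirrors the <home>\\bin\\winutils.exe
    layout that the error message in the log refers to.
    """
    if not hadoop_home:
        # An unset or empty HADOOP_HOME is what produces "null\bin\winutils.exe".
        return None
    candidate = Path(hadoop_home) / "bin" / "winutils.exe"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    found = find_winutils(home)
    if found is None:
        print(f"winutils.exe not found (HADOOP_HOME={home!r})")
    else:
        print(f"winutils.exe found at {found}")
```

Run it from a freshly opened terminal, since a terminal that was already open when HADOOP_HOME was added will not see the new variable.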
Source: https://stackoverflow.com/questions/48314451/apache-nutch-error-injector-java-io-ioexception-null-entry-in-command-strin