Unknown issue in Nutch elastic indexer with nutch REST api

删除回忆录丶 提交于 2019-12-23 17:21:49

问题


I was trying to expose nutch using REST endpoints and ran into an issue in indexer phase. I'm using elasticsearch index writer to index docs to ES. I've used $NUTCH_HOME/runtime/deploy/bin/nutch startserver command. While indexing an unknown exception is thrown.

Error: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor; 16/10/07 16:01:47 INFO mapreduce.Job: map 100% reduce 0% 16/10/07 16:01:49 INFO mapreduce.Job: Task Id : attempt_1475748314769_0107_r_000000_1, Status : FAILED Error: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor; 16/10/07 16:01:53 INFO mapreduce.Job: Task Id : attempt_1475748314769_0107_r_000000_2, Status : FAILED Error: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor; 16/10/07 16:01:58 INFO mapreduce.Job: map 100% reduce 100% 16/10/07 16:01:59 INFO mapreduce.Job: Job job_1475748314769_0107 failed with state FAILED due to: Task failed task_1475748314769_0107_r_000000 Job failed as tasks failed. failedMaps:0 failedReduces:1

ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865) at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145) at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

Failed with exit code 255.

Any help would be appreciated.

PS : After debugging using stack trace I think the issue is due to mismatch in guava version. I've tried changing build.xml of plugins(parse-tika and parsefilter-naivebayes) but it didn't work.


回答1:


I have found solution for this issue. This is due to the version compatibility of guava dependency. Hadoop uses guava-11.0.2.jar as dependency. But the elastic indexer plugin in nutch requires 18.0 version of guava. That's why it is throwing an exception when trying to run in distributed hadoop. So we just need to update guava version to 18.0 in hadoop libs(can be found at $HADOOP_HOME/share/hadoop/common/libs/).



来源:https://stackoverflow.com/questions/39916877/unknown-issue-in-nutch-elastic-indexer-with-nutch-rest-api

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!