问题
I'm following this tutorial for setting up nutch alongwith Elasticsearch. Whenever I try to index the data into the ES, it returns an error. Following are the logs:-
Command:-
bin/nutch index elasticsearch -all
Logs when I add elastic.port
(9200) in conf/nutch-site.xml
:-
2016-05-05 13:22:49,903 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2016-05-05 13:22:49,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2016-05-05 13:22:49,904 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-05-05 13:22:49,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2016-05-05 13:22:49,905 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2016-05-05 13:22:49,906 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2016-05-05 13:22:49,961 INFO elastic.ElasticIndexWriter - Processing remaining requests [docs = 0, length = 0, total docs = 0]
2016-05-05 13:22:49,961 INFO elastic.ElasticIndexWriter - Processing to finalize last execute
2016-05-05 13:22:54,898 INFO client.transport - [Peggy Carter] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9200]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9200]][cluster:monitor/nodes/info] request_id [1] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-05 13:22:55,682 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2016-05-05 13:22:55,683 INFO indexer.IndexingJob - Active IndexWriters :
ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.host : hostname
elastic.port : port (default 9300)
elastic.index : elastic index command
elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
2016-05-05 13:22:55,711 INFO elasticsearch.plugins - [Adrian Toomes] loaded [], sites []
2016-05-05 13:23:00,763 INFO client.transport - [Adrian Toomes] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:92$0]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9200]][cluster:monitor/nodes/info] request_id [0] time$ out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-05 13:23:00,766 INFO indexer.IndexingJob - IndexingJob: done.
Logs when default port 9300 is used:-
2016-05-05 13:58:44,584 INFO elasticsearch.plugins - [Mentallo] loaded [], sites []
2016-05-05 13:58:44,673 WARN transport.netty - [Mentallo] Message not fully read (response) for [0] handler future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1@3c80f1dd), error [true], resetting
2016-05-05 13:58:44,674 INFO client.transport - [Mentallo] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:125)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.StreamCorruptedException: Unsupported version: 1
at org.elasticsearch.common.io.ThrowableObjectInputStream.readStreamHeader(ThrowableObjectInputStream.java:46)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at org.elasticsearch.common.io.ThrowableObjectInputStream.<init>(ThrowableObjectInputStream.java:38)
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:170)
... 23 more
2016-05-05 13:58:44,676 INFO indexer.IndexingJob - IndexingJob: done.
I've configured everything fine. Have had a look at various threads as well but to no avail. Also java version for both ES and JVM is same. Is there a bug in here?
I'm using Nutch 2.3.1 and have tried with both ES 1.4.4 and 2.3.2. I can see data in Mongo but I cannot index data in ES. Why??
来源:https://stackoverflow.com/questions/37052674/nutch-elasticsearch-integration