Elasticsearch connector works in IDE but not on local cluster

Question

I am trying to write a Twitter stream into an Elasticsearch 2.3 index using the Elasticsearch2 connector provided by Flink.

Running my job in IntelliJ works fine, but when I run the packaged jar on a local cluster I get the following error:

05/09/2016 13:26:58 Job execution switched to status RUNNING.
05/09/2016 13:26:58 Source: Custom Source -> (Sink: Unnamed, Sink: Unnamed, Sink: Unnamed)(1/1) switched to SCHEDULED 
05/09/2016 13:26:58 Source: Custom Source -> (Sink: Unnamed, Sink: Unnamed, Sink: Unnamed)(1/1) switched to DEPLOYING 
05/09/2016 13:26:58 Source: Custom Source -> (Sink: Unnamed, Sink: Unnamed, Sink: Unnamed)(1/1) switched to RUNNING 
05/09/2016 13:26:59 Source: Custom Source -> (Sink: Unnamed, Sink: Unnamed, Sink: Unnamed)(1/1) switched to FAILED 
java.lang.RuntimeException: Client is not connected to any Elasticsearch nodes!
    at org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.open(ElasticsearchSink.java:172)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:91)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:317)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:215)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:579)
    at java.lang.Thread.run(Thread.java:745)

05/09/2016 13:26:59 Job execution switched to status FAILING.
java.lang.RuntimeException: Client is not connected to any Elasticsearch nodes!
    at org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.open(ElasticsearchSink.java:172)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:91)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:317)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:215)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:579)
    at java.lang.Thread.run(Thread.java:745)
05/09/2016 13:26:59 Job execution switched to status FAILED.

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
    at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:65)
    at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:541)
    at com.pl.greeny.flink.TwitterAnalysis$.main(TwitterAnalysis.scala:69)
    at com.pl.greeny.flink.TwitterAnalysis.main(TwitterAnalysis.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
    at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:860)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:327)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1187)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1238)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:807)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:753)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:753)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: Client is not connected to any Elasticsearch nodes!
    at org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.open(ElasticsearchSink.java:172)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:91)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:317)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:215)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:579)
    at java.lang.Thread.run(Thread.java:745)

My code in Scala:

import java.net.{InetAddress, InetSocketAddress}
import java.util
import org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink

// Settings passed to the Elasticsearch client used by the sink
val config = new java.util.HashMap[String, String]
config.put("bulk.flush.max.actions", "1")
config.put("cluster.name", "elasticsearch")
config.put("node.name", "node-1")
config.put("path.home", "/media/user/e5e05ab5-28f3-4cee-a57c-444e32b99f04/thesis/elasticsearch-2.3.2/bin")

// Transport addresses (port 9300); all four entries point at the local machine
val transports = new util.ArrayList[InetSocketAddress]
transports.add(new InetSocketAddress(InetAddress.getLocalHost(), 9300))
transports.add(new InetSocketAddress(InetAddress.getLoopbackAddress(), 9300))
transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300))
transports.add(new InetSocketAddress(InetAddress.getByName("localhost"), 9300))

// ElasticSearchSinkTwitter is my own sink function
stream.addSink(new ElasticsearchSink(config, transports, new ElasticSearchSinkTwitter()))

What is the difference between running the program from an IDE and running it on the local cluster?

Answer 1:

Problems like this are often caused by differences between how IDEs (IntelliJ, Eclipse) manage and include dependencies and how Flink submits jobs as fat jars.

I had the same problem the other day, and the task manager log file revealed the following root cause:
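One way to make such a difference visible is to print where a class is actually loaded from, once in the IDE and once with the fat jar. The following is a minimal sketch rather than part of the original answer: the object name WhereFrom and the helper locationOf are hypothetical, and the two class names are only examples (the first appears in the error discussed in this answer, the second is assumed to be the transport client class shipped with Elasticsearch 2.x).

```scala
object WhereFrom {
  /** Returns the jar or directory a class is loaded from, if it can be determined. */
  def locationOf(className: String): Option[String] =
    try {
      val loader = Thread.currentThread.getContextClassLoader
      // initialize = false: only locate the class, do not run its static initializers
      val cls = Class.forName(className, false, loader)
      for {
        domain <- Option(cls.getProtectionDomain)
        source <- Option(domain.getCodeSource) // null for JDK bootstrap classes
        url    <- Option(source.getLocation)
      } yield url.toString
    } catch {
      case _: ClassNotFoundException => None
      case _: NoClassDefFoundError   => None
    }

  def main(args: Array[String]): Unit = {
    // Example class names; replace them with whatever the failing job depends on
    val names = Seq(
      "org.apache.lucene.codecs.PostingsFormat",
      "org.elasticsearch.client.transport.TransportClient"
    )
    names.foreach { n =>
      println(s"$n -> ${locationOf(n).getOrElse("not found, or no code source")}")
    }
  }
}
```

If the two runs report different jars or versions for the same class, the packaging step is the likely place to look.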

java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [es090, completion090, XBloomFilter]

Searching for the error, I found this answer on SO that solved the issue:

https://stackoverflow.com/a/38354027/3609571

by adding the following dependency to my pom.xml:

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.4.1</version>
</dependency>

Note that the order of dependencies matters in this case. It only worked when I put the lucene-core dependency at the top; adding it at the end did not work for me. So this is more of a "hack" than a proper fix.
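The sensitivity to dependency order would be consistent with several jars each shipping their own copy of the service file for org.apache.lucene.codecs.PostingsFormat, with only one copy surviving in the fat jar. That explanation is an assumption and is not confirmed by the linked answer. A minimal sketch to check it, using only the standard library, lists every copy of that file visible on the classpath together with the implementation classes it registers. The object name SpiCheck and its helpers are hypothetical.

```scala
import java.net.URL
import scala.collection.JavaConverters._
import scala.io.Source

object SpiCheck {
  // Service file that java.util.ServiceLoader-style lookups read for this SPI type
  val Resource = "META-INF/services/org.apache.lucene.codecs.PostingsFormat"

  /** Strips comments and blank lines from the lines of a service file. */
  def parse(lines: Seq[String]): List[String] =
    lines.map(_.takeWhile(_ != '#').trim).filter(_.nonEmpty).toList

  /** Every copy of the service file the loader can see, with the classes it lists. */
  def registrations(loader: ClassLoader): List[(URL, List[String])] =
    loader.getResources(Resource).asScala.toList.map { url =>
      val src = Source.fromInputStream(url.openStream(), "UTF-8")
      try url -> parse(src.getLines().toList)
      finally src.close()
    }

  def main(args: Array[String]): Unit = {
    val found = registrations(Thread.currentThread.getContextClassLoader)
    if (found.isEmpty) println(s"No $Resource found on the classpath")
    found.foreach { case (url, classes) =>
      println(url)
      classes.foreach(c => println(s"  $c"))
    }
  }
}
```

Run from the IDE, this would be expected to print one entry per jar that ships the file; run with only the fat jar on the classpath, it shows which single copy ended up in the package.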

Source: https://stackoverflow.com/questions/37114886/elasticsearch-connector-works-in-ide-but-not-on-local-cluster

Tags

scala

ElasticSearch

intellij-idea

apache-flink