Apache Spark — using spark-submit throws a NoSuchMethodError


Question


To submit a Spark application to a cluster, the Spark documentation notes:

To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. -- http://spark.apache.org/docs/latest/submitting-applications.html

So, I added the Apache Maven Shade Plugin (version 3.0.0) to my pom.xml file, and I set the scope of my Spark dependency (version 2.1.0) to provided.

(I also added the Apache Maven Assembly Plugin to ensure that all of my dependencies are wrapped into the jar when I run mvn clean package; I'm unsure whether it's truly necessary.)
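
For reference, here is a minimal sketch of what that provided-scoped dependency could look like in the pom.xml; the Scala-suffixed artifact ID is an assumption based on Spark 2.1.0's default Scala 2.11 build:

  <!-- Sketch (assumed artifact ID): Spark is marked provided so it is not
       bundled into the uberjar; the cluster manager supplies it at runtime. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
    <scope>provided</scope>
  </dependency>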


This is how spark-submit fails. It throws a NoSuchMethodError for one of my dependencies (note that the code works from a local instance when compiled inside IntelliJ, assuming that provided is removed).

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;

The line of code that throws the error is irrelevant: it's simply the first line in my main method, which creates a Stopwatch, part of the Google Guava utilities (version 21.0).

Other solutions online suggest that it has to do with version conflicts of Guava, but I haven't had any luck yet with those suggestions. Any help would be appreciated, thank you.


Answer 1:


If you take a look at the /jars subdirectory of the Spark 2.1.0 installation, you will likely see guava-14.0.1.jar. Per the API documentation for the Guava Stopwatch#createStarted method you are using, createStarted did not exist until Guava 15.0. What is most likely happening is that the Spark process's classloader finds the Spark-provided Guava 14.0.1 library before it finds the Guava 21.0 library packaged in your uberjar.

One possible resolution is to use the class-relocation feature provided by the Maven Shade plugin (which you're already using to construct your uberjar). Via "class relocation", Maven-Shade moves the Guava 21.0 classes (needed by your code) during the packaging of the uberjar from a pattern location reflecting their existing package name (e.g. com.google.common.base) to an arbitrary shadedPattern location, which you specify in the Shade configuration (e.g. myguava123.com.google.common.base).

The result is that the older and newer Guava libraries no longer share a package name, avoiding the runtime conflict.
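
As a minimal sketch, the relevant piece of the Shade configuration would contain a relocation rule like the one below, using this answer's example prefix; a complete working configuration appears in Answer 3:

  <relocation>
    <pattern>com.google.common</pattern>
    <shadedPattern>myguava123.com.google.common</shadedPattern>
  </relocation>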




Answer 2:


Yes, most likely you have a dependency conflict.

First, check whether you have a dependency conflict when you build your jar. A quick way is to look inside the jar directly to see whether the Stopwatch.class file is there and, by inspecting its bytecode, whether the createStarted method is present. Otherwise you can list the dependency tree (for example, mvn dependency:tree -Dincludes=com.google.guava) and work from there: https://maven.apache.org/plugins/maven-dependency-plugin/examples/resolving-conflicts-using-the-dependency-tree.html

If it's not an issue with your jar, you might have a conflict between your Spark installation and your jar. Look in the lib and jars folders of your Spark installation; there you can see whether any jars include an older version of Guava that doesn't support the createStarted() method on Stopwatch.




Answer 3:


Applying the above answers, I solved the problem with the following configuration:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.1.0</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <relocations>
            <relocation>
              <pattern>com.google.common</pattern>
              <shadedPattern>shade.com.google.common</shadedPattern>
            </relocation>
            <relocation>
              <pattern>com.google.thirdparty.publicsuffix</pattern>
              <shadedPattern>shade.com.google.thirdparty.publicsuffix</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>
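
After rebuilding with mvn clean package, the Guava classes bundled in the uberjar live under shade.com.google.common, and the references in your compiled classes are rewritten to match, so they no longer collide with the Guava 14.0.1 jar that Spark provides at runtime.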


Source: https://stackoverflow.com/questions/42398720/apache-spark-using-spark-submit-throws-a-nosuchmethoderror
