Ignore Spark Cluster Own Jars

Submitted by 强颜欢笑 on 2021-02-10 09:22:14

Question


I would like to use my own application Spark jars. More concretely, I have an mllib jar that is not yet released and that contains a fix for a BisectingKMeans bug. My idea is to use it on my Spark cluster (locally it works perfectly).

I've tried many things: extraClassPath, userClassPathFirst, the jars option... many options that do not work. My last idea was to use an sbt shade rule to rename all org.apache.spark.* packages to shadespark.* (see the sketch below), but when I deploy, the job still uses the cluster's Spark jars.
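For reference, the shade rule described above would look roughly like this in an sbt-assembly build.sbt (a sketch, assuming the sbt-assembly plugin is already enabled in project/plugins.sbt; the rename pattern mirrors the shadespark.* idea from the question):

```scala
// build.sbt -- sketch of the sbt-assembly shade rule described above
// (assumes addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "...") in project/plugins.sbt)
assembly / assemblyShadeRules := Seq(
  // Rewrite every org.apache.spark class bundled in the fat jar to a shadespark.* name
  ShadeRule.rename("org.apache.spark.**" -> "shadespark.@1").inAll
)
```

With .inAll, the rule rewrites both the bundled class files and every reference to them inside the assembled jar.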

Any ideas?


Answer 1:


You can try the Maven shade plugin to relocate the conflicting packages. This gives the newer version of the mllib jar a separate namespace, so both the old and the new version end up on the classpath, but since the new version has an alternative name you can refer to it explicitly.

Have a look at https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html:

If the uber JAR is reused as a dependency of some other project, directly including classes from the artifact's dependencies in the uber JAR can cause class loading conflicts due to duplicate classes on the class path. To address this issue, one can relocate the classes which get included in the shaded artifact in order to create a private copy of their bytecode:
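Since the question's build uses sbt rather than Maven, the same relocation can be expressed with an sbt-assembly shade rule. The sketch below relocates only the patched mllib clustering package instead of all of Spark; the package pattern, target prefix, and module coordinates are illustrative assumptions, not a verified fix:

```scala
// build.sbt -- sbt-assembly counterpart of the Maven relocation above (sketch)
assembly / assemblyShadeRules := Seq(
  // Give the patched BisectingKMeans classes a private name so the cluster's
  // own spark-mllib jars cannot shadow them at runtime.
  ShadeRule.rename("org.apache.spark.mllib.clustering.**" -> "patched.spark.mllib.clustering.@1")
    .inProject
    .inLibrary("org.apache.spark" %% "spark-mllib" % "2.1.0")  // illustrative version
)
```

With .inProject, references inside your own compiled classes are rewritten as well, so the assembled jar consistently uses the relocated names. Note that the patched spark-mllib must actually be bundled into the assembly (not marked "provided") for its classes to be relocated.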

I got this idea from the video "Top 5 Mistakes When Writing Spark Applications": https://youtu.be/WyfHUNnMutg?t=23m1s



Source: https://stackoverflow.com/questions/42365013/ignore-spark-cluster-own-jars
