How can I use proto3 with Hadoop/Spark?

Posted by 梦想的初衷 on 2019-12-30 03:23:06

Question


I've got several .proto files which rely on syntax = "proto3";. I also have a Maven project that is used to build Hadoop/Spark jobs (Hadoop 2.7.1 and Spark 1.5.2). I'd like to generate data in Hadoop/Spark and then serialize it according to my proto3 files.

Using libprotoc 3.0.0, I generate Java sources which work fine within my Maven project as long as I have the following in my pom.xml:

<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>3.0.0-beta-1</version>
</dependency>  

Now, when I use my libprotoc-generated classes in a job that gets deployed to a cluster I get hit with:

java.lang.VerifyError : class blah overrides final method mergeUnknownFields.(Lcom/google/protobuf/UnknownFieldSet;)Lcom/google/protobuf/GeneratedMessage$Builder;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)

The class-loading failure makes sense, given that Hadoop/Spark depend on protobuf-java 2.5.0, which is incompatible with my 3.0.0-beta-1. I also noticed that protobuf packages (presumably versions < 3) have found their way into my jar in a few other places:

$ jar tf target/myjar-0.1-SNAPSHOT.jar | grep protobuf | grep '/$'
org/apache/hadoop/ipc/protobuf/
org/jboss/netty/handler/codec/protobuf/
META-INF/maven/com.google.protobuf/
META-INF/maven/com.google.protobuf/protobuf-java/
org/apache/mesos/protobuf/
io/netty/handler/codec/protobuf/
com/google/protobuf/
google/protobuf/

Is there something I can do (Maven Shade?) to sort this out?

Similar issue here: Spark java.lang.VerifyError


Answer 1:


It turns out this kind of thing is documented here: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html

You just need to relocate the protobuf classes and the VerifyError goes away:

          <relocations>
            <relocation>
              <pattern>com.google.protobuf</pattern>
              <shadedPattern>shaded.com.google.protobuf</shadedPattern>
            </relocation>
          </relocations>
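
For context, here is a minimal sketch of where that relocations block could sit in a pom.xml. The plugin version (2.4.3) and the binding to the package phase are assumptions for illustration and are not taken from the answer above; adjust them to match your own build.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Assumed version; use whichever release your build already pins -->
      <version>2.4.3</version>
      <executions>
        <execution>
          <!-- Run the shade goal when the jar is packaged -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Move the bundled protobuf 3 classes to a private package -->
              <relocation>
                <pattern>com.google.protobuf</pattern>
                <shadedPattern>shaded.com.google.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a setup like this, the shade plugin rewrites references in the generated message classes so they point at the relocated package, and they should no longer be resolved against the protobuf-java 2.5.0 classes that Hadoop/Spark put on the classpath. Re-running the jar tf command from the question is one way to check that shaded/com/google/protobuf/ now appears in the jar.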



Answer 2:


Same solution as Dranxo's, but with sbt-assembly:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.*" -> "shadedproto.@1").inProject
    .inLibrary("com.google.protobuf" % "protobuf-java" % protobufVersion)
)


Source: https://stackoverflow.com/questions/34487996/how-can-i-use-proto3-with-hadoop-spark
