How to manage conflicting DataProc Guava, Protobuf, and GRPC dependencies

Submitted by 时间秒杀一切 on 2019-12-07 05:58:28

Question


I am working on a Scala Spark job which needs to use a Java library (youtube/vitess) that depends on newer versions of gRPC (1.01), Guava (19.0), and Protobuf (3.0.0) than those currently provided on the Dataproc 1.1 image.

When running the project locally and building with Maven, the correct versions of these dependencies are loaded and the job runs without issue. When submitting the job to Dataproc, the Dataproc versions of these libraries are preferred, and the job ends up referencing class methods that cannot be resolved.

What is the recommended way to ensure that the right versions of a dependency's transitive dependencies get loaded when submitting a Spark job on Dataproc? I'm not in a position to rewrite components of this library to use the older versions of these packages that Dataproc provides.


Answer 1:


The recommended approach is to package all of your job's dependencies into an uber jar (created with the Maven Shade plugin, for example) and to relocate the conflicting dependency classes inside that uber jar, so they no longer clash with the classes in the libraries Dataproc provides.
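A shade configuration along these lines relocates the Guava, Protobuf, and gRPC packages so your job loads its own bundled copies. This is a minimal sketch: the plugin version and the `repackaged.` prefix are illustrative choices, not Dataproc conventions, and you may need additional relocations depending on what the library pulls in.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite package names of the conflicting dependencies
               so they don't collide with Dataproc's bundled versions. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>repackaged.com.google.protobuf</shadedPattern>
          </relocation>
          <relocation>
            <pattern>io.grpc</pattern>
            <shadedPattern>repackaged.io.grpc</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The Shade plugin rewrites the bytecode of both the relocated classes and your own code that references them, so calls like `io.grpc.ManagedChannel` in your job resolve to `repackaged.io.grpc.ManagedChannel` inside the uber jar.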

For reference, you can take a look at how this is done in the Cloud Storage connector, which is part of the Dataproc distribution.
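Once the uber jar is built, you submit it to Dataproc like any other Spark jar; the cluster name, main class, and jar path below are placeholders for your own values:

```shell
# Submit the shaded uber jar as a Spark job on an existing Dataproc cluster.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --class=com.example.MySparkJob \
  --jars=target/my-job-1.0-shaded.jar
```

Because the relocated classes are self-contained in the jar, no cluster-side classpath changes are needed.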



Source: https://stackoverflow.com/questions/40498542/how-to-manage-conflicting-dataproc-guava-protobuf-and-grpc-dependencies
