Does the SortValues transform (Java SDK extension) in Beam only run in a Hadoop environment?

Submitted by 怎甘沉沦 on 2019-12-29 09:17:09

Question


I tried the SortValues example code using the DirectRunner on my local machine (Windows):

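import org.apache.beam.sdk.extensions.sorter.BufferedExternalSorter;
import org.apache.beam.sdk.extensions.sorter.SortValues;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

PCollection<KV<String, KV<String, Integer>>> input = ...

PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
    input.apply(GroupByKey.<String, KV<String, Integer>>create());

PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
    grouped.apply(SortValues.<String, String, Integer>create(BufferedExternalSorter.options()));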

However, I got the error PipelineExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable. Does this mean the transform only works in a Hadoop environment?


Answer 1:


The error means that the Hadoop classes the sorter depends on are missing from the classpath. As of today, if you use a Beam release below 2.0.0, you have to add two Hadoop dependencies to your Maven pom file for the SortValues module to work:

  1. Add hadoop-common (groupId org.apache.hadoop), version 2.7.3 or later.
  2. Add hadoop-mapreduce-client-core (groupId org.apache.hadoop), version 2.7.3 or later.

Otherwise, you only need to use a Beam release of 2.0.0 or later.
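
To confirm whether the Hadoop class named in the error is visible to your application before running the pipeline, a small classpath check can help. This is only a minimal sketch: the class name HadoopClasspathCheck and the helper isOnClasspath are hypothetical names chosen for illustration and are not part of Beam or Hadoop. It uses only the standard Class.forName call.

```java
public class HadoopClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class, false otherwise.
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it links against) is not available.
            return false;
        }
    }

    public static void main(String[] args) {
        // The class reported as missing in the NoClassDefFoundError above.
        String writable = "org.apache.hadoop.io.Writable";
        System.out.println(writable + " on classpath: " + isOnClasspath(writable));
    }
}
```

If the check reports false, the Hadoop dependencies listed above are probably not reaching the runtime classpath, for example because they are missing from the pom file or declared with a scope that excludes them at run time.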



Source: https://stackoverflow.com/questions/45069550/does-sortvalues-transform-java-sdk-extension-in-beam-only-run-in-hadoop-environm
