问题
I have tried the example code of SortValues transform using DirectRunner
on local machine (Windows)
PCollection<KV<String, KV<String, Integer>>> input = ...
PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
input.apply(GroupByKey.<String, KV<String, Integer>>create());
PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
grouped.apply(SortValues.<String, String, Integer>create(BufferedExternalSorter.options()));
but I got the error PipelineExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
. Does this mean this transform function only works in Hadoop environment?
回答1:
As of today, if you use Beam with release version below 2.0.0, you will have to add two hadoop dependencies in your maven pom file for this SortValues module to work.
- add
hadoop-common
version 2.7.3 or later - add
hadoop-mapreduce-client-core
version 2.7.3 or later.
Otherwise, you will just need to use Beam with release version >= 2.0.0.
来源:https://stackoverflow.com/questions/45069550/does-sortvalues-transform-java-sdk-extension-in-beam-only-run-in-hadoop-environm