Question
I have a Tuple DataSet of the form DataSet<Tuple2<String, Long>>. I want to sort the entire DataSet on the String field and then write only the Long values to a file. Flink provides sortPartition(), but that sorts each partition separately, and I need the DataSet sorted as a whole.
Answer 1:
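You can still use sortPartition() to sort the complete DataSet if you set the parallelism to 1. With a single partition, the per-partition sort is a total sort; the trade-off is that all records are sorted by one task: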
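DataSet<Tuple2<String, Long>> data = ...
DataSet<Tuple2<String, Long>> sorted = data
    .sortPartition(0, Order.ASCENDING) // sort on field 0 (the String)
    .setParallelism(1);                // one partition, so the sort covers the whole DataSet
// LongExtractor is a user-defined MapFunction that returns field f1 (the Long).
// Running the map with parallelism 1 as well should avoid redistributing the sorted records.
DataSet<Long> longs = sorted.map(new LongExtractor()).setParallelism(1);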
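To see why parallelism 1 matters, here is a minimal sketch in plain Java. It does not use the Flink API. It only imitates a per-partition sort on in-memory lists, and the class name SortPartitionSketch, its helper methods, the round-robin split, and the sample data are all hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT Flink code.
final class SortPartitionSketch {

    // Helper that builds one (String, Long) record.
    static Map.Entry<String, Long> entry(String key, long value) {
        return new SimpleEntry<>(key, value);
    }

    // Hypothetical sample input, deliberately unsorted.
    static List<Map.Entry<String, Long>> sample() {
        return Arrays.asList(entry("d", 4), entry("a", 1), entry("c", 3), entry("b", 2));
    }

    // Distribute records round-robin over `parallelism` partitions (an assumed strategy).
    static List<List<Map.Entry<String, Long>>> split(List<Map.Entry<String, Long>> data, int parallelism) {
        List<List<Map.Entry<String, Long>>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < data.size(); i++) {
            partitions.get(i % parallelism).add(data.get(i));
        }
        return partitions;
    }

    // Sort every partition on its own by the String key, then emit the Long
    // values partition after partition, as a per-partition sort would.
    static List<Long> sortEachPartition(List<List<Map.Entry<String, Long>>> partitions) {
        List<Long> out = new ArrayList<>();
        for (List<Map.Entry<String, Long>> partition : partitions) {
            List<Map.Entry<String, Long>> copy = new ArrayList<>(partition);
            copy.sort(Map.Entry.comparingByKey());
            for (Map.Entry<String, Long> e : copy) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> data = sample();
        // Two partitions: each one is sorted, but the combined output is not.
        System.out.println(sortEachPartition(split(data, 2))); // prints [3, 4, 1, 2]
        // One partition: the per-partition sort is a total sort.
        System.out.println(sortEachPartition(split(data, 1))); // prints [1, 2, 3, 4]
    }
}
```

With two partitions, each partition is ordered but the concatenated result is not. With a single partition the same routine yields a fully ordered result, which is the effect setParallelism(1) is meant to have on sortPartition().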
Source: https://stackoverflow.com/questions/43156483/how-to-sort-a-dataset-in-apache-flink