How to sort a dataset in Apache Flink?

你离开我真会死。 submitted on 2021-02-16 16:48:10

Question


I have a Tuple DataSet of the form DataSet<Tuple2<String, Long>>. I want to sort the entire DataSet on the String field and then write only the Long values to a file. Flink does provide sortPartition(), but that does not help here because it only sorts within each partition and I need to sort the DataSet completely.


Answer 1:


You can also use sortPartition() to sort the complete DataSet if you set the parallelism to 1, because with a single partition, sorting that partition sorts every record:

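import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;

DataSet<Tuple2<String, Long>> data = ...
DataSet<Tuple2<String, Long>> sorted = data
  .sortPartition(0, Order.ASCENDING).setParallelism(1); // sort on field 0 (the String) in one partition
DataSet<Long> longs = sorted.map(new LongExtractor());  // user-defined MapFunction returning field 1 (the Long)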
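
To see why the parallelism matters, here is a minimal plain-Java sketch that does not use Flink at all. It only imitates the idea of sorting each partition independently; the class name, the helper method, and the round-robin split are assumptions made for illustration, and Flink's real data distribution may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SortPartitionSketch {

    // Hypothetical helper: split the records into `parallelism` partitions,
    // sort each partition on its String key, then concatenate the partitions
    // in order and keep only the Long values.
    static List<Long> sortWithinPartitions(List<Map.Entry<String, Long>> data, int parallelism) {
        List<List<Map.Entry<String, Long>>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        // Round-robin assignment (an assumption, not Flink's actual strategy).
        for (int i = 0; i < data.size(); i++) {
            partitions.get(i % parallelism).add(data.get(i));
        }
        List<Long> result = new ArrayList<>();
        for (List<Map.Entry<String, Long>> partition : partitions) {
            partition.sort(Map.Entry.comparingByKey()); // local sort only
            for (Map.Entry<String, Long> entry : partition) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> data = List.of(
            Map.entry("d", 4L), Map.entry("a", 1L),
            Map.entry("c", 3L), Map.entry("b", 2L));
        // Two partitions: each is sorted, but the overall order is not.
        System.out.println(sortWithinPartitions(data, 2)); // prints [3, 4, 1, 2]
        // One partition: the local sort is a total sort.
        System.out.println(sortWithinPartitions(data, 1)); // prints [1, 2, 3, 4]
    }
}
```

With two partitions, each half is ordered on its own but the concatenated result is not; with one partition, the local sort covers every record, which is the effect the setParallelism(1) call above relies on.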


Source: https://stackoverflow.com/questions/43156483/how-to-sort-a-dataset-in-apache-flink
