Flink: DataSet.count() is a bottleneck - how to count in parallel?
Question

I am learning Map-Reduce using Flink and have a question about how to count the elements in a DataSet efficiently. What I have so far is this:

    DataSet<MyClass> ds = ...;
    long num = ds.count();

When I execute this, my Flink log says:

    12/03/2016 19:47:27 DataSink (count())(1/1) switched to RUNNING

So only one CPU is used (I have four, and other operations such as reduce use all of them). I think count() internally collects the DataSet from all four CPUs and counts the elements sequentially instead of