I watched a talk by José Paumard on InfoQ : http://www.infoq.com/fr/presentations/jdk8-lambdas-streams-collectors (French)
The thing is I got stuck on this one poin
JDK docs has good explanation of this behavior, it is ordering constraint that kills performance for parallel processing
Text from doc for limit function - https://docs.oracle.com/javase/8/docs/api/java/util/stream/LongStream.html
While limit() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of maxSize, since limit(n) is constrained to return not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(LongSupplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of limit() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with limit() in parallel pipelines, switching to sequential execution with sequential() may improve performance. Blockquote