I cannot achieve good parallelization of stream processing when the stream source is a Reader. Running the code below on a quad-core CPU I observe 3 cores being
This problem is to some extent fixed in Java 9 early-access builds. Files.lines was rewritten, and now, when its stream is split, it actually jumps into the middle of the memory-mapped file. Here are the results on my machine (which has 4 Hyper-Threading cores = 8 hardware threads):
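For reference, here is a minimal sketch of the kind of pipeline that benefits: a parallel stream over Files.lines with one of the supported charsets (UTF-8 here). The class name ParallelLines and the per-line workload expensive() are hypothetical placeholders, not the question's actual code. The same code also runs on Java 8, but there the file is read sequentially, so it does not spread across cores as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ParallelLines {

    // Hypothetical CPU-bound per-line work (a simple polynomial hash);
    // stands in for whatever the real processing is.
    static long expensive(String line) {
        long h = 0;
        for (int i = 0; i < line.length(); i++) {
            h = 31 * h + line.charAt(i);
        }
        return h;
    }

    // Processes any stream of lines in parallel and sums the per-line results.
    static long checksum(Stream<String> lines) {
        return lines.parallel().mapToLong(ParallelLines::expensive).sum();
    }

    // Reads the file with Files.lines in UTF-8, one of the charsets for which
    // the rewritten Java 9 implementation can split the memory-mapped file.
    static long checksum(Path file) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return checksum(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file;
        if (args.length > 0) {
            file = Paths.get(args[0]);
        } else {
            // No argument given: write a sample file so the demo is self-contained.
            file = Files.createTempFile("lines", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file,
                    IntStream.range(0, 100_000).mapToObj(i -> "line " + i).collect(Collectors.toList()),
                    StandardCharsets.UTF_8);
        }
        long start = System.nanoTime();
        long result = checksum(file);
        System.out.printf("checksum=%d, real time=%.2f s%n", result, (System.nanoTime() - start) / 1e9);
    }
}
```

Passing a large text file as the first argument and watching a CPU monitor on Java 8 versus Java 9+ should show the difference described here.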
Java 8u60:

    Start processing
    Cores: 8
    CPU time: 73,50 s
    Real time: 36,54 s
    CPU utilization: 25,15%

Java 9b82:

    Start processing
    Cores: 8
    CPU time: 79,64 s
    Real time: 10,48 s
    CPU utilization: 94,95%
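As you can see, both real time and CPU utilization are greatly improved: real time drops from 36.54 s to 10.48 s, and CPU utilization rises from about 25% to about 95%, while the total CPU time stays roughly the same.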
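The utilization figures above are consistent (up to rounding) with CPU time divided by real time times the number of hardware threads, e.g. 73.50 / (36.54 × 8) ≈ 25.1%. Below is a sketch of how such numbers could be measured, assuming that formula; the class name and the placeholder workload are hypothetical. It uses com.sun.management.OperatingSystemMXBean.getProcessCpuTime(), a HotSpot/OpenJDK-specific extension, so it may not be available on every JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.stream.LongStream;

public class CpuUtilization {

    // Share of the machine's total capacity that was used, as a percentage:
    // CPU time / (real time * number of hardware threads).
    static double utilizationPercent(double cpuSeconds, double realSeconds, int cores) {
        return 100.0 * cpuSeconds / (realSeconds * cores);
    }

    public static void main(String[] args) {
        java.lang.management.OperatingSystemMXBean base = ManagementFactory.getOperatingSystemMXBean();
        if (!(base instanceof com.sun.management.OperatingSystemMXBean)) {
            System.out.println("Process CPU time is not available on this JVM");
            return;
        }
        // JDK-specific extension that exposes the CPU time used by this process.
        com.sun.management.OperatingSystemMXBean os = (com.sun.management.OperatingSystemMXBean) base;
        int cores = Runtime.getRuntime().availableProcessors();

        long cpuStart = os.getProcessCpuTime(); // nanoseconds, or -1 if unsupported
        long wallStart = System.nanoTime();

        // Placeholder parallel workload; replace with the stream being measured.
        long sum = LongStream.range(0, 500_000_000L).parallel().map(x -> x * x % 7).sum();

        long cpuEnd = os.getProcessCpuTime();
        double real = (System.nanoTime() - wallStart) / 1e9;
        System.out.println("Cores: " + cores);
        if (cpuStart < 0 || cpuEnd < 0) {
            System.out.println("Process CPU time is not supported");
            return;
        }
        double cpu = (cpuEnd - cpuStart) / 1e9;
        System.out.printf("CPU time: %.2f s%n", cpu);
        System.out.printf("Real time: %.2f s%n", real);
        System.out.printf("CPU utilization: %.2f%%%n", utilizationPercent(cpu, real, cores));
        System.out.println("(result " + sum + ")");
    }
}
```

Note that printf formats decimals according to the default locale, so the separator may be a comma or a point depending on the machine.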
This optimization has some limitations, though. Currently it works only for a few encodings (namely UTF-8, ISO_8859_1 and US_ASCII), because for an arbitrary encoding you don't know exactly how a line break is encoded. It is limited to files no larger than 2 GB (a MappedByteBuffer is indexed by int, so a single mapping cannot exceed Integer.MAX_VALUE bytes), and of course it does not work for non-regular files that cannot be memory-mapped, such as character devices or named pipes. In such cases the old implementation is used as a fallback.
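To illustrate why the optimization is tied to specific encodings and to the 2 GB limit, here is a simplified sketch (not the actual JDK code) of splitting a memory-mapped file at a line boundary: take the middle of the range, scan forward to the next '\n' byte and split right after it. The class LineSplit and its methods are hypothetical. The byte-level scan is only safe when a 0x0A byte always means a line feed, which holds for UTF-8, ISO-8859-1 and US-ASCII but not, for example, for UTF-16, where the line feed itself is two bytes and a 0x0A byte can also be half of an unrelated character.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LineSplit {

    // Maps a whole file read-only. A single MappedByteBuffer is int-indexed,
    // so files over Integer.MAX_VALUE bytes (about 2 GB) cannot be mapped this way.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for a single mapping: " + size + " bytes");
            }
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    // Hypothetical split helper: scans forward from the middle of [lo, hi) for a
    // '\n' byte and returns the index just after it, or -1 if there is no usable split.
    // Only correct for encodings where a 0x0A byte is always a line feed; in UTF-8
    // every byte of a multi-byte character is >= 0x80, so it can never be 0x0A.
    static int splitPoint(ByteBuffer buf, int lo, int hi) {
        for (int i = (lo + hi) >>> 1; i < hi; i++) {
            if (buf.get(i) == '\n') {
                return i + 1 < hi ? i + 1 : -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = args.length > 0
                ? map(Paths.get(args[0]))
                : ByteBuffer.wrap("\u00e9\u00e9\u00e9\nxy\nlast line\n".getBytes(StandardCharsets.UTF_8));
        int split = splitPoint(buf, 0, buf.limit());
        System.out.println(split < 0 ? "no split" : "split at byte " + split + " of " + buf.limit());
    }
}
```

Each half can then be split again the same way, which is what lets a parallel stream hand independent chunks of the file to different cores instead of reading it from the start on a single thread.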