Reader#lines() parallelizes badly due to nonconfigurable batch size policy in its spliterator

小鲜肉 2020-11-30 08:51

I cannot achieve good parallelization of stream processing when the stream source is a Reader. Running the code below on a quad-core CPU I observe 3 cores being
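The code referred to above is not preserved in this copy of the question. As a hedged illustration of the batch-size policy named in the title, here is a minimal sketch of a wrapper spliterator that hands out batches of a caller-chosen, fixed size. That lets an unsized source such as BufferedReader.lines() be cut into many small chunks instead of the growing batches the JDK's iterator-based spliterator produces. FixedBatchDemo, FixedBatchSpliterator and withBatchSize are hypothetical names, not JDK API, and the batch size of 100 is an arbitrary example value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class FixedBatchDemo {

    // Hypothetical wrapper: every trySplit() hands off at most batchSize elements,
    // pulled sequentially from the front of the wrapped (typically unsized) source.
    static final class FixedBatchSpliterator<T> implements Spliterator<T> {
        private final Spliterator<T> source;
        private final int batchSize;

        FixedBatchSpliterator(Spliterator<T> source, int batchSize) {
            this.source = source;
            this.batchSize = batchSize;
        }

        @Override
        public Spliterator<T> trySplit() {
            List<T> batch = new ArrayList<>(batchSize);
            // Fill the batch until it is full or the source is exhausted.
            while (batch.size() < batchSize && source.tryAdvance(batch::add)) {
                // loop body intentionally empty
            }
            // Returning null tells the stream framework there is nothing left to split off.
            return batch.isEmpty() ? null : batch.spliterator();
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            return source.tryAdvance(action);
        }

        @Override
        public void forEachRemaining(Consumer<? super T> action) {
            source.forEachRemaining(action);
        }

        @Override
        public long estimateSize() {
            // Long.MAX_VALUE for unsized sources such as BufferedReader.lines().
            return source.estimateSize();
        }

        @Override
        public int characteristics() {
            // Keep e.g. ORDERED and NONNULL; drop guarantees this wrapper does not maintain.
            return source.characteristics() & ~(SIZED | SUBSIZED | SORTED);
        }
    }

    // Hypothetical helper: re-wrap a stream so that it splits in fixed-size batches.
    public static <T> Stream<T> withBatchSize(Stream<T> in, int batchSize) {
        return StreamSupport.stream(new FixedBatchSpliterator<>(in.spliterator(), batchSize), true)
                .onClose(in::close);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            text.append("line ").append(i).append('\n');
        }
        try (BufferedReader reader = new BufferedReader(new StringReader(text.toString()));
             Stream<String> lines = withBatchSize(reader.lines(), 100)) {
            // Stand-in for a CPU-heavy per-line computation.
            long totalChars = lines.mapToLong(String::length).sum();
            System.out.println(totalChars); // 88890
        }
    }
}
```

Smaller batches give the fork/join pool more, finer-grained tasks to balance across cores, at the cost of more splitting overhead; the right size depends on how expensive each element is to process.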

  •  爱一瞬间的悲伤
    2020-11-30 09:28

    This problem is partly fixed in Java 9 early-access builds, at least for files read via Files.lines. Files.lines was rewritten, and now, when splitting, it actually jumps into the middle of the memory-mapped file. Here are the results on my machine (which has 4 HyperThreading cores = 8 hardware threads):

    Java 8u60:

    Start processing
              Cores: 8
           CPU time: 73,50 s
          Real time: 36,54 s
    CPU utilization: 25,15%
    

    Java 9b82:

    Start processing
              Cores: 8
           CPU time: 79,64 s
          Real time: 10,48 s
    CPU utilization: 94,95%
    

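    As you can see, both real time and CPU utilization are greatly improved: real time dropped from 36.54 s to 10.48 s, and CPU utilization rose from about 25% to about 95%, while total CPU time changed only slightly.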
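    The utilization figures above appear consistent, up to the rounding of the displayed times, with utilization defined as CPU time divided by (real time × number of hardware threads). The benchmark harness is not shown, so that definition is an assumption. This small sketch recomputes both runs from the published numbers; the class name UtilizationCheck is hypothetical.

```java
import java.util.Locale;

public class UtilizationCheck {

    // Assumed definition: fraction of all hardware threads kept busy during the run.
    public static double utilization(double cpuSeconds, double realSeconds, int cores) {
        return cpuSeconds / (realSeconds * cores);
    }

    public static void main(String[] args) {
        int cores = 8;
        // Figures copied from the Java 8u60 and Java 9b82 runs above.
        double u8 = utilization(73.50, 36.54, cores);
        double u9 = utilization(79.64, 10.48, cores);
        double speedup = 36.54 / 10.48;
        // Locale.ROOT forces '.' as the decimal separator; the output above used ','.
        System.out.println(String.format(Locale.ROOT, "8u60: %.2f%%", 100 * u8)); // 8u60: 25.14%
        System.out.println(String.format(Locale.ROOT, "9b82: %.2f%%", 100 * u9)); // 9b82: 94.99%
        System.out.println(String.format(Locale.ROOT, "speedup: %.2fx", speedup)); // speedup: 3.49x
    }
}
```

    The small gaps from the reported 25.15% and 94.95% are within what rounding the displayed times to two decimals can cause.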

    This optimization has some limitations, though. Currently it works only for a few encodings (namely UTF-8, ISO_8859_1 and US_ASCII), because for an arbitrary encoding you don't know exactly how a line break is encoded. It's limited to files no larger than 2 GB (due to the limitations of MappedByteBuffer in Java), and of course it does not work for non-regular files that cannot be memory-mapped (such as character devices and named pipes). In such cases the old implementation is used as a fallback.
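    Below is a minimal sketch of the two ideas described here, not the JDK's actual code: a cheap check for whether the memory-mapped fast path applies, and a split that jumps to the middle of a byte range and then moves forward to the next '\n'. The byte scan is safe for UTF-8, ISO-8859-1 and US-ASCII because in all three the byte 0x0A only ever encodes the line feed (every byte of a multi-byte UTF-8 sequence has its high bit set), whereas in UTF-16, for example, 0x0A can also occur inside other characters. MidpointLineSplit, canUseMappedSplitting and splitPoint are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MidpointLineSplit {

    // Hypothetical check mirroring the limitations above: an ASCII-compatible charset,
    // a size a single MappedByteBuffer can address, and a regular file.
    public static boolean canUseMappedSplitting(Charset cs, long size, boolean regularFile) {
        boolean asciiCompatible = cs.equals(StandardCharsets.UTF_8)
                || cs.equals(StandardCharsets.ISO_8859_1)
                || cs.equals(StandardCharsets.US_ASCII);
        return asciiCompatible && size <= Integer.MAX_VALUE && regularFile;
    }

    // Hypothetical split: returns the index just after the first '\n' at or after the
    // midpoint of [lo, hi), so both halves hold whole lines; -1 means "don't split".
    public static int splitPoint(ByteBuffer buf, int lo, int hi) {
        int mid = (lo + hi) >>> 1;
        for (int i = mid; i < hi; i++) {
            if (buf.get(i) == '\n') {
                // A break on the very last byte would leave the right half empty.
                return (i + 1 < hi) ? i + 1 : -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The midpoint (byte 11) falls inside the two-byte UTF-8 encoding of U+00E9.
        byte[] data = "na\u00efve\ncaf\u00e9\n\u00fcber\nend\n".getBytes(StandardCharsets.UTF_8);
        int cut = splitPoint(ByteBuffer.wrap(data), 0, data.length);
        String left = new String(data, 0, cut, StandardCharsets.UTF_8);
        String right = new String(data, cut, data.length - cut, StandardCharsets.UTF_8);
        System.out.println("split at byte " + cut); // split at byte 13
        System.out.println(left.split("\n").length + " + " + right.split("\n").length + " lines"); // 2 + 2 lines
        System.out.println(canUseMappedSplitting(StandardCharsets.UTF_16, 1024, true)); // false
    }
}
```

    Landing in the middle of a multi-byte character is harmless here: the scan simply skips continuation bytes until it reaches the next line feed, so neither half ever starts or ends inside a character.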
