Reader#lines() parallelizes badly due to nonconfigurable batch size policy in its spliterator

小鲜肉 2020-11-30 08:51

I cannot achieve good parallelization of stream processing when the stream source is a Reader. Running the code below on a quad-core CPU I observe 3 cores being

4 Answers

爱一瞬间的悲伤 (OP)

2020-11-30 09:28
This problem is to some extent fixed in Java 9 early-access builds. `Files.lines` was rewritten, and now, upon splitting, it actually jumps into the middle of the memory-mapped file. Here are the results on my machine (which has 4 Hyper-Threading cores = 8 hardware threads):

Java 8u60:
```
Start processing
          Cores: 8
       CPU time: 73,50 s
      Real time: 36,54 s
CPU utilization: 25,15%
```
Java 9b82:
```
Start processing
          Cores: 8
       CPU time: 79,64 s
      Real time: 10,48 s
CPU utilization: 94,95%
```
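The benchmark code isn't shown here, but the reported utilization figures match (to within rounding of the displayed times) the assumption that utilization is CPU time divided by real time times the number of cores:

$$
\text{utilization} = \frac{\text{CPU time}}{\text{real time} \times \text{cores}}, \qquad
\frac{73.50}{36.54 \times 8} \approx 0.251 \;(\text{8u60}), \qquad
\frac{79.64}{10.48 \times 8} \approx 0.950 \;(\text{9b82})
$$

The wall-clock speedup is $36.54 / 10.48 \approx 3.49$, while total CPU time grows only slightly (73.50 s to 79.64 s).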
As you can see, both real time and CPU utilization are greatly improved.
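To make "jumps into the middle of the memory-mapped file" concrete, here is a minimal, hypothetical sketch of the idea. It is not the JDK's actual code, and the class name `MidpointLineSpliterator` is made up. It assumes UTF-8 text held in a `ByteBuffer` (in the real case a `MappedByteBuffer`) and recognizes only `\n` and `\r\n` terminators. `trySplit()` takes the byte midpoint of its range and scans forward to the next `\n`, so both halves start on a line boundary:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical, simplified sketch of midpoint splitting; NOT the JDK's implementation.
// Assumes UTF-8 text and recognizes only "\n" and "\r\n" as line terminators.
final class MidpointLineSpliterator implements Spliterator<String> {
    private final ByteBuffer buf; // in the scenario above this would be a MappedByteBuffer
    private int index;            // next byte to read (inclusive)
    private final int fence;      // end of this spliterator's range (exclusive)

    MidpointLineSpliterator(ByteBuffer buf, int origin, int fence) {
        this.buf = buf;
        this.index = origin;
        this.fence = fence;
    }

    static Stream<String> lines(ByteBuffer buf, boolean parallel) {
        return StreamSupport.stream(new MidpointLineSpliterator(buf, 0, buf.limit()), parallel);
    }

    @Override
    public Spliterator<String> trySplit() {
        // Jump to the middle of the remaining byte range...
        int mid = (index + fence) >>> 1;
        // ...then scan forward to the next '\n' so no line is cut in half.
        while (mid < fence && buf.get(mid) != '\n') {
            mid++;
        }
        mid++; // the split point is the first byte after the '\n'
        if (mid >= fence) {
            return null; // no usable line break in the upper half: don't split
        }
        // Return the prefix [index, mid) and keep the suffix [mid, fence).
        Spliterator<String> prefix = new MidpointLineSpliterator(buf, index, mid);
        index = mid;
        return prefix;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (index >= fence) {
            return false;
        }
        int start = index;
        while (index < fence && buf.get(index) != '\n') {
            index++;
        }
        int end = index;
        if (index < fence) {
            index++; // consume the '\n'
        }
        if (end > start && buf.get(end - 1) == '\r') {
            end--; // drop the '\r' of a "\r\n" terminator
        }
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buf.get(start + i);
        }
        action.accept(new String(bytes, StandardCharsets.UTF_8));
        return true;
    }

    @Override
    public long estimateSize() {
        return fence - index; // remaining bytes: an upper bound on the line count
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL;
    }

    public static void main(String[] args) {
        byte[] text = "alpha\nbeta\r\ngamma\ndelta".getBytes(StandardCharsets.UTF_8);
        List<String> result = lines(ByteBuffer.wrap(text), true).collect(Collectors.toList());
        System.out.println(result); // prints [alpha, beta, gamma, delta]
    }
}
```

The contrast with the reader-based stream is the point: the iterator-backed spliterator behind `BufferedReader.lines()` can only split by reading a batch of lines sequentially and copying them into an array (with a fixed, growing batch size), whereas a buffer-backed spliterator splits by index arithmetic plus a short scan to the next line break, without touching the preceding text.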

This optimization has some limitations, though. Currently it works only for a few encodings (namely UTF-8, ISO_8859_1 and US_ASCII), because for an arbitrary encoding you don't know exactly how a line break is encoded. It's limited to files of no more than 2 GB (due to limitations of `MappedByteBuffer` in Java), and of course it does not work for non-regular files that cannot be memory-mapped (like character devices and named pipes). In such cases the old implementation is used as a fallback.
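The three conditions above can be summarized as one predicate. The following is a hypothetical sketch: `MappedLinesEligibility` and `canUseMappedSplitting` are illustrative names, not JDK API, and the real check inside `Files.lines` may differ in detail.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a JDK API) that encodes the three limitations described above.
final class MappedLinesEligibility {
    // In these charsets the bytes 0x0A ('\n') and 0x0D ('\r') can only encode
    // line terminators, so line breaks can be found by scanning raw bytes.
    // In e.g. UTF-16, a 0x0A byte may be half of an unrelated character.
    private static final Set<Charset> SUPPORTED = new HashSet<>(Arrays.asList(
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII));

    static boolean canUseMappedSplitting(Path path, Charset cs) {
        if (!SUPPORTED.contains(cs)) {
            return false;
        }
        // Character devices, named pipes, directories etc. are not regular files
        // and cannot be memory-mapped.
        if (!Files.isRegularFile(path)) {
            return false;
        }
        try {
            // One MappedByteBuffer is int-indexed: at most Integer.MAX_VALUE bytes (~2 GB).
            return Files.size(path) <= Integer.MAX_VALUE;
        } catch (IOException e) {
            return false; // size unknown: take the fallback path
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, "a\nb\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(canUseMappedSplitting(tmp, StandardCharsets.UTF_8));  // prints true
        System.out.println(canUseMappedSplitting(tmp, StandardCharsets.UTF_16)); // prints false
        Files.delete(tmp);
    }
}
```

A caller following this scheme would take the memory-mapped, midpoint-splitting path when the predicate returns true and fall back to the reader-based `BufferedReader.lines()` stream otherwise.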