StreamEx grouping into lists returns an incorrect number of records

Submitted by 帅比萌擦擦* on 2019-12-13 18:34:08

Question


The following code splits a stream of objects into chunks of 1000, processes them on materialisation and returns the total number of objects at the end.

The number returned is correct in every case except when the stream contains exactly one element; in that case the number returned is 0.

Any help would be greatly appreciated. I have also had to hack the return statement so that it returns 0 when there are no records in the stream. I'd like to fix this too.

AtomicInteger recordCounter = new AtomicInteger(0);
try (StreamEx<MyObject> stream = StreamEx.of(myObjects)) {
    stream.groupRuns((prev, next) -> recordCounter.incrementAndGet() % 1000 != 0)
          .forEach((chunk) -> {
              //... process each chunk
          });
} catch (Exception e) {
    throw new MyRuntimeException("Failure streaming...", e);
} finally {
    myObjects.close();
}

return recordCounter.get() == 0 ? 0 : recordCounter.incrementAndGet();

Answer 1:


As the JavaDoc for groupRuns says:

sameGroup - a non-interfering, stateless predicate to apply to the pair of adjacent elements which returns true for elements which belong to the same group.

The predicate must be stateless, and yours is not: it mutates a counter on every call. You are misusing the method, which is why you do not get the expected result. It comes close to what you want purely by chance. You cannot rely on this behavior, and it may change in future StreamEx versions.




Answer 2:


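The counter was originally there only to decide where to split the chunks, and it is not a reliable count of the total number of objects. The predicate passed to groupRuns is applied to pairs of adjacent elements, so when the stream has 0 or 1 elements it is never executed and the counter stays at 0.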
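
The effect can be reproduced without StreamEx. The sketch below is only an illustration: groupAdjacent is a hypothetical, hand-written stand-in for a groupRuns-style operation (not StreamEx's implementation), and predicateCalls is a helper made up for this example. It shows that a predicate over adjacent pairs runs once per pair, so a list of n elements produces n - 1 calls and a list of 0 or 1 elements produces none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;

class AdjacentPairCount {

    // Hypothetical stand-in for a groupRuns-style operation: a new group is
    // started whenever sameGroup returns false for two adjacent elements.
    static <T> List<List<T>> groupAdjacent(List<T> items, BiPredicate<T, T> sameGroup) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;
        T prev = null;
        for (T item : items) {
            // The first element has no predecessor, so the predicate is skipped for it.
            if (current == null || !sameGroup.test(prev, item)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(item);
            prev = item;
        }
        return groups;
    }

    // Returns how often the counting predicate from the question is invoked
    // for a list of n elements.
    static int predicateCalls(int n) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            items.add(i);
        }
        AtomicInteger calls = new AtomicInteger();
        groupAdjacent(items, (prev, next) -> calls.incrementAndGet() % 1000 != 0);
        return calls.get();
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 2500}) {
            System.out.println("n=" + n + " calls=" + predicateCalls(n));
        }
        // prints:
        // n=0 calls=0
        // n=1 calls=0
        // n=2 calls=1
        // n=2500 calls=2499
    }
}
```

With n = 1 the counter is still 0 after the run, which is exactly the case where the question's `recordCounter.get() == 0 ? 0 : ...` expression reports 0 records.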

So you need another way to count the objects. Instead of just consuming the items in forEach, you could return the number of objects in each processed chunk (chunk.size()) and sum these at the end:

    AtomicInteger counter = new AtomicInteger(0);
    try (StreamEx<MyObject> stream = StreamEx.of(myObjects)) {
        return stream
                .groupRuns((prev, next) -> counter.incrementAndGet() % 1000 != 0)
                .mapToLong((chunk) -> {
                     //... process each chunk
                     return chunk.size();
                 })
                .sum();
    } catch(Exception e) {
        throw new MyRuntimeException("Failure streaming...", e);
    } finally {
        myObjects.close();
    }



Answer 3:


@Nazarii Bardiuk explained why it doesn't work. I had a similar requirement to split a stream before, so I forked StreamEx and made a few changes in StreamEx-0.8.7. Here is a simple example using the fork:

int count = IntStreamEx.range(0, 10).boxed().splitToList(3).mapToInt(chunk -> {
    System.out.println(chunk);
    return chunk.size();
}).sum();

System.out.println(count);

If you're at the beginning of your project, you can give it a try. The code would then be:

try (StreamEx<MyObject> stream = StreamEx.of(myObjects).onClose(() -> myObjects.close())) {
    return stream.splitToList(1000)
                 .mapToInt((chunk) -> {
                     //... process each chunk
                     return chunk.size();
                 }).sum();
}



Answer 4:


In the end I went with Guava's Iterators.partition() to split my stream of objects into chunks:

MutableInt recordCounter = new MutableInt();
try {
    Iterators.partition(myObjects.iterator(), 1000)
             .forEachRemaining((chunk) -> {
                      //process each chunk
                      ...
                      recordCounter.add(chunk.size());
             });
} catch (Exception e) {
    throw new MyRuntimeException("Failure streaming...", e);
} finally {
    myObjects.close();
}

return recordCounter.getValue();
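
For comparison, the same chunk-and-count idea can be sketched with the JDK alone. This is an assumption-laden illustration rather than the code used above: ChunkedCounter and processInChunks are hypothetical names, and the sketch takes a plain Iterator and leaves closing the underlying source to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class ChunkedCounter {

    // Reads the iterator in chunks of at most chunkSize elements, passes each
    // chunk to the processor, and returns the total number of elements seen.
    static <T> long processInChunks(Iterator<T> source, int chunkSize,
                                    Consumer<List<T>> processor) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long total = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                processor.accept(chunk);
                total += chunk.size();
                chunk = new ArrayList<>(chunkSize);
            }
        }
        // Flush the last, possibly shorter, chunk.
        if (!chunk.isEmpty()) {
            processor.accept(chunk);
            total += chunk.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            data.add(i);
        }
        List<Integer> chunkSizes = new ArrayList<>();
        long total = processInChunks(data.iterator(), 1000, c -> chunkSizes.add(c.size()));
        System.out.println(chunkSizes + " total=" + total);
        // prints: [1000, 1000, 500] total=2500
    }
}
```

Because the total is accumulated from chunk sizes rather than from predicate calls, empty and single-element inputs need no special handling: they yield 0 and 1 respectively.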


Source: https://stackoverflow.com/questions/45649990/streamex-grouping-into-lists-returns-an-incorrect-number-of-records
