Lazy sorting of entities in Java 8 Stream API on a daily basis?

Posted by 一世执手 on 2019-12-06 07:14:11

Question


I have a large Java 8 Stream (Stream<MyObject>) with objects that look like this:

class MyObject {
   private String string;
   private Date timestamp;

   // Getters and setters removed for brevity
}

I know that all timestamps for day 1 will arrive before those of day 2, but within each day the timestamps may be out of order. I'd like to sort the MyObject instances in timestamp order on a per-day basis using the Stream API. Since the stream is large, I have to do this as lazily as possible: it would be OK to hold one day's worth of MyObject instances in memory, but not much more than that.

How can I achieve this?

Update 2017-04-29:

A requirement is that I want to continue working on the same stream after the sorting! I'd like something like this (pseudo code):

Stream<MyObject> sortedStream = myStreamUnsorted().sort(onADailyBasis());

Answer 1:


I'd suggest the following solution:

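Store each value of your stream in a TreeMap so that it is sorted immediately. Use the object's timestamp as the key. Note that two objects with exactly the same timestamp would share a key, so the later one would replace the earlier one in the map.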

 Map<Date, MyObject> objectsOfTheDaySorted = new TreeMap<>();

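We need to know which object has to be removed from the map at the end. It will be only one object, but the variable that stores it has to be (effectively) final to be usable inside a lambda. So I chose a plain list, pre-filled with a single null slot so that set(0, ...) has an element to overwrite.

 List<MyObject> lastObject = new ArrayList<>(Collections.singletonList(null));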

Set the current day as integer.

 // just an example
 int currentDay = 23;

Use a predicate that determines whether the day of a passing object differs from currentDay.

 Predicate<MyObject> predicate = myObject -> myObject.getTimestamp()
                    .toInstant()
                    .atZone(ZoneId.systemDefault())
                    .toLocalDate()
                    .getDayOfMonth() != currentDay;

Now stream your stream. Use peek() twice: first to put the object into the map, second to overwrite the object in the list. Use anyMatch() as the terminal operation and hand in the predicate created above. As soon as the first object appears that matches the criterion of being from the next day, anyMatch() terminates the stream and returns true. If the stream ends before that happens, it returns false.

 boolean nextDayReached = stream
        .peek(myObject -> objectsOfTheDaySorted.put(myObject.getTimestamp(), myObject))
        .peek(myObject -> lastObject.set(0, myObject))
        .anyMatch(predicate);

Now you only have to remove the last object that passed by, which already belongs to the next day and therefore not in your map. This is only needed when anyMatch() returned true.

 if (nextDayReached) {
     objectsOfTheDaySorted.remove(lastObject.get(0).getTimestamp());
 }

Done. You have a sorted Map of objects that all belong to just one day. I hope this matches your expectations. The entire code follows in one block so that it can be copied at once.

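 // Requires java.time.ZoneId, java.util.*, java.util.function.Predicate
 Map<Date, MyObject> objectsOfTheDaySorted = new TreeMap<>();
 // one pre-filled slot so that set(0, ...) has an element to overwrite
 List<MyObject> lastObject = new ArrayList<>(Collections.singletonList(null));

 // just an example
 int currentDay = 23;

 Predicate<MyObject> predicate = myObject -> myObject.getTimestamp()
                    .toInstant()
                    .atZone(ZoneId.systemDefault())
                    .toLocalDate()
                    .getDayOfMonth() != currentDay;

 boolean nextDayReached = stream
        .peek(myObject -> objectsOfTheDaySorted.put(myObject.getTimestamp(), myObject))
        .peek(myObject -> lastObject.set(0, myObject))
        .anyMatch(predicate);

 // only remove the last object if it really belongs to the next day
 if (nextDayReached) {
     objectsOfTheDaySorted.remove(lastObject.get(0).getTimestamp());
 }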
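
The short-circuiting that this approach relies on can be checked in isolation. The following is only a minimal sketch with plain integers standing in for the objects of a day. The class AnyMatchShortCircuit and the method takeBelow are made-up names for illustration, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class AnyMatchShortCircuit {

    // Collects elements via peek() until anyMatch() sees the first element >= limit.
    // Returns the collected elements without the one that stopped the stream.
    static List<Integer> takeBelow(Stream<Integer> stream, int limit) {
        List<Integer> seen = new ArrayList<>();
        boolean limitReached = stream.peek(seen::add).anyMatch(i -> i >= limit);
        if (limitReached) {
            // remove by index: the last element is the one that triggered the match
            seen.remove(seen.size() - 1);
        }
        return seen;
    }
}
```

Because anyMatch() stops pulling elements once the predicate matches, this sketch also terminates on an infinite source such as Stream.iterate(1, i -> i + 1).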



Answer 2:


It depends whether you need to process the objects of all days or one specific day.

Building on DiabolicWords's answer, this is an example to process all days:

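TreeSet<MyObject> currentDaysObjects = new TreeSet<>(Comparator.comparing(MyObject::getTimestamp));
LocalDate[] currentDay = new LocalDate[1];
incoming.peek(o -> {
    LocalDate date = o.getTimestamp().toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
    if (!date.equals(currentDay[0]))
    {
        // a new day starts: hand over the finished day, if there is one
        if (currentDay[0] != null)
        {
            processOneDaysObjects(currentDaysObjects);
            currentDaysObjects.clear();
        }
        currentDay[0] = date;
    }
}).forEach(currentDaysObjects::add);
// the last day is still in the set once the stream ends
if (!currentDaysObjects.isEmpty())
{
    processOneDaysObjects(currentDaysObjects);
}

This will collect the objects for one day, process them, reset the collection and continue with the next day. The last day is processed after the stream has ended, because no later object arrives to trigger it.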

If you only want one specific day:

TreeSet<MyObject> currentDaysObjects = new TreeSet<>(Comparator.comparing(MyObject::getTimestamp));
LocalDate specificDay = LocalDate.now();
incoming.filter(o -> !o.getTimestamp()
                       .toInstant()
                       .atZone(ZoneId.systemDefault())
                       .toLocalDate()
                       .isBefore(specificDay))
        .peek(o -> currentDaysObjects.add(o))
        .anyMatch(o -> {
            if (o.getTimestamp().toInstant().atZone(ZoneId.systemDefault()).toLocalDate().isAfter(specificDay))
            {
                currentDaysObjects.remove(o);
                return true;
            }
            return false;
        });

The filter will skip objects from before the specificDay, and the anyMatch will terminate the stream after the specificDay.

Java 9 adds the methods takeWhile and dropWhile to streams. These would make this a lot easier.
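
As a rough illustration of how the single-day case might look on Java 9 or later, here is a minimal sketch. The Event class (with a LocalDateTime timestamp instead of Date) and the method name sortedDay are assumptions made for this example. It only works as intended on a sequential, ordered stream whose days arrive in order.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleDayExample {

    // Hypothetical stand-in for MyObject, using LocalDateTime instead of Date.
    static final class Event {
        final String name;
        final LocalDateTime timestamp;

        Event(String name, LocalDateTime timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }

        LocalDateTime getTimestamp() {
            return timestamp;
        }
    }

    // Returns the events of specificDay in timestamp order (requires Java 9+).
    static List<Event> sortedDay(Stream<Event> incoming, LocalDate specificDay) {
        return incoming
                // skip everything before the requested day
                .dropWhile(e -> e.getTimestamp().toLocalDate().isBefore(specificDay))
                // stop at the first element that lies after the requested day
                .takeWhile(e -> !e.getTimestamp().toLocalDate().isAfter(specificDay))
                .sorted(Comparator.comparing(Event::getTimestamp))
                .collect(Collectors.toList());
    }
}
```

Here sorted() only ever sees the elements of one day, because takeWhile cuts the stream off before the following day reaches it.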

Edit after the OP specified the goal in more detail

Wow, this is a nice exercise, and quite a tough nut to crack. The problem is that an obvious solution (collecting the stream) always goes through the whole stream. You cannot take the next x elements, order them, stream them, then repeat without doing it for the whole stream (i.e. all days) at once. For the same reason, calling sorted() on a stream will go through it completely (especially as the stream does not know the fact that the elements are sorted by days already). For reference, read this comment here: https://stackoverflow.com/a/27595803/7653073.
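
The point about sorted() can be observed with a small experiment. This is a sketch for illustration only: the class SortedBarrierExample and the method elementsConsumed are invented names, and the counts depend on the stream being sequential.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class SortedBarrierExample {

    // Counts how many source elements are pulled before findFirst() returns.
    static int elementsConsumed(boolean withSorted) {
        AtomicInteger consumed = new AtomicInteger();
        Stream<Integer> source = Stream.of(3, 1, 2, 5, 4)
                .peek(i -> consumed.incrementAndGet());
        if (withSorted) {
            // sorted() has to see every element before it can emit the smallest one
            source.sorted().findFirst();
        } else {
            // without sorted(), findFirst() stops after the first element
            source.findFirst();
        }
        return consumed.get();
    }
}
```

With sorted() in the pipeline all five elements are consumed even though only one result is requested. Without it, a single element is enough.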

As they recommend, here is an Iterator implementation wrapped in a stream that kind of looks ahead in the original stream, takes the elements of one day, sorts them, and gives you the whole thing in a nice new stream (without keeping all days in memory!). The implementation is more complicated as we do not have a fixed chunk size, but always have to find the first element of the next next day to know when to stop.

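import java.util.Comparator;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Assumes MyObject.getTimestamp() returns a LocalDateTime, as in the usage example below.
public class DayByDayIterator implements Iterator<MyObject>
{
    private Iterator<MyObject> incoming;
    private MyObject next;

    private Iterator<MyObject> currentDay;

    private MyObject firstOfNextDay;
    // look-ahead buffer: the objects of the day that will become currentDay next
    private Set<MyObject> nextDaysObjects = new TreeSet<>(Comparator.comparing(MyObject::getTimestamp));

    public static Stream<MyObject> streamOf(Stream<MyObject> incoming)
    {
        Iterable<MyObject> iterable = () -> new DayByDayIterator(incoming);
        return StreamSupport.stream(iterable.spliterator(), false);
    }

    private DayByDayIterator(Stream<MyObject> stream)
    {
        this.incoming = stream.iterator();
        firstOfNextDay = incoming.next();
        nextDaysObjects.add(firstOfNextDay);
        next();
    }

    @Override
    public boolean hasNext()
    {
        return next != null;
    }

    @Override
    public MyObject next()
    {
        // switch days when nothing is loaded yet, or when the current day is
        // used up and the look-ahead buffer still holds objects
        if (currentDay == null || (!currentDay.hasNext() && !nextDaysObjects.isEmpty()))
        {
            nextDay();
        }

        MyObject result = next;

        if (currentDay != null && currentDay.hasNext())
        {
            this.next = currentDay.next();
        }
        else
        {
            this.next = null;
        }

        return result;
    }

    private void nextDay()
    {
        // receives the first object of the day after the one being completed, if any;
        // keeping it here also covers the case that it is the very last object
        Set<MyObject> followingDay = new TreeSet<>(Comparator.comparing(MyObject::getTimestamp));
        while (followingDay.isEmpty() && incoming.hasNext())
        {
            MyObject previous = firstOfNextDay;
            firstOfNextDay = incoming.next();
            if (previous.getTimestamp().toLocalDate()
                    .isEqual(firstOfNextDay.getTimestamp().toLocalDate()))
            {
                nextDaysObjects.add(firstOfNextDay);
            }
            else
            {
                followingDay.add(firstOfNextDay);
            }
        }
        currentDay = nextDaysObjects.iterator();
        nextDaysObjects = followingDay;
    }
}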

Use it like this:

public static void main(String[] args)
{
    Stream<MyObject> stream = Stream.of(
            new MyObject(LocalDateTime.now().plusHours(1)),
            new MyObject(LocalDateTime.now()),
            new MyObject(LocalDateTime.now().plusDays(1).plusHours(2)),
            new MyObject(LocalDateTime.now().plusDays(1)),
            new MyObject(LocalDateTime.now().plusDays(1).plusHours(1)),
            new MyObject(LocalDateTime.now().plusDays(2)),
            new MyObject(LocalDateTime.now().plusDays(2).plusHours(1)));

    DayByDayIterator.streamOf(stream).forEach(System.out::println);
}

------------------- Output -----------------

2017-04-30T17:39:46.353
2017-04-30T18:39:46.333
2017-05-01T17:39:46.353
2017-05-01T18:39:46.353
2017-05-01T19:39:46.353
2017-05-02T17:39:46.353
2017-05-02T18:39:46.353

Explanation: currentDay and next are the basis for the iterator, while firstOfNextDay and nextDaysObjects already look at the first element of the next day. When currentDay is exhausted, nextDay() is called and continues adding incoming's elements to nextDaysObjects until the next next day is reached, then turns nextDaysObjects into currentDay.

One thing: if the incoming stream is null or empty, it will fail. You can test for null, but the empty case requires catching an exception (the NoSuchElementException thrown by incoming.next()) in the factory method. I left this out for readability.
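
For comparison, here is a sketch of a generic variant that avoids the empty-stream failure by doing all look-ahead inside hasNext(). It is not the implementation above, only an illustration under stated assumptions: the class DailySort, the method sortedByDay and its parameters dayOf and order are invented for this example, and the input is assumed to be a sequential stream of non-null elements whose days arrive in order. It buffers each day in a list and sorts that list, so objects with identical timestamps are all kept.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class DailySort {

    // Hypothetical helper: re-emits the elements of 'incoming', sorted within each day.
    // At most one day's worth of elements (plus one look-ahead element) is buffered.
    static <T> Stream<T> sortedByDay(Stream<T> incoming,
                                     Function<? super T, LocalDate> dayOf,
                                     Comparator<? super T> order) {
        Iterator<T> source = incoming.iterator();
        Iterator<T> sorted = new Iterator<T>() {
            private Iterator<T> currentDay = Collections.emptyIterator();
            private T lookAhead; // first element of the following day, or null

            @Override
            public boolean hasNext() {
                if (!currentDay.hasNext() && (lookAhead != null || source.hasNext())) {
                    loadNextDay();
                }
                return currentDay.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return currentDay.next();
            }

            private void loadNextDay() {
                List<T> day = new ArrayList<>();
                day.add(lookAhead != null ? lookAhead : source.next());
                lookAhead = null;
                LocalDate date = dayOf.apply(day.get(0));
                while (source.hasNext()) {
                    T candidate = source.next();
                    if (!date.equals(dayOf.apply(candidate))) {
                        lookAhead = candidate; // belongs to the following day
                        break;
                    }
                    day.add(candidate);
                }
                day.sort(order); // stable sort, equal timestamps are all kept
                currentDay = day.iterator();
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(sorted, Spliterator.ORDERED), false);
    }
}
```

With the Date-based MyObject from the question, dayOf could be o -> o.getTimestamp().toInstant().atZone(ZoneId.systemDefault()).toLocalDate() and order could be Comparator.comparing(MyObject::getTimestamp).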

I hope this is what you need, let me know how it goes.




Answer 3:


If you consider an iterative approach, I think it becomes much simpler:

TreeSet<MyObject> currentDayObjects = new TreeSet<>(Comparator.comparing(MyObject::getTimestamp));
LocalDate currentDay = null;
for (MyObject m: (Iterable<MyObject>) stream::iterator) {
    LocalDate objectDay = m.getTimestamp().toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
    if (currentDay == null) {
        currentDay = objectDay;
    } else if (!currentDay.equals(objectDay)) {
        // process a whole day of objects at once
        process(currentDayObjects);
        currentDay = objectDay;
        currentDayObjects.clear();
    }
    currentDayObjects.add(m);
}
// process the data of the last day
process(currentDayObjects);
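
The same loop could be packaged as a reusable helper. The following is a sketch only: DayBatches, forEachDay and the parameters dayOf, order and process are hypothetical names, and the stream is assumed to be sequential with days arriving in order. It collects each day into a list that is sorted before being handed over, which keeps objects with equal timestamps.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;

class DayBatches {

    // Calls 'process' once per day with that day's elements in sorted order.
    static <T> void forEachDay(Stream<T> stream,
                               Function<? super T, LocalDate> dayOf,
                               Comparator<? super T> order,
                               Consumer<? super List<T>> process) {
        List<T> currentDayObjects = new ArrayList<>();
        LocalDate currentDay = null;
        for (Iterator<T> it = stream.iterator(); it.hasNext(); ) {
            T object = it.next();
            LocalDate objectDay = dayOf.apply(object);
            if (!currentDayObjects.isEmpty() && !objectDay.equals(currentDay)) {
                // a new day starts: hand over the finished day
                currentDayObjects.sort(order);
                process.accept(currentDayObjects);
                // fresh list, in case the callback keeps a reference to the old one
                currentDayObjects = new ArrayList<>();
            }
            currentDay = objectDay;
            currentDayObjects.add(object);
        }
        // hand over the last day, if the stream was not empty
        if (!currentDayObjects.isEmpty()) {
            currentDayObjects.sort(order);
            process.accept(currentDayObjects);
        }
    }
}
```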


Source: https://stackoverflow.com/questions/43571256/lazy-sorting-of-entities-in-java-8-stream-api-on-a-daily-basis
