Java 8 Stream Filter - Sort based on pdate

Question

I am trying to sort by a field after filtering a stream.

Input Document / Sample Record:

DocumentList: [
    Document{
        {
            _id=5975ff00a213745b5e1a8ed9,
            u_id=,
            mailboxcontent_id=5975ff00a213745b5e1a8ed8,                
            idmapping=Document{
                {ptype=PDF, cid=00988, normalizedcid=00988, systeminstanceid=, sourceschemaname=, pid=0244810006}
            },
            batchid=null,
            pdate=Tue Jul 11 17:52:25 IST 2017,
            locale=en_US
        }
    },
    Document{
        {
            _id=597608aba213742554f537a6,
            u_id=,
            mailboxcontent_id=597608aba213742554f537a3, 
            idmapping=Document{
                {platformtype=PDF, cid=00999, normalizedcid=00999, systeminstanceid=, sourceschemaname=, pid=0244810006}
            },
            batchid=null,
            pdate=Fri Jul 28 01:26:22 IST 2017,
            locale=en_US
        }
    }
]

Here, I need to sort based on pdate.

List<Document> outList = documentList.stream()
    .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
    .parallel()
    .sequential()
    .collect(Collectors.toCollection(ArrayList::new))
    .sort()
    .skip(skipValue)
    .limit(limtValue);

I am not sure how to express the equivalent of the following in the stream:

"order by pdate DESC"

Thank you in advance!

Answer 1:

You can use the Stream API's sorted() method:

.sorted(Comparator.comparing(Document::getPDate).reversed())

And the full, refactored example:

List<Document> outList = documentList.stream()
  .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
  .sorted(Comparator.comparing(Document::getPDate).reversed())
  .skip(skipValue).limit(limtValue)
  .collect(Collectors.toCollection(ArrayList::new));

A few things to keep in mind:

- If you do not care about the List implementation, use Collectors.toList().
- collect() is a terminal operation and must be the last call in the pipeline.
- .parallel().sequential() is useless: the whole pipeline runs in whichever mode was set last. If you want to parallelize, use only .parallel(); otherwise write nothing, since streams are sequential by default.
- sorted() is a stateful operation, so the whole stream is loaded into memory in order to sort it.
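
As a minimal, self-contained sketch of the pipeline above: the Doc class, its fields, and the page() helper are hypothetical stand-ins for the question's Document type, which is not shown in full. If Document is actually the MongoDB driver's org.bson.Document (an assumption; the question does not say), the sort key would probably be read with d.getDate("pdate") rather than through a getPDate() method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

public class PagedSortSketch {

    // Hypothetical stand-in for the question's Document type.
    static final class Doc {
        final String id;
        final int visibility;
        final Date pdate;

        Doc(String id, int visibility, Date pdate) {
            this.id = id;
            this.visibility = visibility;
            this.pdate = pdate;
        }

        Date getPDate() {
            return pdate;
        }
    }

    // Keep visible docs, order by pdate DESC, then apply skip/limit paging.
    static List<Doc> page(List<Doc> docs, long skip, long limit) {
        return docs.stream()
                .filter(d -> d.visibility == 1)
                .sorted(Comparator.comparing(Doc::getPDate).reversed())
                .skip(skip)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("a", 1, new Date(1_000L)));
        docs.add(new Doc("b", 0, new Date(4_000L))); // not visible, filtered out
        docs.add(new Doc("c", 1, new Date(3_000L)));
        docs.add(new Doc("d", 1, new Date(2_000L)));

        // Visible docs in descending pdate order are c, d, a.
        // Skipping one and taking two leaves d, a.
        for (Doc d : page(docs, 1, 2)) {
            System.out.println(d.id + " " + d.getPDate().getTime());
        }
    }
}
```

Note that skip() and limit() come after sorted(), so paging is applied to the ordered result rather than to the encounter order of the source list.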

Answer 2:

An alternative to pivovarit's answer above, which may be useful if your dataset is potentially too big to hold in memory at once (a sorted Stream has to keep the whole underlying dataset in an intermediate container in order to sort it properly).

We do not use the stream's sorted() operation here. Instead, we use a data structure that holds at most as many elements as we allow and evicts surplus elements according to the sort criteria. (I do not claim this is the best implementation; it only illustrates the idea.)

To achieve this, we need a custom collector:

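import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

class SortedPileCollector<E> implements Collector<E, SortedSet<E>, List<E>> {
  private final int maxSize;           // maximum number of elements to keep
  private final Comparator<E> comptr;  // defines the sort order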

  public SortedPileCollector(int maxSize, Comparator<E> comparator) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("Max size cannot be " + maxSize);
    }
    this.maxSize = maxSize;
    comptr = Objects.requireNonNull(comparator);
  }

  public Supplier<SortedSet<E>> supplier() {
    return () -> new TreeSet<>(comptr);
  }

  public BiConsumer<SortedSet<E>, E> accumulator() {
    return this::accumulate; // see below
  }

  public BinaryOperator<SortedSet<E>> combiner() {
    return this::combine;
  }

  public Function<SortedSet<E>, List<E>> finisher() {
    return set -> new ArrayList<>(set);
  }

  public Set<Characteristics> characteristics() {
    return EnumSet.of(Characteristics.UNORDERED);
  }

  // The interesting part
  public void accumulate(SortedSet<E> set, E el) {
    Objects.requireNonNull(el);
    Objects.requireNonNull(set);
    if (set.size() < maxSize) {
      set.add(el);
    }
    else {
      if (set.contains(el)) {
        return; // we already have this element
      }
      E tailEl = set.last();
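      // SortedSet.comparator() returns Comparator<? super E>; it is non-null here
      // because supplier() always creates the set with an explicit comparator.
      Comparator<? super E> c = set.comparator();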
      if (c.compare(tailEl, el) <= 0) {
        // If we did not have capacity, received element would've gone to the end of our set.
        // However, since we are at capacity, we will skip the element
        return;
      }
      else {
        // We received element that we should preserve.
        // Remove set tail and add our new element.
        set.remove(tailEl);
        set.add(el);
      }
    }
  }

  public SortedSet<E> combine(SortedSet<E> first, SortedSet<E> second) {
    SortedSet<E> result = new TreeSet<>(first);
    second.forEach(el -> accumulate(result, el)); // inefficient, but hopefully you see the general idea.
    return result;
  }
}

The collector above acts as a mutable structure that maintains a bounded sorted set. Note that "duplicate" elements, meaning elements the comparator treats as equal (for example, two documents with the same pdate), are ignored by this implementation; you will need to change it if you want to keep duplicates.

Using this collector for your case, assuming you want the top three elements:

Comparator<Document> comparator = Comparator.comparing(Document::getPDate).reversed(); // see pivovarit's answer
List<Document> outList = documentList.stream()
  .filter(p -> p.getInteger(VISIBILITY) == 1)
  .collect(new SortedPileCollector<>(3, comparator));
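
If elements that compare as equal must be kept, one option is to swap the TreeSet for a bounded heap. The sketch below is a plain method rather than a Collector, and it is not the implementation from this answer: the TopNSketch class and the topN() helper are hypothetical names introduced only to illustrate the idea with a java.util.PriorityQueue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

    // Returns the first n elements of the order defined by cmp, keeping ties.
    static <E> List<E> topN(Iterable<E> source, int n, Comparator<E> cmp) {
        if (n < 1) {
            throw new IllegalArgumentException("n cannot be " + n);
        }
        // The heap is ordered by the reversed comparator, so its head is
        // always the "worst" element currently retained.
        PriorityQueue<E> heap = new PriorityQueue<>(n, cmp.reversed());
        for (E el : source) {
            if (heap.size() < n) {
                heap.add(el);
            } else if (cmp.compare(el, heap.peek()) < 0) {
                heap.poll(); // evict the current worst element
                heap.add(el);
            }
        }
        // A heap iterates in no particular order, so sort the survivors.
        List<E> result = new ArrayList<>(heap);
        result.sort(cmp);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 9, 9, 1, 7);
        // Largest three in descending order; the duplicate 9 is retained.
        System.out.println(topN(values, 3, Comparator.<Integer>reverseOrder())); // prints [9, 9, 7]
    }
}
```

Like the collector, this holds at most n elements at a time; the difference is that a heap does not use the comparator to decide equality, so two documents with the same pdate can both survive.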

Answer 3:

Once you have the resulting list, sort it as follows, assuming Document.getPDate() returns the pdate:

Collections.sort(outList, Comparator.comparing(Document::getPDate).reversed());
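
One caveat with sorting after the fact: if skip and limit were already applied, the sort only reorders that page rather than selecting the right page. A minimal sketch of sorting first and paging second, using integers instead of documents; the SortThenPageSketch class and the sortThenPage() helper are hypothetical names, not part of the question's code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortThenPageSketch {

    // Sorts a copy in descending order, then returns the [skip, skip + limit) window.
    static <T extends Comparable<? super T>> List<T> sortThenPage(List<T> input, int skip, int limit) {
        List<T> sorted = new ArrayList<>(input); // leave the caller's list untouched
        Collections.sort(sorted, Comparator.<T>reverseOrder());
        // Clamp both bounds so that a short list does not throw.
        int from = Math.min(skip, sorted.size());
        int to = Math.min(from + limit, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 4, 1, 5);
        // Descending order is 5, 4, 3, 1, 1; skipping one and taking two gives 4, 3.
        System.out.println(sortThenPage(values, 1, 2)); // prints [4, 3]
    }
}
```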

Source: https://stackoverflow.com/questions/45421453/java-8-stream-filter-sort-based-pdate

Tags

java

sorting

java-8

java-stream