Should I return a Collection or a Stream?

Suppose I have a method that returns a read-only view into a member list:

class Team {
    private List<Player> players = new ArrayList<>();

    // ...

    public List<Player> getPlayers() {
        return Collections.unmodifiableList(players);
    }
}

Further suppose that all the client does is iterate over the list once, immediately. Maybe to put the players into a JList or something. The client does not store a reference to the list for later inspection!

Given this common scenario, should I return a stream instead?

public Stream<Player> getPlayers() {
    return players.stream();
}

Or is returning a stream non-idiomatic in Java? Were streams designed to always be "terminated" inside the same expression they were created in?

The answer is, as always, "it depends". It depends on how big the returned collection will be. It depends on whether the result changes over time, and how important consistency of the returned result is. And it depends very much on how the user is likely to use the answer.

First, note that you can always get a Collection from a Stream, and vice versa:

// If the API returns a Collection, convert with stream()
getFoo().stream()...

// If the API returns a Stream, use collect()
// (assuming a static import of Collectors.toList)
Collection<T> c = getFooStream().collect(toList());

So the question is, which is more useful to your callers.

If your result might be infinite, there's only one choice: Stream.
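
As a minimal sketch of the infinite case (the class and method names are made up for illustration): a sequence with no last element cannot be handed back as a Collection at all, but a Stream lets the caller decide how much of it to consume.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PowersOfTwo {
    // Hypothetical API: an unbounded sequence 1, 2, 4, 8, ...
    // No Collection could hold all of it.
    static Stream<Long> values() {
        return Stream.iterate(1L, n -> n * 2);
    }

    public static void main(String[] args) {
        // The caller bounds the stream; only five elements are ever computed.
        List<Long> firstFive = values().limit(5).collect(Collectors.toList());
        System.out.println(firstFive); // [1, 2, 4, 8, 16]
    }
}
```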

If your result might be very large, you probably prefer Stream, since there may not be any value in materializing it all at once, and doing so could create significant heap pressure.

If all the caller is going to do is iterate through it (search, filter, aggregate), you should prefer Stream, since Stream has these operations built in already and there's no need to materialize a collection (especially if the user might not process the whole result). This is a very common case.
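
To make the "might not process the whole result" point concrete, here is a small sketch. The names are hypothetical, and the peek() call plus the counter exist only to make the laziness observable; a real API would not need them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazySearch {
    // Counts how many elements the pipeline actually looked at.
    static final AtomicInteger examined = new AtomicInteger();

    private static final List<String> NAMES =
            Arrays.asList("Ann", "Bob", "Cleo", "Dmitri", "Eve");

    // Hypothetical stream-returning API.
    static Stream<String> names() {
        return NAMES.stream().peek(n -> examined.incrementAndGet());
    }

    public static void main(String[] args) {
        // findFirst() short-circuits: the search stops at the first match.
        Optional<String> firstLong = names().filter(n -> n.length() > 3).findFirst();
        System.out.println(firstLong.get()); // Cleo
        System.out.println(examined.get());  // 3
    }
}
```

The last two names are never touched, and nothing was copied into an intermediate collection.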

Even if you know that the user will iterate it multiple times or otherwise keep it around, you still may want to return a Stream instead, for the simple fact that whatever Collection you choose to put it in (e.g., ArrayList) may not be the form they want, and then the caller has to copy it anyway. If you return a stream, they can do collect(toCollection(factory)) and get it in exactly the form they want.
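
A short sketch of that idea, with a hypothetical names() method standing in for the API: two callers take the same stream-returning method and each materializes the result into the collection type that suits them.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Roster {
    private static final List<String> NAMES = Arrays.asList("Cleo", "Ann", "Bob", "Ann");

    // Hypothetical stream-returning API.
    static Stream<String> names() {
        return NAMES.stream();
    }

    public static void main(String[] args) {
        // One caller wants sorted, de-duplicated names...
        TreeSet<String> sorted = names().collect(Collectors.toCollection(TreeSet::new));
        // ...another wants a queue in encounter order.
        ArrayDeque<String> queue = names().collect(Collectors.toCollection(ArrayDeque::new));
        System.out.println(sorted); // [Ann, Bob, Cleo]
        System.out.println(queue);  // [Cleo, Ann, Bob, Ann]
    }
}
```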

The above "prefer Stream" cases mostly derive from the fact that Stream is more flexible; you can late-bind to how you use it without incurring the costs and constraints of materializing it to a Collection.

The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
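
One possible shape for such a snapshot, as a sketch only: the class name is invented, and guarding the list with synchronized is just an assumption about how the "moving target" is protected. The point is that the copy is taken at one instant and later changes do not show through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SnapshotTeam {
    private final List<String> players = new ArrayList<>();

    synchronized void add(String player) {
        players.add(player);
    }

    // Copies under the lock, so the result is a consistent snapshot that
    // later calls to add() cannot change.
    synchronized List<String> playersSnapshot() {
        return Collections.unmodifiableList(new ArrayList<>(players));
    }

    public static void main(String[] args) {
        SnapshotTeam team = new SnapshotTeam();
        team.add("Ann");
        List<String> before = team.playersSnapshot();
        team.add("Bob");
        System.out.println(before);                 // [Ann]
        System.out.println(team.playersSnapshot()); // [Ann, Bob]
    }
}
```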

So I would say that most of the time, Stream is the right answer -- it is more flexible, it doesn't impose usually-unnecessary materialization costs, and can be easily turned into the Collection of your choice if needed. But sometimes, you may have to return a Collection (say, due to strong consistency requirements), or you may want to return Collection because you know how the user will be using it and know this is the most convenient thing for them.

I have a few points to add to Brian Goetz's excellent answer.

It's quite common to return a Stream from a "getter" style method call. See the Stream usage page in the Java 8 javadoc and look for "methods... that return Stream" for the packages other than java.util.stream. These methods are usually on classes that represent or can contain multiple values or aggregations of something. In such cases, APIs typically have returned collections or arrays of them. For all the reasons that Brian noted in his answer, it's very flexible to add Stream-returning methods here. Many of these classes have collection- or array-returning methods already, because the classes predate the Streams API. If you're designing a new API, and it makes sense to provide Stream-returning methods, it might not be necessary to add collection-returning methods as well.

Brian mentioned the cost of "materializing" the values into a collection. To amplify this point, there are actually two costs here: the cost of storing values in the collection (memory allocation and copying) and also the cost of creating the values in the first place. The latter cost can often be reduced or avoided by taking advantage of a Stream's laziness-seeking behavior. Good examples of this are the APIs in java.nio.file.Files:

static Stream<String>  lines(path)
static List<String>    readAllLines(path)

Not only does readAllLines have to hold the entire file contents in memory in order to store it into the result list, it also has to read the file to the very end before it returns the list. The lines method can return almost immediately after it has performed some setup, leaving file reading and line breaking until later when it's necessary -- or not at all. This is a huge benefit, if for example, the caller is interested only in the first ten lines:

try (Stream<String> lines = Files.lines(path)) {
    List<String> firstTen = lines.limit(10).collect(toList());
}

Of course, considerable memory space can also be saved if the caller filters the stream to keep only lines matching a pattern, etc.

An idiom that seems to be emerging is to name stream-returning methods after the plural of the name of the things that it represents or contains, without a get prefix. Also, while stream() is a reasonable name for a stream-returning method when there is only one possible set of values to be returned, sometimes there are classes that have aggregations of multiple types of values. For example, suppose you have some object that contains both attributes and elements. You might provide two stream-returning APIs:

Stream<Attribute>  attributes();
Stream<Element>    elements();

Were streams designed to always be "terminated" inside the same expression they were created in?

That is how they are used in most examples.

Note: returning a Stream is not that different from returning an Iterator (admittedly with much more expressive power).

IMHO the best solution is to encapsulate why you are doing this, and not return the collection.

e.g.

public int playerCount();
public Player player(int n);

or if you intend to count them

public int countPlayersWho(Predicate<? super Player> test);
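
A sketch of how those signatures might be filled in. The Player fields and the EncapsulatedTeam name are invented for the example; the list itself never leaves the class, and callers only pass in the question they want answered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class Player {
    final String name;
    final int age; // hypothetical attribute, just to have something to test

    Player(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class EncapsulatedTeam {
    private final List<Player> players = new ArrayList<>();

    void add(Player p) {
        players.add(p);
    }

    public int playerCount() {
        return players.size();
    }

    public Player player(int n) {
        return players.get(n);
    }

    // The stream is an implementation detail; it is created and
    // terminated inside this one method.
    public int countPlayersWho(Predicate<? super Player> test) {
        return (int) players.stream().filter(test).count();
    }
}
```

A caller would then write something like team.countPlayersWho(p -> p.age >= 21) without ever seeing a List or a Stream.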

If the stream is finite, and there is an expected/normal operation on the returned objects which will throw a checked exception, I always return a Collection. Because if you are going to be doing something on each of the objects that can throw a checked exception, you will hate the stream. One real shortcoming of streams is their inability to deal with checked exceptions elegantly.

Now, perhaps that is a sign that you don't need the checked exceptions, which is fair, but sometimes they are unavoidable.
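
To illustrate the difference, here is a sketch with a made-up save() operation that declares IOException. With a List, a plain loop lets the exception propagate; with a Stream, the lambda passed to forEach is not allowed to throw it, so it has to be wrapped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Stream;

class CheckedDemo {
    // Hypothetical per-element operation that declares a checked exception.
    static void save(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
    }

    // With a Collection, a plain loop lets the checked exception propagate.
    static void saveAll(List<String> names) throws IOException {
        for (String name : names) {
            save(name);
        }
    }

    // With a Stream, the lambda cannot throw IOException, so it is wrapped
    // in an unchecked exception (and would have to be unwrapped again if
    // callers expect the checked one).
    static void saveAll(Stream<String> names) {
        names.forEach(name -> {
            try {
                save(name);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```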

In contrast to collections, streams have additional characteristics. A stream returned by any method might be:

finite or infinite
parallel or sequential (with a default globally shared threadpool that can impact any other part of an application)
ordered or non-ordered

These differences also exist in collections, but there they are part of the obvious contract:

All Collections have a size; an Iterator/Iterable can be infinite.
Collections are explicitly ordered or non-ordered.
Parallelism is thankfully not something collections care about, beyond thread safety.

As a consumer of a stream (either from a method return or as a method parameter) this is a dangerous and confusing situation. To make sure their algorithm behaves correctly, consumers of streams need to make sure the algorithm makes no wrong assumption about the stream characteristics. And that is a very hard thing to do. In unit testing, that would mean that you have to multiply all your tests to be repeated with the same stream contents, but with streams that are

(finite, ordered, sequential)
(finite, ordered, parallel)
(finite, non-ordered, sequential)...

Writing method guards for streams that throw an IllegalArgumentException if the input stream has a characteristic that breaks your algorithm is difficult, because the properties are hidden.
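
For what it's worth, here is a sketch of what such a guard could look like (the class and method names are invented). It shows how awkward this is: the only way to look at ORDERED or SIZED is through spliterator(), which is a terminal operation, so the guard has to consume the input and hand back a new stream. The flags are also only what the source and pipeline report; a filter() in the pipeline, for example, drops SIZED.

```java
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class StreamGuards {
    // Returns an equivalent sequential stream, or throws if the input is
    // parallel or does not report both ORDERED and SIZED.
    static <T> Stream<T> requireOrderedSizedSequential(Stream<T> input) {
        if (input.isParallel()) {
            throw new IllegalArgumentException("parallel stream not supported");
        }
        // spliterator() is a terminal operation: 'input' is consumed here and
        // must not be used again, so the guard returns a new stream.
        Spliterator<T> sp = input.spliterator();
        if (!sp.hasCharacteristics(Spliterator.ORDERED | Spliterator.SIZED)) {
            throw new IllegalArgumentException("stream must be ordered and sized");
        }
        return StreamSupport.stream(sp, false);
    }
}
```

Every method that accepts a Stream would need a guard like this, and callers would have to remember to use the returned stream rather than the one they passed in.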

That leaves Stream only as a valid choice in a method signature when none of the problems above matter, which is rarely the case.

It is much safer to use other datatypes in method signatures with an explicit contract (and without implicit thread-pool processing involved) that makes it impossible to accidentally process data with wrong assumptions about orderedness, sizedness or parallelism (and threadpool usage).

I think it depends on your scenario. Maybe it is sufficient to make your Team implement Iterable<Player>.

for (Player player : team) {
    System.out.println(player);
}

or in a functional style:

team.forEach(System.out::println);
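
A minimal sketch of such a class, assuming for brevity that players are plain Strings and using an invented class name. Iterating over an unmodifiable view is one way to keep callers from removing players through the iterator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class IterableTeam implements Iterable<String> {
    private final List<String> players = new ArrayList<>();

    void add(String player) {
        players.add(player);
    }

    @Override
    public Iterator<String> iterator() {
        // The unmodifiable view's iterator rejects remove(), so the
        // internal list stays under the team's control.
        return Collections.unmodifiableList(players).iterator();
    }

    public static void main(String[] args) {
        IterableTeam team = new IterableTeam();
        team.add("Ann");
        team.add("Bob");
        for (String player : team) {
            System.out.println(player);
        }
        team.forEach(System.out::println);
    }
}
```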

But if you want a more complete and fluent api, a stream could be a good solution.

Perhaps a Stream factory would be a better choice. The big win of only exposing collections via Stream is that it better encapsulates your domain model’s data structure. It’s impossible for any use of your domain classes to affect the inner workings of your List or Set simply by exposing a Stream.
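
As a small sketch of that encapsulation argument (names invented, players reduced to Strings): whatever the caller collects from the stream is their own copy, so changing it leaves the team's list alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamOnlyTeam {
    private final List<String> players = new ArrayList<>();

    void add(String player) {
        players.add(player);
    }

    // The only way to see the players from outside is through a Stream.
    Stream<String> players() {
        return players.stream();
    }

    public static void main(String[] args) {
        StreamOnlyTeam team = new StreamOnlyTeam();
        team.add("Ann");
        team.add("Bob");

        // The caller's list is a separate object; clearing it does not
        // touch the team's internal list.
        List<String> mine = team.players().collect(Collectors.toCollection(ArrayList::new));
        mine.clear();
        System.out.println(team.players().count()); // 2
    }
}
```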

It also encourages users of your domain class to write code in a more modern Java 8 style. It’s possible to incrementally refactor to this style by keeping your existing getters and adding new Stream-returning getters. Over time, you can rewrite your legacy code until you’ve finally deleted all getters that return a List or Set. This kind of refactoring feels really good once you’ve cleared out all the legacy code!

I would probably have 2 methods, one to return a Collection and one to return the collection as a Stream.

class Team
{
    private List<Player> players = new ArrayList<>();

    // ...

    public List<Player> getPlayers()
    {
        return Collections.unmodifiableList(players);
    }

    public Stream<Player> getPlayerStream()
    {
        return players.stream();
    }
}

This is the best of both worlds. The client can choose whether they want the List or the Stream, and they don't have to create an extra unmodifiable wrapper around the list just to get a Stream.

This also adds only one more method to your API, so you don't end up with too many methods.

Source: https://stackoverflow.com/questions/24676877/should-i-return-a-collection-or-a-stream

Tags: java, collections, java-8, encapsulation, java-stream