stream().collect(Collectors.toSet()) vs stream().distinct().collect(Collectors.toList())

北战南征 提交于 2019-12-03 20:11:50
Eugene

While the answer is pretty obvious - don't bother with these details of speed and memory consumption for this little amount of elements and the fact that one returns a Set and the other a List; there are some interesting small details (interesting IMO).

Suppose you are streaming from a source that is already known to be distinct, in such a case your .distinct() operation will be a NO-OP; because there is no need to actually do anything.

If you are streaming from a List (which is by design ordered) and there are no intermediate operations (unordered for example) that change the order, .distinct() will be forced to preserve the order, by using a LinkedHashSet internally - pretty expensive.

If you are doing parallel processing, list.stream().collect(Collectors.toSet()) version will merge multiple HashSets (in 9 this has been slightly improved vs 8), .distinct() on the other hand, will spin a ConcurrentHashMap that will keep all the keys with a dummy Boolean.TRUE value (it's also doing something interesting to preserve the null that your stream might have - even this internally is handled differently in two cases)

A Set (typically HashSet) consumes more than a List (typically ArrayList), mainly because of the hashing table that it stores. But with so few elements, you will not get a noticeable difference in terms of memory consumption.
Instead, which you should care about is that these collectors return different things : a List and a Set that have their own specificities, particularly as as you access to their elements.
So use the way that matches to what you want to perform with this collection.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!