The HashSet<T>.removeAll method is surprisingly slow

妖精的绣舞 提交于 2019-11-28 16:37:45

The behaviour is (somewhat) documented in the javadoc:

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

What this means in practice, when you call source.removeAll(removals);:

  • if the removals collection is of a smaller size than source, the remove method of HashSet is called, which is fast.

  • if the removals collection is of equal or larger size than the source, then removals.contains is called, which is slow for an ArrayList.

Quick fix:

Collection<Integer> removals = new HashSet<Integer>();

Note that there is an open bug that is very similar to what you describe. The bottom line seems to be that it is probably a poor choice but can't be changed because it is documented in the javadoc.


For reference, this is the code of removeAll (in Java 8 - haven't checked other versions):

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!