Why does C++11/Boost `unordered_map` not rehash when erasing?

心已入冬 submitted on 2019-12-05 09:27:50

As far as I can tell, that behavior follows from the requirements on std::unordered_map::erase: it may invalidate only iterators and references to the erased elements (whereas a rehash invalidates all iterators, although references and pointers to elements stay valid), and it should take constant time on average.
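The behavior in question is easy to observe by comparing bucket_count() before and after erasing. The following is only a minimal sketch: BucketCounts and bucket_counts_around_erase are hypothetical names made up for this illustration, and the result shows what the common standard library implementations do in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper type: bucket counts on both sides of the erasure.
struct BucketCounts {
    std::size_t before;
    std::size_t after;
};

// Fills a map with n elements, erases all of them one by one,
// and reports bucket_count() before and after the erasure.
BucketCounts bucket_counts_around_erase(int n) {
    std::unordered_map<int, int> m;
    for (int i = 0; i < n; ++i) m.emplace(i, i);
    const std::size_t before = m.bucket_count();
    for (int i = 0; i < n; ++i) m.erase(i);
    assert(m.empty());  // every element is gone, the buckets are still there
    return {before, m.bucket_count()};
}
```

Calling bucket_counts_around_erase(1000) should report the same bucket count on both sides, even though the map ends up empty.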

I can't tell you why it was specified like this, but I can tell you why it is the right default behavior for me:

  1. In many applications, the content of my hash table is virtually constant after initialization anyway - so here I don't care.
  2. Where this is not the case, at least the average number of elements stays more or less the same (within an order of magnitude). So even if a lot of objects are deleted at some point in time, new elements will probably be added soon afterwards. In that case, rehashing wouldn't really reduce the memory footprint, and the overhead of rehashing twice (once after deletion and once after adding new elements) would usually outweigh any performance improvement I might get from a more compact table.
  3. Erasing a larger number of elements (e.g. by a filter function) would be severely slowed down by intermediate rehashes if you could not control the heuristic (as you can when inserting elements, by modifying max_load_factor).
    So finally, even in those cases where it is actually beneficial to rehash, I can usually make a better decision about when to do it (e.g. via rehash or copy and swap) than a generic heuristic in std::unordered_map could.
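To illustrate the last two points, here is a rough sketch of erasing by a filter function and then deciding explicitly when to rehash, namely once at the end. The name erase_if_then_rehash is hypothetical and not part of any library, and whether rehash(0) actually frees memory depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Erases every element whose key satisfies pred, then rehashes exactly once.
// Returns the number of erased elements.
template <class Map, class Pred>
std::size_t erase_if_then_rehash(Map& m, Pred pred) {
    std::size_t erased = 0;
    for (auto it = m.begin(); it != m.end();) {
        if (pred(it->first)) {
            it = m.erase(it);  // erase returns the iterator following the erased element
            ++erased;
        } else {
            ++it;
        }
    }
    // Ask for the smallest bucket count that the current size and
    // max_load_factor allow; this is a request, not a memory guarantee.
    m.rehash(0);
    assert(m.load_factor() <= m.max_load_factor());
    return erased;
}
```

A call such as erase_if_then_rehash(m, [](int k) { return k % 2 == 0; }) removes all even keys and pays for at most one rehash, instead of whatever a built-in heuristic might trigger along the way.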

Again, those points are true for my typical use cases; I don't claim that they are universally true for other people's software or that they were the motivation behind the specification of unordered_map.

Interestingly, VS2015 and libstdc++ seem to implement rehash(0) differently *:

  • libstdc++ will actually shrink (reallocate) the memory where the table is stored
  • VS2015 will decrease the table size (a.k.a. bucket count) but not reallocate the table. So even after rehashing an empty hash map, the surplus memory for the table will not be returned.

Apparently, the only portable way to minimize the memory footprint is to copy and swap.
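A minimal sketch of such a copy and swap could look like the following. shrink_by_copy_and_swap is a hypothetical name, and the details are assumptions on my part: I build a fresh container and insert the elements, rather than using the copy constructor, because a copy-constructed map might simply take over the bucket count of the source. I have not checked this for every library.

```cpp
#include <cassert>
#include <unordered_map>

// Rebuilds m in a fresh table sized for its current elements and swaps it in.
// The old, possibly oversized table is released when tmp is destroyed.
template <class Map>
void shrink_by_copy_and_swap(Map& m) {
    // Start from an empty table that shares m's hasher, key equality and allocator.
    Map tmp(0, m.hash_function(), m.key_eq(), m.get_allocator());
    tmp.max_load_factor(m.max_load_factor());
    tmp.reserve(m.size());           // size the bucket array once, up front
    tmp.insert(m.begin(), m.end());  // copy the surviving elements
    assert(tmp.size() == m.size());
    m.swap(tmp);                     // m now owns the compact table
}
```

Note that this invalidates all iterators into the map, and also references and pointers to its elements, since every element is copied into a new node.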

Concerning the documentation, I agree that this should probably be mentioned explicitly somewhere, but on the other hand it is consistent with e.g. the documentation of std::vector::erase(). Also, I'm not 100% sure whether it is really impossible to write an implementation that rehashes on erase at least sometimes without violating the requirements.


*) I inferred this from the results of bucket_count and getAllocatedBytes() from your allocator, not by actually looking at the source code.
