What's Up with O(1)?

既然无缘 2020-12-22 17:40

I have been noticing some very strange usage of O(1) in discussion of algorithms involving hashing and types of search, often in the context of using a dictionary type provided by the language.

13 Answers
  •  [愿得一人]
    2020-12-22 18:00

    Yes, garbage collection does affect the asymptotic complexity of algorithms running in the garbage collected arena. It is not without cost, but it is very hard to analyze without empirical methods, because the interaction costs are not compositional.

    The time spent garbage collecting depends on the algorithm being used. Typically, modern garbage collectors toggle modes as memory fills up to keep these costs under control. For instance, a common approach is to use a Cheney-style copying collector when memory pressure is low, because it pays a cost proportional to the size of the live set in exchange for using more space, and to switch to a mark-and-sweep collector when memory pressure becomes greater, because even though it pays a cost proportional to the live set for marking and to the whole heap (or dead set) for sweeping, it gets by with far less extra space. By the time you add card-marking and other optimizations, the worst-case costs for a practical garbage collector may actually be a fair bit worse, picking up an extra logarithmic factor for some usage patterns.
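
    To make that cost split concrete, here is a minimal, illustrative mark-and-sweep over a toy object graph; it is a sketch, not modeled on any real runtime's collector, and every name in it (`Obj`, `mark_and_sweep`) is invented for the example. Marking touches only what is reachable from the roots, while sweeping walks the whole heap list.

    ```python
    # Toy mark-and-sweep: marking costs O(live set), sweeping costs O(heap).
    # Illustrative sketch only, not how any production collector is built.

    class Obj:
        def __init__(self):
            self.refs = []      # outgoing pointers
            self.marked = False

    def mark_and_sweep(roots, heap):
        # Mark phase: traverse only what is reachable -- proportional to the live set.
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)

        # Sweep phase: visit every object ever allocated -- proportional to the whole heap.
        survivors = []
        for obj in heap:
            if obj.marked:
                obj.marked = False   # reset for the next cycle
                survivors.append(obj)
        return survivors             # everything else is garbage

    # Tiny demo: three objects, only two reachable from the root.
    a, b, c = Obj(), Obj(), Obj()
    a.refs.append(b)                        # a -> b is live, c is garbage
    print(len(mark_and_sweep([a], [a, b, c])))   # -> 2
    ```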

    So, if you allocate a big hash table in a garbage collected environment, then even if you only ever access it with O(1) searches over its entire lifetime, the garbage collector will occasionally traverse the entire array, because it is O(n) in size, and you will pay that cost periodically during collection.
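
    A back-of-the-envelope model makes the same point numerically; the period and cost constants below are arbitrary assumptions, chosen only to show that a periodic O(n) traversal drags the amortized per-lookup cost up with the size of the table.

    ```python
    # Hypothetical cost model: each lookup costs 1 unit, and every `period` operations
    # the collector traverses the whole table at a cost proportional to its size n.
    # All constants here are invented for illustration.

    def amortized_cost(n, ops, period, scan_cost_per_slot=1.0):
        lookup_cost = ops * 1.0
        collections = ops // period
        gc_cost = collections * n * scan_cost_per_slot
        return (lookup_cost + gc_cost) / ops

    for n in (10_000, 1_000_000, 100_000_000):
        # With a fixed collection period, the amortized cost per lookup grows with n.
        print(n, amortized_cost(n, ops=1_000_000, period=100_000))
    ```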

    The reason we usually leave it out of the complexity analysis of algorithms is that garbage collection interacts with your algorithm in non-trivial ways. How bad the cost is depends a lot on what else you are doing in the same process, so the analysis is not compositional.

    Moreover, above and beyond the copy vs. compact vs. mark and sweep issue, the implementation details can drastically affect the resulting complexities:

    1. Incremental garbage collectors that track dirty bits, etc. can all but make those larger re-traversals disappear (a simplified tri-color sketch follows this list).
    2. Whether your GC works periodically based on wall-clock time or runs in proportion to the number of allocations.
    3. Whether a mark-and-sweep style algorithm is concurrent or stop-the-world.
    4. Whether it marks fresh allocations black or leaves them white until it drops them into a black container.
    5. Whether your language admits modifications of pointers, since that can let some garbage collectors work in a single pass.
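
    To make points 1 and 4 a bit more concrete, here is a heavily simplified sketch of incremental tri-color marking with a Dijkstra-style insertion write barrier. It is not modeled on any particular collector, and all of the names in it are invented for the example.

    ```python
    # Simplified incremental tri-color marking (illustrative only).
    # White = not yet seen, gray = seen but children not scanned, black = fully scanned.

    WHITE, GRAY, BLACK = 0, 1, 2

    class Obj:
        def __init__(self):
            self.refs = []
            self.color = WHITE

    class IncrementalMarker:
        def __init__(self, roots):
            self.worklist = []
            for r in roots:
                self._shade(r)

        def _shade(self, obj):
            if obj.color == WHITE:
                obj.color = GRAY
                self.worklist.append(obj)

        def write_barrier(self, src, new_ref):
            # Insertion barrier: if a black object gains a pointer to a white object
            # mid-collection, shade the target so it is not lost.
            src.refs.append(new_ref)
            if src.color == BLACK:
                self._shade(new_ref)

        def step(self, budget):
            # Do a bounded amount of marking work, then return to the mutator.
            while budget > 0 and self.worklist:
                obj = self.worklist.pop()
                for child in obj.refs:
                    self._shade(child)
                obj.color = BLACK
                budget -= 1
            return not self.worklist   # True once marking is complete

    # Usage: interleave small marking steps with mutator work.
    root, x, y = Obj(), Obj(), Obj()
    root.refs.append(x)
    marker = IncrementalMarker([root])
    marker.step(budget=1)             # root becomes black, x becomes gray
    marker.write_barrier(root, y)     # mutation during marking: y gets shaded
    while not marker.step(budget=1):
        pass                          # finish marking incrementally
    assert x.color == BLACK and y.color == BLACK
    ```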

    Finally, when discussing an algorithm, we are discussing a straw man. The asymptotics will never fully incorporate all of the variables of your environment. Rarely do you ever implement every detail of a data structure as designed. You borrow a feature here and there: you drop a hash table in because you need fast unordered key access, you use a union-find over disjoint sets with path compression and union by rank to merge memory regions over there because you can't afford to pay a cost proportional to the size of the regions when you merge them, or what have you. These structures are thought primitives, and the asymptotics help you when planning the overall performance characteristics of the structure 'in the large', but knowledge of what the constants are matters too.
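
    For reference, the union-find with path compression and union by rank mentioned above is small enough to show in full; this is a generic textbook version (using path halving as the compression step), not tied to any particular use of memory regions.

    ```python
    # Textbook disjoint-set forest with union by rank and path compression.
    # Each operation is effectively constant time (inverse-Ackermann amortized).

    class DisjointSets:
        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # Path halving: point every other node on the path closer to the root.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            # Union by rank: attach the shallower tree under the deeper one.
            if self.rank[ra] < self.rank[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            if self.rank[ra] == self.rank[rb]:
                self.rank[ra] += 1

    # Merging "regions" 0..4: afterwards 0,1,2 share a root and 3,4 share another.
    ds = DisjointSets(5)
    ds.union(0, 1); ds.union(1, 2); ds.union(3, 4)
    assert ds.find(0) == ds.find(2) and ds.find(3) != ds.find(0)
    ```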

    You can implement that hash table with perfectly O(1) asymptotic characteristics: just don't use garbage collection; map it into memory from a file and manage it yourself. You probably won't like the constants involved, though.
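
    As a rough, POSIX-flavored illustration of that last point, here is a sketch of a fixed-capacity, open-addressing table of 64-bit keys and values stored in a memory-mapped file, so no tracing collector ever walks it. The layout, the hash constant, and the names are all invented for the example, and it handles neither growth nor deletion.

    ```python
    # Sketch: fixed-capacity open-addressing hash table over a memory-mapped file.
    # 64-bit keys (nonzero) and 64-bit values; no resizing, no deletion, no GC heap.
    import mmap, os, struct

    SLOTS = 1 << 16                      # fixed capacity, must be a power of two
    SLOT_FMT = "<QQ"                     # (key, value) as two little-endian uint64s
    SLOT_SIZE = struct.calcsize(SLOT_FMT)

    class MmapHash:
        def __init__(self, path):
            size = SLOTS * SLOT_SIZE
            fd = os.open(path, os.O_RDWR | os.O_CREAT)
            os.ftruncate(fd, size)
            self.mem = mmap.mmap(fd, size)
            os.close(fd)                 # the mapping stays valid after close

        def _probe(self, key):
            # Linear probing; an all-zero slot (key == 0) means "empty".
            i = (key * 0x9E3779B97F4A7C15) % SLOTS   # cheap multiplicative hash
            while True:
                off = i * SLOT_SIZE
                k, v = struct.unpack_from(SLOT_FMT, self.mem, off)
                if k == key or k == 0:
                    return off, k, v
                i = (i + 1) % SLOTS

        def put(self, key, value):
            off, _, _ = self._probe(key)
            struct.pack_into(SLOT_FMT, self.mem, off, key, value)

        def get(self, key):
            _, k, v = self._probe(key)
            return v if k == key else None

    # Usage: the table lives in the file, not on the garbage-collected heap.
    table = MmapHash("/tmp/toy_table.bin")
    table.put(42, 7)
    print(table.get(42))   # -> 7
    ```

    Every probe here is a page access plus a struct unpack, which is exactly the kind of constant factor the last sentence warns about.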
