What's Up with O(1)?

既然无缘 2020-12-22 17:40

I have been noticing some very strange usage of O(1) in discussion of algorithms involving hashing and types of search, often in the context of using a dictionary type provi

13 Answers
  •  夕颜 (OP)
    2020-12-22 17:55

    A hashtable is a data structure that supports O(1) search and insertion.

    A hashtable usually stores (key, value) pairs, where the key is passed as the parameter to a function (a hash function) that determines the location of the value in the internal data structure, usually an array.

    As insertion and search depend only on the result of the hash function, and not on the size of the hashtable or the number of elements stored, a hashtable has O(1) insertion and search.
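
As a rough illustration of the paragraph above, here is a minimal Python sketch. The name `toy_hash` and its polynomial formula are assumptions made for this example, not the hash function of any particular library, and it will not necessarily produce the indices quoted later in this answer.

```python
def toy_hash(key: str, capacity: int) -> int:
    """Map a string key to an array index in the range [0, capacity)."""
    h = 0
    for ch in key:
        # Simple polynomial hash; the multiplier 31 is an arbitrary choice.
        h = (h * 31 + ord(ch)) % capacity
    return h


capacity = 100
slots = [None] * capacity

# Insertion: one hash computation plus one array write.
index = toy_hash("Name", capacity)
slots[index] = ("Name", "Bob")

# Search: the same hash computation plus one array read.
found = slots[toy_hash("Name", capacity)]
print(found)  # ('Name', 'Bob')
```

The work done depends on the length of the key, but not on `capacity` or on how many pairs are already stored.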

    There is one caveat, however. As the hashtable becomes more and more full, there will be hash collisions, where the hash function returns the index of an array element that is already occupied. This necessitates collision resolution in order to find another, empty element.

    When a hash collision occurs, a search or insertion cannot be performed in O(1) time. However, a good collision resolution algorithm can reduce the number of tries needed to find another suitable empty spot, and increasing the hashtable size can reduce the number of collisions in the first place.
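
To make the effect of table size concrete, the sketch below counts how many keys land on an already occupied slot for three capacities. It assumes integer keys and the simplest possible hash (`key % capacity`); both the function name `count_home_collisions` and the key list are made up for illustration.

```python
def count_home_collisions(keys, capacity):
    """Count keys whose first-choice slot (key % capacity) is already taken."""
    occupied = set()
    collisions = 0
    for key in keys:
        slot = key % capacity
        if slot in occupied:
            collisions += 1
        else:
            occupied.add(slot)
    return collisions


keys = [3, 13, 23, 33, 7, 17, 42, 58]
for capacity in (10, 20, 40):
    print(capacity, count_home_collisions(keys, capacity))
# 10 4
# 20 2
# 40 0
```

With these particular keys, each doubling of the array removes collisions. That is not guaranteed for every key set, but a lower ratio of stored elements to slots generally makes collisions less likely.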

    So, in theory, only a hashtable backed by an array with an infinite number of elements and a perfect hash function would be able to achieve O(1) performance, as that is the only way to avoid the hash collisions that drive up the number of required operations. Therefore, any hashtable backed by a finite-sized array will at one time or another perform worse than O(1) due to hash collisions.


    Let's take a look at an example. Let's use a hashtable to store the following (key, value) pairs:

    • (Name, Bob)
    • (Occupation, Student)
    • (Location, Earth)

    We will implement the hashtable back-end with an array of 100 elements.

    The key will be used to determine an element of the array to store the (key, value) pair. In order to determine the element, the hash_function will be used:

    • hash_function("Name") returns 18
    • hash_function("Occupation") returns 32
    • hash_function("Location") returns 74

    From the above result, we'll assign the (key, value) pairs into the elements of the array.

    array[18] = ("Name", "Bob")
    array[32] = ("Occupation", "Student")
    array[74] = ("Location", "Earth")
    

    The insertion only requires evaluating the hash function and does not depend on the size of the hashtable or the number of elements in it, so it can be performed in O(1) time.

    Similarly, searching for an element uses the hash function.

    If we want to look up the key "Name", we'll evaluate hash_function("Name") to find out which element of the array holds the desired value.

    Searching also does not depend on the size of the hashtable or the number of elements stored, so it is an O(1) operation as well.
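
The collision-free part of the example can be sketched in Python as follows. Here `hash_function` is a hypothetical stand-in that simply returns the indices quoted above from a fixed lookup instead of computing a real hash, and `insert` and `search` are helper names introduced for this sketch.

```python
# Hypothetical stand-in: returns the indices quoted in the example
# instead of computing a real hash.
EXAMPLE_HASHES = {"Name": 18, "Occupation": 32, "Location": 74}


def hash_function(key):
    return EXAMPLE_HASHES[key]


array = [None] * 100  # the 100-element back-end array


def insert(key, value):
    # One hash evaluation and one array write.
    array[hash_function(key)] = (key, value)


def search(key):
    # One hash evaluation and one array read.
    entry = array[hash_function(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


insert("Name", "Bob")
insert("Occupation", "Student")
insert("Location", "Earth")

print(search("Name"))  # Bob
```

Neither helper loops over the array, so the cost of each call stays the same no matter how many pairs are stored.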

    All is well. Let's try to add an additional entry of ("Pet", "Dog"). However, there is a problem, as hash_function("Pet") returns 18, which is the same hash for the "Name" key.

    Therefore, we'll need to resolve this hash collision. Let's suppose that the collision resolution function we used found that the next empty element is 29:

    array[29] = ("Pet", "Dog")
    

    Since there was a hash collision in this insertion, our performance was not quite O(1).

    This problem will also crop up when we try to search for the "Pet" key, as trying to find the element containing the "Pet" key by performing hash_function("Pet") will always return 18 initially.

    Once we look up element 18, we'll find the key "Name" rather than "Pet". When we find this inconsistency, we'll need to resolve the collision in order to retrieve the correct element, the one that contains the actual "Pet" key. Resolving a hash collision is an additional operation, which keeps the hashtable from performing in O(1) time.
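
The extra work caused by the collision can be counted with a small open-addressing sketch. The answer does not say which collision resolution scheme is used, so the fixed probe step of 11 below is purely an assumption chosen so that a collision at element 18 resolves to element 29. `hash_function` is again a hypothetical lookup that returns the indices from the example.

```python
# Hypothetical hash values taken from the example; "Pet" collides with "Name".
EXAMPLE_HASHES = {"Name": 18, "Occupation": 32, "Location": 74, "Pet": 18}
CAPACITY = 100
PROBE_STEP = 11  # assumption: chosen so that a collision at 18 resolves to 29


def hash_function(key):
    return EXAMPLE_HASHES[key]


def probe_sequence(key):
    """Yield the slots to examine for a key, starting at its hash."""
    index = hash_function(key)
    for _ in range(CAPACITY):
        yield index
        index = (index + PROBE_STEP) % CAPACITY


def insert(table, key, value):
    """Store the pair and return how many slots were examined."""
    for probes, index in enumerate(probe_sequence(key), start=1):
        if table[index] is None or table[index][0] == key:
            table[index] = (key, value)
            return probes
    raise RuntimeError("table is full")


def search(table, key):
    """Return (value, probes); value is None if the key is absent."""
    for probes, index in enumerate(probe_sequence(key), start=1):
        entry = table[index]
        if entry is None:
            return None, probes
        if entry[0] == key:
            return entry[1], probes
    return None, CAPACITY


table = [None] * CAPACITY
for key, value in [("Name", "Bob"), ("Occupation", "Student"), ("Location", "Earth")]:
    insert(table, key, value)

print(insert(table, "Pet", "Dog"))  # 2
print(search(table, "Name"))        # ('Bob', 1)
print(search(table, "Pet"))         # ('Dog', 2)
```

Both inserting and finding "Pet" examine two slots instead of one, which is the additional operation described above.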
