Why are dict lookups always better than list lookups?

后端未结

关注

 3  1630

说谎 2020-12-25 11:21

I was using a dictionary as a lookup table but I started to wonder if a list would be better for my application -- the amount of entries in my lookup table wasn\'t that big.

3条回答

一向 (楼主)

2020-12-25 12:12
The short answer is that lists use linear search and dicts use amortized O(1) search.

In addition, dict searches can skip an equality test either when 1) hash values don't match or 2) when there is an identity match. Lists only benefit from the identity-implies equality optimization.

Back in 2008, I gave a talk on this subject where you'll find all the details: https://www.youtube.com/watch?v=hYUsssClE94

Roughly the logic for searching lists is:
```
for element in s:
    if element is target:
        # fast check for identity implies equality
        return True
    if element == target:
        # slower check for actual equality
        return True
return False
```
For dicts the logic is roughly:
```
h = hash(target)
for i in probe_sequence(h, len(table)):
    element = key_table[i]
    if element is UNUSED:
        raise KeyError(target)
    if element is target:
        # fast path for identity implies equality
        return value_table[i]
    if h != h_table[i]:
        # unequal hashes implies unequal keys
        continue
    if element == target:
        # slower check for actual equality
        return value_table[i]
```
Dictionary hash tables are typically between one-third and two-thirds full, so they tend to have few collisions (few trips around the loop shown above) regardless of size. Also, the hash value check prevents needless slow equality checks (the chance of a wasted equality check is about 1 in 2**64).

If your timing focuses on integers, there are some other effects at play as well. That hash of a int is the int itself, so hashing is very fast. Also, it means that if you're storing consecutive integers, there tend to be no collisions at all.
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...