Is it always faster to use string as key in a dict?

Happy的楠姐 2020-12-09 16:15

On this page, I see something interesting:

Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn't affect the algorith

2 Answers
  • 2020-12-09 17:11

    The C code that underlies the Python dict is optimised for string keys. You can read about this here (and in the book the blog refers to).

    If the Python runtime knows your dict contains only string keys, it can skip work that the general case needs: it does not have to cater for errors that cannot happen in a string-to-string comparison, and it can ignore the rich comparison operators. This makes the common case of a dict with only string keys a little faster. (Update: timing shows it to be more than a little.)

    However, it is unlikely that this would make a significant change to the run time of most Python programs. Only worry about this optimisation if you have measured and found dict lookups to be a bottleneck in your code. As the famous quote says, "Premature optimization is the root of all evil."
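To make the "rich comparison" point above concrete, here is a minimal sketch of what the general lookup path has to allow for. `CountingKey` is a hypothetical class invented for this illustration; it only shows that a dict lookup may call arbitrary user-defined `__eq__` code, which cannot happen when every key is a plain `str`.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ is called."""

    eq_calls = 0  # class-level counter, for illustration only

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        if not isinstance(other, CountingKey):
            return NotImplemented
        return self.value == other.value


d = {CountingKey(1): "one"}

# A distinct but equal object: same hash, different identity, so the dict
# has to call the user-defined __eq__ to confirm that the keys match.
found = d[CountingKey(1)]
print(found, CountingKey.eq_calls)
```

Because `__eq__` here is ordinary Python code, it could just as well raise an exception or return something unexpected, and the generic lookup has to handle that.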

    The only way to see how much faster things really are is to time them:

    >>> import timeit
    >>> timeit.timeit('a["500"]','a ={}\nfor i in range(1000): a[str(i)] = i')
    0.06659698486328125
    >>> timeit.timeit('a[500]','a ={}\nfor i in range(1000): a[i] = i')
    0.09005999565124512
    

    So in this measurement, lookups with string keys took about 26% less time than lookups with int keys, and I have to admit I was surprised at the size of the difference.
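If you want to repeat this comparison on your own interpreter, the following is one possible script version, assuming Python 3. The helper name `best_time` and the loop counts are choices made for this sketch, and the numbers (and even which key type wins) may differ between Python versions and machines.

```python
import timeit


def best_time(stmt, setup, number=200_000, repeat=5):
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


str_setup = "a = {str(i): i for i in range(1000)}"
int_setup = "a = {i: i for i in range(1000)}"

str_time = best_time('a["500"]', str_setup)
int_time = best_time("a[500]", int_setup)

print(f"str keys: {str_time:.4f} s")
print(f"int keys: {int_time:.4f} s")
print(f"str/int ratio: {str_time / int_time:.2f}")
```

Taking the minimum of several runs reduces noise from other processes, which matters when the differences being measured are this small.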

  • 2020-12-09 17:15

    As this only affects the constant factor, not the algorithmic complexity, it's likely not to matter at all. The only time you really need to optimise is when you are working with very large data sets, and this does nothing to change how lookups scale with size.

    What this does mean is that in the cases where you have small dictionaries with strings as keys, Python will be quick - this is a common usage, so it's been optimised for.

    As Ignacio Vazquez-Abrams points out, it's likely that converting your key to a string will cost (far) more than the slight boost you might gain from it being a string for the dict.

    In short, use what is relevant to your situation. Optimisation should only be done where there is a need for it, not before.

    Some tests:

    python -m timeit -s "a={key: 1 for key in range(1000)}" "a[500]"
    10000000 loops, best of 3: 0.0773 usec per loop
    
    python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[\"500\"]"
    10000000 loops, best of 3: 0.0452 usec per loop
    
    python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[str(500)]"
    1000000 loops, best of 3: 0.244 usec per loop
    

    As you can see, while the string-based dict is faster, converting the key is very expensive by comparison, wiping out the gain (and then some).
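The same comparison can be sketched from inside Python, timing the conversion on its own as well. This assumes Python 3; the variable names and loop counts are choices made for this sketch, and the absolute numbers will vary by machine and interpreter version.

```python
import timeit

# Dict with string keys, plus an int `n` that has to be converted on each lookup.
setup = "a = {str(i): 1 for i in range(1000)}; n = 500"
number = 200_000

# Lookup with a ready-made string key.
literal = min(timeit.repeat('a["500"]', setup=setup, number=number, repeat=5))
# Lookup that converts the int to a string first.
converted = min(timeit.repeat("a[str(n)]", setup=setup, number=number, repeat=5))
# The conversion alone, without any dict access.
convert_only = min(timeit.repeat("str(n)", setup=setup, number=number, repeat=5))

print(f"a['500']    : {literal:.4f} s")
print(f"a[str(n)]   : {converted:.4f} s")
print(f"str(n) only : {convert_only:.4f} s")
```

Comparing the last two lines shows how much of the `a[str(n)]` time is spent in `str()` rather than in the dict lookup itself.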

    So yes, if the data is only being used as keys to the dictionary, and the format you store it in doesn't matter, then strings are preferable in a small dictionary. In practice, that is a very rare case (and you'd probably be using strings already).
