Python: iterating over list vs over dict items efficiency

心在旅途 2020-12-03 21:40

Is iterating over some_dict.items() as efficient as iterating over a list of the same items in CPython?

3 Answers
  • 2020-12-03 22:31

    Iterating over some_list directly is about twice as fast as iterating over some_dict.items(), but iterating over some_list by index is almost as slow as iterating over some_dict by key (Python 2 timings):

    K = 1000000
    some_dict = dict(zip(xrange(K), reversed(xrange(K))))
    some_list = zip(xrange(K), xrange(K))
    %timeit for t in some_list: t
    10 loops, best of 3: 55.7 ms per loop
    %timeit for i in xrange(len(some_list)):some_list[i]
    10 loops, best of 3: 94 ms per loop
    %timeit for key in some_dict: some_dict[key]
    10 loops, best of 3: 115 ms per loop
    %timeit for i,t in enumerate(some_list): t
    10 loops, best of 3: 103 ms per loop
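
The session above is Python 2 (xrange, %timeit in IPython). A rough Python 3 equivalent using the standard timeit module could look like the sketch below. The helper name best_time, the smaller K, and the loop counts are assumptions made for illustration, and the numbers it prints will differ from the ones quoted above.

```python
import timeit

K = 100_000  # assumption: smaller than the 1,000,000 above so the script finishes quickly
some_dict = dict(zip(range(K), reversed(range(K))))
some_list = list(zip(range(K), range(K)))  # zip() is lazy in Python 3, so materialize it

# The same four loops as in the IPython session, written for Python 3.
statements = {
    "list, direct": "for t in some_list: t",
    "list, by index": "for i in range(len(some_list)): some_list[i]",
    "dict, by key": "for key in some_dict: some_dict[key]",
    "list, enumerate": "for i, t in enumerate(some_list): t",
}


def best_time(stmt, number=5, repeat=3):
    """Return the best per-loop time in seconds for one statement (hypothetical helper)."""
    runs = timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat)
    return min(runs) / number


results = {label: best_time(stmt) for label, stmt in statements.items()}

for label, seconds in results.items():
    print(f"{label:16s} {seconds * 1e3:8.2f} ms per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" that %timeit reported.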
    
  • 2020-12-03 22:33

    A little benchmark shows me that iterating over a list is definitely faster.

    def iterlist(list_):
        i = 0
        for _ in list_:
            i += 1
        return i
    
    def iterdict(dict_):
        i = 0
        for _ in dict_.iteritems():
            i += 1
        return i
    
    def noiterdict(dict_):
        i = 0
        for _ in dict_.items():
            i += 1
        return i
    
    list_ = range(1000000)
    dict_ = dict(zip(range(1000000), range(1000000)))
    

    Tested with IPython on Python 2.7 (Kubuntu):

    %timeit iterlist(list_)
    10 loops, best of 3: 28.5 ms per loop
    
    %timeit iterdict(dict_)
    10 loops, best of 3: 39.7 ms per loop
    
    %timeit noiterdict(dict_)
    10 loops, best of 3: 86.1 ms per loop
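
The functions above rely on dict.iteritems(), which does not exist in Python 3. As a hedged sketch of how the same comparison might be set up there: iterview and itercopy are hypothetical names, with itercopy copying the items into a list first to imitate what Python 2's items() did. N is reduced from the original size only to keep the run short.

```python
import timeit

N = 100_000  # assumption: smaller than the original 1,000,000


def iterlist(list_):
    """Count elements by iterating over a list."""
    i = 0
    for _ in list_:
        i += 1
    return i


def iterview(dict_):
    """Count items through the lazy items() view (closest to Python 2's iteritems())."""
    i = 0
    for _ in dict_.items():
        i += 1
    return i


def itercopy(dict_):
    """Count items after copying them into a list, as Python 2's items() did."""
    i = 0
    for _ in list(dict_.items()):
        i += 1
    return i


list_ = list(range(N))
dict_ = dict(zip(range(N), range(N)))

for func, arg in ((iterlist, list_), (iterview, dict_), (itercopy, dict_)):
    seconds = timeit.timeit(lambda: func(arg), number=5) / 5
    print(f"{func.__name__:9s} {seconds * 1e3:7.2f} ms per call")
```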
    
  • 2020-12-03 22:36

    It depends on which version of Python you're using. In Python 2, some_dict.items() creates a new list, which takes up some additional time and uses up additional memory. On the other hand, once the list is created, it's a list, and so should have identical performance characteristics after the overhead of list creation is complete.

    In Python 3, some_dict.items() creates a view object instead of a list, and I anticipate that creating and iterating over items() would be faster than in Python 2, since nothing has to be copied. But I also anticipate that iterating over an already-created view would be a bit slower than iterating over an already-created list, because dictionary data is stored somewhat sparsely, and I believe there's no good way for Python to avoid iterating over every bin in the dictionary, even the empty ones.
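
To make the list-versus-view distinction concrete, here is a small Python 3 sketch. The variable names are made up for illustration. It shows that a view keeps tracking the dictionary, while a list built from it is a one-time copy.

```python
d = {"a": 1, "b": 2}

items_view = d.items()        # a view: no items are copied here
items_copy = list(d.items())  # a list: a snapshot of the items at this moment

d["c"] = 3  # mutate the dictionary after both objects were created

print(len(items_view))          # 3, the view reflects the new entry
print(len(items_copy))          # 2, the list is unchanged
print(("c", 3) in items_view)   # True
print(("c", 3) in items_copy)   # False
```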

    In Python 2, some timings confirm my intuitions:

    >>> some_dict = dict(zip(xrange(1000), reversed(xrange(1000))))
    >>> some_list = zip(xrange(1000), xrange(1000))
    >>> %timeit for t in some_list: t
    10000 loops, best of 3: 25.6 us per loop
    >>> %timeit for t in some_dict.items(): t
    10000 loops, best of 3: 57.3 us per loop
    

    Iterating over items() takes roughly twice as long as iterating over the list. Using iteritems() is a bit faster:

    >>> %timeit for t in some_dict.iteritems(): t
    10000 loops, best of 3: 41.3 us per loop
    

    But once the list returned by items() has been created, iterating over it is basically the same as iterating over any other list:

    >>> some_dict_list = some_dict.items()
    >>> %timeit for t in some_dict_list: t
    10000 loops, best of 3: 26.1 us per loop
    

    Python 3 can create and iterate over items faster than Python 2 can (compare to 57.3 us above):

    >>> some_dict = dict(zip(range(1000), reversed(range(1000))))
    >>> %timeit for t in some_dict.items(): t      
    10000 loops, best of 3: 33.4 us per loop 
    

    But the time to create a view is negligible; the view itself is simply slower to iterate over than a list.

    >>> some_list = list(zip(range(1000), reversed(range(1000))))
    >>> some_dict_view = some_dict.items()
    >>> %timeit for t in some_list: t
    10000 loops, best of 3: 18.6 us per loop
    >>> %timeit for t in some_dict_view: t
    10000 loops, best of 3: 33.3 us per loop
    

    This means that in Python 3, if you want to iterate many times over the items in a dictionary and performance is critical, you can cut the iteration time by roughly 40% (18.6 us versus 33.3 us above) by caching the view as a list:

    >>> some_list = list(some_dict_view)
    >>> %timeit for t in some_list: t
    100000 loops, best of 3: 18.6 us per loop
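
Since these timings depend on the interpreter version and the machine, it may be worth re-measuring before relying on the cached-list trick. The sketch below uses the standard timeit module outside IPython. The helper per_loop_us and the loop counts are assumptions chosen for illustration, and no particular result is implied.

```python
import timeit

some_dict = dict(zip(range(1000), reversed(range(1000))))
some_dict_view = some_dict.items()
cached_items = list(some_dict_view)  # snapshot; goes stale if some_dict changes later


def per_loop_us(stmt, number=2000, repeat=5):
    """Best per-loop time in microseconds for one statement (hypothetical helper)."""
    best = min(timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat))
    return best / number * 1e6


view_us = per_loop_us("for t in some_dict_view: t")
list_us = per_loop_us("for t in cached_items: t")

print(f"view: {view_us:.1f} us per loop")
print(f"list: {list_us:.1f} us per loop")
```

The cached list only pays off while the dictionary is not modified; after any change it has to be rebuilt from the view.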
    