Why is slice assignment faster than `list.insert`?

血红的双手。 提交于 2019-11-30 17:12:47

Your first test case has to call the method insert on the list a, whereas all the operations in test2 are handled directly in byte code. Note the CALL_FUNCTION in the disassembly of test1 below. Calling functions is moderately expensive in Python: certainly expensive enough to account for a few percent difference in run time.

>>> import dis
>>> dis.dis(test1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_FAST                0 (a)
             18 LOAD_ATTR                0 (insert)
             21 LOAD_CONST               4 (0)
             24 LOAD_CONST               1 (1)
             27 CALL_FUNCTION            2
             30 POP_TOP             
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
>>> dis.dis(test2)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_CONST               1 (1)
             18 BUILD_LIST               1
             21 LOAD_FAST                0 (a)
             24 LOAD_CONST               4 (0)
             27 LOAD_CONST               4 (0)
             30 STORE_SLICE+3       
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        

Bad explanation

I posted this first, but after consideration I think it is not correct. The difference I describe here should only make a noticeable difference when there is a lot of data to be moved, which isn't the case in the test here. And even with a lot of data the difference is only a couple of percent:

import timeit

def test1():
    a = range(10000000)
    a.insert(1,1)

def test2():
    a = range(10000000)
    a[1:1]=[1]

>>> timeit.timeit(test1, number=10)
6.008707046508789
>>> timeit.timeit(test2, number=10)
5.861173868179321

The method list.insert is implemented by the function ins1 in listobject.c. You'll see that it copies the item references for the tail of the list one by one:

for (i = n; --i >= where; )
    items[i+1] = items[i];

On the other hand slice assignment is implemented by the function list_ass_slice, which calls memmove:

memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));

So I think the answer to your question is that the C library function memmove is better optimized than the simple loop. See here for the glibc implementation of memmove: I believe that when called from list_ass_slice it eventually ends up calling _wordcopy_bwd_aligned which you can see is heavily hand-optimized.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!