Python list concatenation efficiency

陌清茗 2020-12-16 13:07

What is the most efficient way to concatenate two lists list_a and list_b when:

  • list_b items have to be placed before
6 Answers
  • 2020-12-16 13:21

    You can assign list_b to a slice of list_a. The slice 0:0 is empty and sits at the very start of list_a, so the assignment inserts list_b's items in front:

    list_a[0:0] = list_b
    

    This is the fastest way to insert a list into another list, at any position.
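To illustrate the "at any position" part, here is a minimal sketch. The helper name `insert_list_at` is made up for this example; it only wraps the slice assignment shown above and modifies the target list in place.

```python
def insert_list_at(target, position, items):
    """Insert all of `items` into `target` at `position`, in place."""
    # Assigning to an empty slice inserts without replacing anything.
    target[position:position] = items


list_a = [4, 5, 6]
list_b = [1, 2, 3]

insert_list_at(list_a, 0, list_b)         # prepend, same as list_a[0:0] = list_b
insert_list_at(list_a, 3, ["x", "y"])     # insert in the middle
insert_list_at(list_a, len(list_a), [7])  # insert at the end

print(list_a)  # [1, 2, 3, 'x', 'y', 4, 5, 6, 7]
```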

  • 2020-12-16 13:22

    Try this:

    list_a[0:0] = list_b
    
  • 2020-12-16 13:24

    Why not just timeit?

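    import timeit

    # Build real lists: in Python 3, range() is not a list and cannot be
    # concatenated with + or modified by slice assignment.
    create_data = """\
    list_a = list(range(10))
    list_b = list(range(10))
    """

    # t1: build a new list by concatenation
    t1 = timeit.Timer(stmt=create_data + """\
    list_a = list_b + list_a
    """)

    # t2: repeated insert at the front (note: this prepends list_b in reverse order)
    t2 = timeit.Timer(create_data + """\
    for item in list_b:
        list_a.insert(0, item)
    """)

    # t3: append list_a's items to list_b, then rebind list_a
    t3 = timeit.Timer(create_data + """\
    for item in list_a:
        list_b.append(item)
    list_a = list_b
    """)

    # t4: in-place slice assignment
    t4 = timeit.Timer(create_data + """\
    list_a[0:0] = list_b
    """)

    # Each timing includes the cost of creating both lists.
    for i, t in enumerate([t1, t2, t3, t4]):
        print(i, "%.2f usec/pass" % (1000000 * t.timeit(number=100000) / 100000))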
    

    Result:

    0 0.73 usec/pass
    1 2.79 usec/pass
    2 1.66 usec/pass
    3 0.77 usec/pass

  • 2020-12-16 13:28

    itertools.chain just creates a lazy iterator, so if you can get away with an iterator instead of a list, creating it takes constant time, but you pay the cost when you access each element. Otherwise, in my measurements, list_a[0:0] = list_b is about 6 times faster than list_a = list_b + list_a.

    I think that list_a = list_b + list_a is the most readable choice and it's already pretty fast.

    The two methods you mentioned that use a for loop (with append() or insert()) are unusably slow, so I didn't bother including them.
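A small sketch of what "lazy" means for itertools.chain in practice. The variable names are only for illustration. The chain object copies nothing when it is created, holds references to the original lists, and can be consumed only once.

```python
import itertools

list_a = [4, 5, 6]
list_b = [1, 2, 3]

# Creating the chain does not copy any elements.
chained = itertools.chain(list_b, list_a)

# The chain only holds references, so a change made before it is
# consumed is still visible when iterating.
list_a.append(7)

# A chain is an iterator, not a list: it has no len() and no indexing.
has_len = hasattr(chained, "__len__")

first_pass = list(chained)   # the per-element cost is paid here
second_pass = list(chained)  # already exhausted

print(has_len)      # False
print(first_pass)   # [1, 2, 3, 4, 5, 6, 7]
print(second_pass)  # []
```

If the combined data has to be read more than once or indexed, it needs to be materialized with list(...), which gives up the constant-time advantage.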


    Ran with Python 3.7.5 [Clang 11.0.0 (clang-1100.0.33.8)] on darwin on a 1.6 GHz Dual-Core Intel Core i5 with 16 GB of 2133 MHz LPDDR3 RAM using the following code:

    from timeit import timeit
    import random
    import matplotlib.pyplot as plt
    
    num_data_points = 1000
    step = 10
    methods = [
        # ordered from slowest to fastest to make the key easier to read
        # """for item in list_a: list_b.append(item); list_a = list_b""",
        # """for item in list_b: list_a.insert(0, item)""",
        # "list_a = list(itertools.chain(list_b, list_a))",
        "list_a = list_b + list_a",
        "list_a[0:0] = list_b",
        "list_a = itertools.chain(list_b, list_a)",
    ]
    
    x = list(range(0, num_data_points * step, step))
    y = [[] for _ in methods]
    for i in x:
        list_a = list(range(i))
        list_b = list(range(i))
        random.shuffle(list_a)
        random.shuffle(list_b)
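        # itertools must be imported in the setup, otherwise the chain statement raises NameError
        setup = f"import itertools; list_a = {list_a}; list_b = {list_b}"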
        for method_index, method in enumerate(methods):
            y[method_index].append(timeit(method, setup=setup, number=30))
        print(i, "out of", num_data_points * step)
    
    ax = plt.axes()
    for method_index, method in enumerate(methods):
        ax.plot(x, y[method_index], label=method)
    ax.set(xlabel="number of elements in both lists", ylabel="time (s) (lower is better)")
    ax.legend()
    plt.show()
    
  • 2020-12-16 13:31

    Given that

    list_a = list_b + list_a
    

    works for your purposes, it follows that you don't actually need the original list_a object itself to hold all the data. You just need the result to be called list_a (i.e., you don't have, or don't care about, any other variables that might refer to that same list).
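The difference between rebinding the name and modifying the list object can be seen with a second reference. This is a minimal sketch; `other_ref` is a name invented for the example.

```python
list_b = [1, 2]

# Variant 1: concatenation creates a new list and rebinds the name list_a.
list_a = [3, 4]
other_ref = list_a
list_a = list_b + list_a
rebound = (list_a, other_ref, list_a is other_ref)

# Variant 2: slice assignment changes the existing list object in place.
list_a = [3, 4]
other_ref = list_a
list_a[0:0] = list_b
in_place = (list_a, other_ref, list_a is other_ref)

print(rebound)   # ([1, 2, 3, 4], [3, 4], False)
print(in_place)  # ([1, 2, 3, 4], [1, 2, 3, 4], True)
```

If nothing else refers to the original list, the two variants are interchangeable from the caller's point of view.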

    If you also happen not to care about it being exactly a list, but only about it being iterable, then you can use itertools.chain:

    list_a = itertools.chain(list_b, list_a)
    

    If you do care about some list behaviour, you could build something similar to chain that behaves like a list, for example:

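    import itertools

    class ListChain:
        """List-like view over several lists; no elements are copied."""
        # Not a list subclass: inherited methods would act on an empty list.

        def __init__(self, *lists):
            # Store a mutable list (not the args tuple) so extend() can add to it.
            self._lists = list(lists)

        def __iter__(self):
            return itertools.chain.from_iterable(self._lists)

        def __len__(self):
            return sum(len(l) for l in self._lists)

        def append(self, item):
            # Note: this mutates the last underlying list.
            self._lists[-1].append(item)

        def extend(self, iterable):
            self._lists.append(list(iterable))

        def __getitem__(self, index):
            # Only non-negative integer indexes are handled here.
            for l in self._lists:
                if index < len(l):
                    return l[index]
                index -= len(l)
            raise IndexError("ListChain index out of range")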
    

    And so on. Making this work in all cases would take a lot of effort (possibly more than it's worth); handling slices and negative indexes comes to mind, for example. But for very simple cases, this approach can avoid a lot of copying of list contents.
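As a rough idea of what handling negative indexes and slices could involve, here is a sketch of a standalone lookup function. The name `chain_getitem` and its signature are assumptions made for this example, not part of the answer's class, and the slice branch favours simplicity over speed.

```python
def chain_getitem(lists, index):
    """Look up `index` (an int or a slice) across a sequence of lists."""
    total = sum(len(l) for l in lists)

    if isinstance(index, slice):
        # slice.indices() resolves None and negative bounds against the
        # combined length; each resulting position is then looked up.
        return [chain_getitem(lists, i) for i in range(*index.indices(total))]

    if index < 0:
        index += total  # count from the end, as lists do
    if not 0 <= index < total:
        raise IndexError("index out of range")

    # Walk the underlying lists until the index falls inside one of them.
    for l in lists:
        if index < len(l):
            return l[index]
        index -= len(l)


parts = [[1, 2], [3], [4, 5, 6]]
print(chain_getitem(parts, -1))           # 6
print(chain_getitem(parts, slice(1, 5)))  # [2, 3, 4, 5]
```

A slice here returns a plain list, so it does copy the selected elements; only single-item lookups avoid copying.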

  • 2020-12-16 13:40

    Here's a graph of how the timings from BigYellowCactus's answer develop as the length of the lists increases. The vertical axis is the time required to initialize both lists and insert one in front of the other, in microseconds. The horizontal axis is the number of items in the lists.

    [Figure: Asymptotic behaviour of the possibilities]

    t1:

    list_a = list_b + list_a
    

    t2:

    for item in list_b:
        list_a.insert(0, item)
    

    t3:

    for item in list_a:
        list_b.append(item)
    list_a = list_b
    

    t4:

    list_a[0:0] = list_b
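Before comparing speed, it is worth checking that the four variants produce the same result. The sketch below wraps them in a hypothetical helper, `prepend_variants`, that works on copies of its inputs. It shows that t2 as written prepends list_b in reverse order, and that iterating over reversed(list_b) restores the order. Each insert(0, ...) also has to shift every element already in the list, which is consistent with t2 scaling worse than the others as the lists grow.

```python
def prepend_variants(list_a, list_b):
    """Run each variant on fresh copies and return the resulting lists."""
    results = {}

    a, b = list(list_a), list(list_b)
    results["t1"] = b + a

    a, b = list(list_a), list(list_b)
    for item in b:
        a.insert(0, item)  # every insert shifts the elements already in a
    results["t2"] = a

    a, b = list(list_a), list(list_b)
    for item in a:
        b.append(item)
    results["t3"] = b

    a, b = list(list_a), list(list_b)
    a[0:0] = b
    results["t4"] = a

    # t2 with the iteration order reversed, so list_b keeps its order.
    a, b = list(list_a), list(list_b)
    for item in reversed(b):
        a.insert(0, item)
    results["t2_reversed"] = a

    return results


results = prepend_variants([3, 4], [1, 2])
print(results["t1"])  # [1, 2, 3, 4]
print(results["t2"])  # [2, 1, 3, 4]
```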
    