Stacks / list in python - how does it append?

Submitted by £可爱£侵袭症+ on 2021-02-08 10:46:59

Question


If I have a list:

list_1 = ["apples", "apricots", "oranges"]

and I append a new item to the list: "berries"

list_1 = ["apples", "apricots", "oranges", "berries"]

Under the hood (so to speak), I thought I remembered reading that Python creates another list (list_2) and points it to the original list (list_1) so that list_1 remains static. If this is true, would it look something like this under the hood?

list_1 = ["apples", "apricots", ["oranges", "berries"]]

So in this way, the original list maintains its size. Is this the correct way of looking at it?


Answer 1:


No, Python does not create another list when you call append. It mutates the existing list in-place. You can see this pretty easily:

>>> lst1 = []
>>> lst2 = lst1
>>> lst1.append(0)
>>> lst1
[0]
>>> lst2
[0]

If you want to create another list, you can do this instead:

>>> lst1 = []
>>> lst2 = lst1
>>> lst1 = lst1 + [0]
>>> lst1
[0]
>>> lst2
[]
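The same contrast can be checked by comparing object identities instead of reading printed values. This is a minimal sketch; the variable names are only illustrative:

```python
original = ["apples", "apricots", "oranges"]
alias = original                  # a second name for the same list object
identity_before = id(original)

original.append("berries")        # mutates the existing object
appended_in_place = id(original) == identity_before and alias is original

combined = original + ["cherries"]   # + builds a brand-new list object
concatenation_made_new_list = combined is not original

print(appended_in_place, concatenation_made_new_list)
print(alias)      # the alias sees the appended item
print(original)   # the original is untouched by the concatenation
```

Because alias and original are the same object, the append is visible through both names, while the concatenation leaves original unchanged.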

So, how does that in-place appending work? Aren't lists just arrays under the hood? Yes, they are. Python leaves a little space at the end, but if you append enough times, it has to allocate a new array for the list, move over all the elements, and delete the old one. It's still the same list object, but with a different array under the hood.

That growing doesn't just add one new slot each time—that would mean each append has to reallocate the whole list, so appending would take average linear time. Instead, it multiplies the length. Something like this:

new_capacity = max(4, capacity * 8 // 5, new_length)

(The new_length is there in case you're extending the list with a whole bunch of elements at once.)

By expanding geometrically rather than arithmetically, we can guarantee that, while a few appends do take linear time, enough of them are instant that the amortized time is constant. Exactly what factor you use is a tradeoff between speed (higher numbers mean fewer reallocations) and space (higher numbers mean more wasted space on the end). I don't know what CPython does, but you can find it in the source code linked below. Most systems use a value between 1.5 and 2.0 (and usually a nice fraction of small numbers so they can do an integer multiply and divide).
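To make the amortized argument concrete, here is a rough simulation that counts how many element moves a run of appends would cost under two growth policies. It assumes the worst case where every reallocation moves all existing elements, and it uses the illustrative 8/5 formula from above rather than CPython's real one. The function names are made up for this sketch:

```python
def moves_with_geometric_growth(n_appends):
    """Count element moves when capacity grows by the 8/5 rule shown above."""
    capacity = size = moves = 0
    for _ in range(n_appends):
        if size + 1 > capacity:
            capacity = max(4, capacity * 8 // 5, size + 1)
            moves += size          # worst case: every element is copied over
        size += 1
    return moves


def moves_with_fixed_growth(n_appends, step=1):
    """Count element moves when capacity grows by a fixed number of slots."""
    capacity = size = moves = 0
    for _ in range(n_appends):
        if size + 1 > capacity:
            capacity += step
            moves += size          # worst case: every element is copied over
        size += 1
    return moves


n = 1000
geometric = moves_with_geometric_growth(n)
fixed = moves_with_fixed_growth(n)
print(f"geometric growth: {geometric} moves, {geometric / n:.2f} per append")
print(f"one slot at a time: {fixed} moves, {fixed / n:.2f} per append")
```

With geometric growth the moves per append stay bounded by a small constant as n grows, while growing one slot at a time makes the total quadratic.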


If you really want to understand this, and you can follow basic C, you can look under the hood at listobject.h and listobject.c. You'll probably want to read the C API docs first, but here's the basics (in Python-like pseudocode, and intentionally using not quite the real function and field names):

if lst.size + 1 > lst.allocated:
    new_capacity = <see above>
    lst.array = PyRealloc(<enough memory for new_capacity pointers>)
    lst.allocated = new_capacity
incref(new_item)
lst.array[lst.size] = new_item
lst.size += 1

The Realloc function is going to be a thin wrapper around the platform's function, which will try to find more room in-place, but fall back to allocating a totally new pointer and moving over all of the contents.
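The pseudocode can be turned into a small runnable model. This is only a toy under stated assumptions: a Python list padded with None stands in for the C array of pointers, the growth rule is the illustrative 8/5 formula from earlier, and ToyList with its reallocations counter is a hypothetical name, not anything in CPython:

```python
class ToyList:
    """Toy model of the append pseudocode; not CPython's implementation."""

    def __init__(self):
        self.size = 0            # number of elements actually stored
        self.allocated = 0       # number of slots in the backing array
        self.array = []          # stand-in for the C array of pointers
        self.reallocations = 0   # bookkeeping for this demo only

    def append(self, item):
        if self.size + 1 > self.allocated:
            new_capacity = max(4, self.allocated * 8 // 5, self.size + 1)
            # Stand-in for realloc: a larger block holding the old contents.
            self.array = self.array + [None] * (new_capacity - self.allocated)
            self.allocated = new_capacity
            self.reallocations += 1
        self.array[self.size] = item
        self.size += 1

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError("ToyList index out of range")
        return self.array[index]


toy = ToyList()
for number in range(100):
    toy.append(number)
print(len(toy), toy.allocated, toy.reallocations)
```

The ToyList object stays the same across all appends; only its backing array is swapped out on the few calls that trigger a reallocation.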


Since you're using Python, there's a good chance you're the kind of person who likes to learn through interactive experimentation. If you don't know about ctypes.pythonapi, you should definitely start playing with it. You can call almost anything from the C API from inside Python. Unfortunately, you can't call #define macros, or dig into the structs, without a bit of extra work, but see superhackyinternals for how you can do that bit of extra work. (I don't think I included anything there for lists, but look at how ints work, and you should be able to get it from there; just don't look at strings, because they're a lot more complicated.) Of course, when playing around with this stuff from inside your interpreter, you're going to segfault a lot, so don't do it in a session where you've got any important history.
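As a gentle starting point, here is a small sketch that calls two real C API functions, PyList_Append and PyList_Size, through ctypes.pythonapi. It assumes CPython; the Python-side names list_append and list_size are just local aliases chosen for this example. Declaring argtypes and restype first is what keeps this from crashing:

```python
import ctypes

# int PyList_Append(PyObject *list, PyObject *item)
list_append = ctypes.pythonapi.PyList_Append
list_append.argtypes = [ctypes.py_object, ctypes.py_object]
list_append.restype = ctypes.c_int

# Py_ssize_t PyList_Size(PyObject *list)
list_size = ctypes.pythonapi.PyList_Size
list_size.argtypes = [ctypes.py_object]
list_size.restype = ctypes.c_ssize_t

fruits = ["apples", "apricots", "oranges"]
status = list_append(fruits, "berries")   # 0 means success
size = list_size(fruits)

print(status, size, fruits)
```

The append goes through the same C function that list.append ultimately relies on, and the change is visible on the ordinary Python list afterwards.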


And of course that isn't guaranteed to be true for every Python implementation. As long as an implementation can provide the documented interface and performance characteristics, it can build lists however it wants. For example, maybe IronPython uses some vector class in the .NET class library. Of course that class will do similar reallocate-and-move under its own hood, but IronPython won't care how it does that (and you'll care even less).




Answer 2:


Under the hood, a Python list object uses a C array that is larger than the list's current length; it is pre-sized. The length of the Python list is just an integer value, recording how many elements are stored in the array. Appending an element to the list just uses the next empty spot in the array, and the size integer is incremented by one.

When there is not enough space any more in the C array, more memory is allocated to grow the array. If you remove elements to the point you only use half of the array, memory is released again.
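One way to observe this from Python is to watch sys.getsizeof while appending and popping. This sketch assumes CPython, where sys.getsizeof on a list includes the pre-sized array; the exact byte counts vary by version and platform, so none are shown here. The helper name count_size_changes is made up for this example:

```python
import sys


def count_size_changes(n_appends):
    """Append n_appends items and count how often the reported size changes."""
    items = []
    last_size = sys.getsizeof(items)
    changes = 0
    for value in range(n_appends):
        items.append(value)
        current_size = sys.getsizeof(items)
        if current_size != last_size:
            changes += 1          # the backing array was resized
            last_size = current_size
    return items, changes


items, changes = count_size_changes(1000)
peak_size = sys.getsizeof(items)

while items:                      # remove everything again
    items.pop()
size_after_pops = sys.getsizeof(items)

print(f"{changes} size changes over 1000 appends")
print(f"peak {peak_size} bytes, after popping everything {size_after_pops} bytes")
```

The reported size changes far fewer times than there are appends, and it drops again once the elements are removed.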

You can see the implementation in the Objects/listobject.c file in the Python source code. Resizing takes place in the list_resize() function, where the following snippet decides how large the new array should be, to strike a balance between memory usage (with a bunch of pointers in an array not being used) and avoiding having to copy across arrays too often:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

new_allocated is then added to the requested new size (newsize). So when you need more space, the new size, divided by 8, plus 3 or 6, dictates how many extra slots to add on top of the minimum required size. Appending an element to a list of size 1000 adds a buffer of 131 extra slots, while appending an element to a list of size 10 only adds an extra 7 slots.
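The arithmetic can be replayed in Python. This sketch only re-implements the formula quoted in the snippet above; other CPython versions may use a different rule, and the function names are invented for illustration:

```python
def over_allocation(newsize):
    """Extra slots from the quoted formula: newsize / 8, plus 3 or 6."""
    return (newsize >> 3) + (3 if newsize < 9 else 6)


def new_allocated(newsize):
    """Total slots after a resize: the requested size plus the extra slots."""
    return newsize + over_allocation(newsize)


def growth_pattern(count):
    """Capacities seen when appending one element at a time to an empty list."""
    capacities = [0]
    while len(capacities) < count:
        # A resize happens when one more element than the capacity is needed.
        capacities.append(new_allocated(capacities[-1] + 1))
    return capacities


print(growth_pattern(10))      # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
print(over_allocation(1001))   # 131 extra slots when a 1000-item list grows
print(over_allocation(11))     # 7 extra slots when a 10-item list grows
```

The first line reproduces the growth pattern listed in the source comment, and the other two match the 131 and 7 extra slots mentioned above.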

From the point of view of Python code, the list is just a sequence of indices that'll grow and shrink as needed to fit all the elements. There are no extra lists involved in this; the swapping of arrays when resizing is hidden from view.




Answer 3:


No, under the hood the list is backed by a (usually) underutilized array.

list1 -> [ x | x |  ]
           |   |
           |   v
           |   "apricots"
           v
           "apples"

When you append to list1, you simply change the value of the first unused array slot:

list1 -> [ x | x | x ]
           |   |   |
           |   |   v
           |   |   "oranges" 
           |   v   
           |   "apricots"
           v
           "apples"

On the next append, more memory (and again, more than is needed) is added to the array before adding the new element. [The extra memory may be allocated as soon as the array is detected to be full; I don't recall the exact details.]

list1 -> [ x | x | x |  |  |  |  ]
           |   |   |
           |   |   v
           |   |   "oranges" 
           |   v   
           |   "apricots"
           v
           "apples"

list1 -> [ x | x | x | x |  |  |  ]
           |   |   |   |
           |   |   |   v
           |   |   |   "berries"
           |   |   v
           |   |   "oranges" 
           |   v   
           |   "apricots"
           v
           "apples"

The amount actually allocated may vary, but the desired effect is that any sequence of appends has the appearance of a constant-time operation, even though each individual append may be either a very small constant-time operation or a linear-time operation. The invariant, though, is that you can never have "too many" linear-time operations over the life of the object, preserving the amortized running time of each append.




Answer 4:


A Python implementation can do anything under the hood, provided it has the correct behavior. Good implementations are also at least as fast as the recommended time complexities.

In general, appending to a list modifies the list in place, if possible. In its append implementation, the widely used CPython grows the list's array to roughly 9/8 * old_size + 6 slots when there is no more space. Resizing is accomplished by either extending the existing memory block (if lucky) or allocating new memory and copying over all old elements. This means that resizing is only rarely needed, especially if the list is large. Most of the time, an append can use one of the reserved spare slots.



Source: https://stackoverflow.com/questions/49244003/stacks-list-in-python-how-does-it-append
