Efficient Python array with 100 million zeros?

挽巷 2020-12-08 19:56

What is an efficient way to initialize and access elements of a large array in Python?

I want to create an array in Python with 100 million entries, unsigned 4-byte

10 answers
  •  我在风中等你
    2020-12-08 20:26

    Just a reminder of how Python's integers work: if you allocate a list by saying

    a = [0] * K
    

    you need the memory for the list itself (sizeof(PyListObject) + K * sizeof(PyObject*)) plus the memory for the single integer object 0. As long as the numbers in the list stay below the threshold V up to which CPython caches small integers, you are fine, because those objects are shared: every reference to a count n < V points to the exact same object. You can find V with the following snippet:

    >>> i = 0
    >>> j = 0
    >>> while i is j:
    ...     i += 1
    ...     j += 1
    ...
    >>> i  # on my system!
    257
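
The sharing described above can be checked directly. The following is a minimal sketch, assuming CPython (object identity and `sys.getsizeof` results are implementation details); the names `shared` and `per_slot` are only illustrative.

```python
import sys

K = 1_000_000
a = [0] * K

# Every slot holds a reference to the same cached int object 0.
shared = len({id(x) for x in a}) == 1

# getsizeof counts the list header and its K pointers,
# not the objects those pointers refer to.
per_slot = (sys.getsizeof(a) - sys.getsizeof([])) / K

print(shared, per_slot)
```

On a typical 64-bit build, `per_slot` should come out at the pointer size, which matches the K * sizeof(PyObject*) term.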
    

    This means that as soon as the counts go above the cached range, the memory you need is sizeof(PyListObject) + K * sizeof(PyObject*) + d * sizeof(PyIntObject), where d <= K is the number of entries holding integers greater than 256 (i.e. V == 257). On a 64-bit system, sizeof(PyIntObject) == 24 and sizeof(PyObject*) == 8, so for K = 100 million the worst-case memory consumption is about 100,000,000 * (8 + 24) = 3,200,000,000 bytes.
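
The worst-case figure can be reproduced with a few lines of arithmetic. This is only a sketch: `worst_case_list_bytes` is a hypothetical helper, it ignores the fixed list header, and the 8 and 24 byte sizes are the ones quoted above rather than values measured on your interpreter.

```python
def worst_case_list_bytes(k, pointer_size=8, int_size=24):
    """Estimate the bytes used by a list of k ints when none of them is cached.

    Each entry costs one pointer in the list plus one separate int object.
    The constant-size list header is left out of the estimate.
    """
    return k * (pointer_size + int_size)


total = worst_case_list_bytes(100_000_000)
print(total)  # 3200000000
```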

    With numpy.ndarray or array.array, memory consumption stays constant after initialization (4 bytes per entry for unsigned 4-byte integers), but you pay for the temporary wrapper objects that are created transparently on each element access, as Thomas Wouters said. You should probably think about moving the update code (which accesses and increments positions in the array) to C, using either Cython or scipy.weave.
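
For illustration, here is a minimal sketch of the array.array variant using only the standard library. `make_counters` is a hypothetical helper name, and the `'I'` type code is an assumption about your platform: it means C unsigned int, which is 4 bytes on most systems but is only guaranteed to be at least 2.

```python
from array import array


def make_counters(n):
    """Return a zero-filled array of n C unsigned ints."""
    # Repeating a one-element array avoids building a temporary Python list.
    return array("I", [0]) * n


# Small size for demonstration; the question would use 100_000_000.
counts = make_counters(1000)
counts[42] += 1  # each access still creates a temporary Python int

print(counts.itemsize, len(counts), counts[42])
```

Checking `counts.itemsize` before relying on the 4-byte width is a cheap safeguard. The storage itself stays at `itemsize * n` bytes regardless of the values written.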
