Efficient Python array with 100 million zeros?

挽巷 2020-12-08 19:56

What is an efficient way to initialize and access elements of a large array in Python?

I want to create an array in Python with 100 million entries, unsigned 4-byte

10 answers
  • 2020-12-08 20:40

    For fast creation, use the array module.

    Using the array module is ~5 times faster for creation, but about twice as slow for accessing elements compared to a normal list:

    # Create array
    python -m timeit -s "from array import array" "a = array('I', '\x00' * 100000000)"
    10 loops, best of 3: 204 msec per loop
    
    # Access array
    python -m timeit -s "from array import array; a = array('I', '\x00' * 100000000)" "a[4975563]"
    10000000 loops, best of 3: 0.0902 usec per loop
    
    # Create list
    python -m timeit "a = [0] * 100000000"
    10 loops, best of 3: 949 msec per loop
    
    # Access list
    python -m timeit  -s "a = [0] * 100000000" "a[4975563]"
    10000000 loops, best of 3: 0.0417 usec per loop
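
The commands above are Python 2, where a string initializer is read as raw bytes, so '\x00' * 100000000 yields 100000000 / itemsize elements (25 million with a 4-byte 'I') rather than 100 million. A minimal Python 3 sketch that sizes the array by element count instead, by repeating a one-element array, might look like this. The helper name make_zero_array and the reduced size are assumptions for illustration only:

```python
from array import array


def make_zero_array(n, typecode="I"):
    """Return an array of n zeros by repeating a one-element array."""
    return array(typecode, [0]) * n


# Smaller than 100 million so the demo runs quickly; scale n up as needed.
a = make_zero_array(1_000_000)
a[4975] += 1  # elements are read and updated like list items
print(len(a), a[4975], a.itemsize)
```

Repetition avoids building a 100-million-item Python list first, which is the slow part of array('I', [0] * n).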
    
  • 2020-12-08 20:43

    I have done some profiling, and the results are completely counterintuitive. For simple element-access operations, NumPy arrays and array.array are more than 10x slower than native Python lists.

    Note that for array access, I am doing operations of the form:

    a[i] += 1
    

    Profiles:

    • [0] * 20000000

      • Access: 2.3M/sec
      • Initialization: 0.8s
    • numpy.zeros(shape=(20000000,), dtype=numpy.int32)

      • Access: 160K/sec
      • Initialization: 0.2s
    • array.array('L', [0] * 20000000)

      • Access: 175K/sec
      • Initialization: 2.0s
    • array.array('L', (0 for i in range(20000000)))

      • Access: 175K/sec (presumed, based on the profile for the other array.array case)
      • Initialization: 6.7s
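
A rough way to reproduce this kind of measurement with only the standard library is sketched below. It times a[i] += 1 at one fixed index, which is an assumption (the access pattern behind the figures above is not stated), and increments_per_second is a hypothetical helper name. Absolute numbers depend on the machine and interpreter version, so treat the output as indicative only:

```python
import timeit
from array import array

N = 1_000_000  # far smaller than the 20 million used above, to keep the run short


def increments_per_second(container, repeats=200_000):
    """Time `a[i] += 1` at a fixed index and return operations per second."""
    timer = timeit.Timer("a[i] += 1", globals={"a": container, "i": N // 2})
    seconds = timer.timeit(number=repeats)
    return repeats / seconds


list_rate = increments_per_second([0] * N)
array_rate = increments_per_second(array("L", [0]) * N)
print(f"list:  {list_rate:,.0f} increments/sec")
print(f"array: {array_rate:,.0f} increments/sec")
```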
  • 2020-12-08 20:43

    In addition to the other excellent solutions, another way is to use a dict instead of an array (elements which exist are non-zero, otherwise they're zero). Lookup time is O(1).
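
The dict idea can be sketched as follows, assuming most entries stay zero. The names SIZE, counts, and increment are illustrative, not from the original answer:

```python
from collections import defaultdict

SIZE = 100_000_000  # logical length; no storage is allocated up front

counts = defaultdict(int)  # a missing key reads as 0 when indexed


def increment(index):
    """Add 1 to the logical element at `index`, with list-style bounds checking."""
    if not 0 <= index < SIZE:
        raise IndexError(index)
    counts[index] += 1


increment(4975563)
increment(4975563)
print(counts.get(4975563, 0))  # 2
print(counts.get(12, 0))       # 0, and .get() does not insert the key
print(len(counts))             # 1
```

Each stored entry costs far more than 4 bytes, so this only pays off when the number of non-zero elements is a small fraction of the total.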

    You might also check if your application is resident in RAM, rather than swapping out. It's only 381 MB, but the system may not be giving you it all for whatever reason.
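
The 381 MB figure is 100 million elements times 4 bytes, expressed in MiB. A small sketch of that arithmetic, with the pointer storage of a plain list for comparison (element size and pointer size are queried from the running interpreter, so they may differ by platform):

```python
import struct
from array import array

N = 100_000_000

# Packed storage: N items of array('I').itemsize bytes each (4 on most platforms).
packed_bytes = N * array("I").itemsize
print(f"array('I'): {packed_bytes / 2**20:.0f} MiB")

# A list stores one pointer per slot; struct.calcsize("P") is the pointer size.
pointer_size = struct.calcsize("P")
list_bytes = N * pointer_size
print(f"list slots: {list_bytes / 2**20:.0f} MiB")
```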

    There are also some very fast sparse-matrix implementations (SciPy and ndsparse). They are written in low-level C and may also be a good fit.

  • 2020-12-08 20:45

    If

    • access speed of array.array is acceptable for your application
    • compact storage is most important
    • you want to use standard modules (no NumPy dependency)
    • you are on platforms that have /dev/zero

    the following may be of interest to you. It initialises array.array about 27 times faster than array.array('L', [0]*size):

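    import array

    size = 100000000  # number of elements to create
    myarray = array.array('L')
    # Read `size` zero-valued items straight from /dev/zero
    with open('/dev/zero', 'rb') as f:
        myarray.fromfile(f, size)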
    

    I am looking for an even better way in the question "How to initialise an integer array.array object with zeros in Python".
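
On platforms without /dev/zero, one possible substitute in Python 3 is to hand array.array a zero-filled bytes buffer. This is a sketch of a different technique, not the method timed in this answer, and zeros_from_bytes is a hypothetical helper name:

```python
import array


def zeros_from_bytes(size, typecode="L"):
    """Build an array of `size` zeros from a zero-filled bytes buffer."""
    itemsize = array.array(typecode).itemsize
    # bytes(n) is n zero bytes; array.array copies them in as raw item data.
    return array.array(typecode, bytes(size * itemsize))


myarray = zeros_from_bytes(1_000_000)  # reduced size to keep the demo quick
print(len(myarray), myarray[0], myarray[-1])
```

The temporary bytes object is as large as the final array, so peak memory is briefly about double the array's size.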
