What is an efficient way to initialize and access elements of a large array in Python?
I want to create an array in Python with 100 million entries, each an unsigned 4-byte integer.
For fast creation, use the array module.
Using the array module is ~5 times faster for creation, but about twice as slow for accessing elements compared to a normal list:
# Create array
python -m timeit -s "from array import array" "a = array('I', '\x00' * 100000000)"
10 loops, best of 3: 204 msec per loop
# Access array
python -m timeit -s "from array import array; a = array('I', '\x00' * 100000000)" "a[4975563]"
10000000 loops, best of 3: 0.0902 usec per loop
# Create list
python -m timeit "a = [0] * 100000000"
10 loops, best of 3: 949 msec per loop
# Access list
python -m timeit -s "a = [0] * 100000000" "a[4975563]"
10000000 loops, best of 3: 0.0417 usec per loop
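The commands above pass a str initializer, which works only in Python 2; in Python 3, array() rejects a str for typecode 'I', so a bytes object is needed. A byte initializer is read as raw machine values, so '\x00' * 100000000 gives 100000000 / itemsize entries (25 million when 'I' is 4 bytes), not 100 million. A minimal Python 3 sketch of the same creation comparison follows; make_zeroed and compare_creation are hypothetical helper names, and the default size is only an example:

```python
import timeit
from array import array


def make_zeroed(n, typecode='I'):
    """Return an array.array of n zeros built from a zero-filled bytes object."""
    # A bytes initializer is copied in as raw machine values, so n items
    # need n * itemsize bytes (itemsize is platform-dependent, usually 4 for 'I').
    itemsize = array(typecode).itemsize
    return array(typecode, bytes(n * itemsize))


def compare_creation(n=10_000_000, repeat=3):
    """Best-of-`repeat` seconds to build n zeros as an array.array and as a list."""
    t_array = min(timeit.repeat(lambda: make_zeroed(n), number=1, repeat=repeat))
    t_list = min(timeit.repeat(lambda: [0] * n, number=1, repeat=repeat))
    return t_array, t_list
```

Calling compare_creation(100_000_000) reproduces the question's size, at the cost of several hundred MB of memory.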
I have done some profiling, and the results are completely counterintuitive. For simple element access operations, numpy and array.array are 10x slower than native Python lists.
Note that for array access, I am doing operations of the form:
a[i] += 1
Profiles:
[0] * 20000000
numpy.zeros(shape=(20000000,), dtype=numpy.int32)
array.array('L', [0] * 20000000)
array.array('L', (0 for i in range(20000000)))
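A likely explanation for the counterintuitive result: a list stores pointers to existing Python int objects, whereas array.array (like a NumPy array) stores raw machine integers, so every a[i] read must convert the raw value into a Python int and every write must convert it back. A minimal stdlib sketch for reproducing the a[i] += 1 comparison on the three list/array variants above (NumPy is left out to keep it dependency-free); time_increment and its default arguments are hypothetical choices:

```python
import timeit
from array import array


def time_increment(n=20_000_000, index=4_975_563, number=1_000_000):
    """Seconds for `number` executions of `a[index] += 1` on each container."""
    builders = {
        "[0] * n": lambda: [0] * n,
        "array('L', [0] * n)": lambda: array('L', [0] * n),
        "array('L', generator)": lambda: array('L', (0 for _ in range(n))),
    }
    results = {}
    for label, build in builders.items():
        a = build()  # construction is deliberately kept out of the timing
        # timeit looks up `a` and `index` in the supplied globals mapping
        results[label] = timeit.timeit(
            'a[index] += 1', globals={'a': a, 'index': index}, number=number
        )
    return results
```

Building the generator-based variant at the default size is itself slow, so a smaller n is a sensible first run.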
In addition to the other excellent solutions, another way is to use a dict instead of an array: indices present as keys hold non-zero values, and any missing index is treated as zero. Lookup time is O(1) on average.
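A minimal sketch of the dict idea, assuming reads of absent indices should return 0 and the logical length is fixed; SparseCounts is a hypothetical name, not a library class. Only non-zero entries are stored, so memory grows with the number of non-zero entries rather than with the array length:

```python
class SparseCounts:
    """Fixed-length integer array backed by a dict; absent keys read as 0."""

    def __init__(self, size):
        self.size = size
        self._data = {}  # index -> non-zero value

    def _check(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        self._check(i)
        return self._data.get(i, 0)  # .get does not insert missing keys

    def __setitem__(self, i, value):
        self._check(i)
        if value:
            self._data[i] = value
        else:
            self._data.pop(i, None)  # storing 0 just drops the entry
```

With counts = SparseCounts(100_000_000), the question's counts[4975563] += 1 works unchanged, because += calls __getitem__ and then __setitem__.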
You might also check whether your application stays resident in RAM rather than being swapped out. The array is only about 381 MB, but the system may not be giving you all of it for some reason.
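The 381 MB figure is 100,000,000 entries × 4 bytes = 400,000,000 bytes, or about 381 MiB. A small sketch for estimating the raw storage of either representation, assuming CPython, where a list holds one pointer per element (for [0] * n every pointer refers to the same int object, so the pointers dominate); the function names are hypothetical and the estimates ignore the small fixed object headers:

```python
import struct
from array import array

MIB = 1024 ** 2


def array_storage_mib(n, typecode='I'):
    """Raw item storage of an array.array with n items, in MiB."""
    return n * array(typecode).itemsize / MIB


def list_pointer_mib(n):
    """Pointer storage of a list of n references, in MiB ('P' = pointer size)."""
    return n * struct.calcsize('P') / MIB
```

For 100 million entries this gives about 381.5 MiB for an array with 4-byte items and about 762.9 MiB for the list's pointers on a 64-bit build.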
There are also some fast sparse-matrix implementations (SciPy and ndsparse). They are written in low-level C and might also suit your needs.
If …, the following may be of interest to you. It initialises array.array about 27 times faster than array.array('L', [0]*size):
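import array

size = 20000000  # number of zero entries wanted (example value)
myarray = array.array('L')
# /dev/zero (Unix-like systems only) supplies an endless stream of zero bytes;
# fromfile reads size * myarray.itemsize of them and appends size items
with open('/dev/zero', 'rb') as f:
    myarray.fromfile(f, size)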
In the question "How to initialise an integer array.array object with zeros in Python", I'm looking for an even better way.
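One portable option that may be worth timing against the /dev/zero approach is repeating a one-element array: the buffer is filled in C, no intermediate Python list is built, and nothing depends on /dev/zero, which exists only on Unix-like systems. No timing is claimed for it here; the function names below are hypothetical:

```python
from array import array


def zeros_by_repetition(size, typecode='L'):
    """Return size zeros by repeating a one-element array (portable)."""
    return array(typecode, [0]) * size


def zeros_from_dev_zero(size, typecode='L'):
    """The /dev/zero approach above, wrapped in a function (Unix-like only)."""
    a = array(typecode)
    with open('/dev/zero', 'rb') as f:
        a.fromfile(f, size)  # reads size * a.itemsize zero bytes
    return a
```

Both functions can be passed to timeit.repeat as callables to compare them on the target machine.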