Efficient Python array with 100 million zeros?

挽巷

What is an efficient way to initialize and access elements of a large array in Python?

I want to create an array in Python with 100 million entries, unsigned 4-byte

10 Answers

孤街浪徒

2020-12-08 20:40

For fast creation, use the array module.

Using the array module is ~5 times faster for creation, but about twice as slow for accessing elements compared to a normal list:

# Create array
python -m timeit -s "from array import array" "a = array('I', '\x00' * 100000000)"
10 loops, best of 3: 204 msec per loop

# Access array
python -m timeit -s "from array import array; a = array('I', '\x00' * 100000000)" "a[4975563]"
10000000 loops, best of 3: 0.0902 usec per loop

# Create list
python -m timeit "a = [0] * 100000000"
10 loops, best of 3: 949 msec per loop

# Access list
python -m timeit -s "a = [0] * 100000000" "a[4975563]"
10000000 loops, best of 3: 0.0417 usec per loop
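
The commands above look like Python 2, where a str is a byte string. On Python 3, the typecode 'I' does not accept a str initializer, so here is a minimal sketch of one way to get a zero-filled array there. The small n is my own choice to keep the demo quick, and the timings above do not apply to this variant.

```python
from array import array

n = 1_000_000  # demo size (assumption); the question uses 100_000_000

# Repeating a one-element array allocates n zeros without building a list first.
a = array('I', [0]) * n

# itemsize is usually 4 for 'I', but it is platform-dependent.
print(len(a), a.itemsize)

# Element access works just like a list.
a[4975] += 1
```

If exactly 4 bytes per entry matters, checking a.itemsize before relying on it seems prudent.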

傲寒

2020-12-08 20:43
I have done some profiling, and the results are completely counterintuitive. For simple element access, numpy and array.array are more than 10x slower than native Python lists.

Note that for array access, I am doing operations of the form:
```
a[i] += 1
```
Profiles:
- [0] * 20000000
  - Access: 2.3M/sec
  - Initialization: 0.8s
- numpy.zeros(shape=(20000000,), dtype=numpy.int32)
  - Access: 160K/sec
  - Initialization: 0.2s
- array.array('L', [0] * 20000000)
  - Access: 175K/sec
  - Initialization: 2.0s
- array.array('L', (0 for i in range(20000000)))
  - Access: 175K/sec (presumed from the other array.array profile)
  - Initialization: 6.7s
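
The answer does not show how these numbers were measured, so the following is only a sketch of one way to time a[i] += 1 with the standard timeit module. The sizes n and reps are my own assumptions, chosen to keep the run short, and the rates it prints will differ by machine and Python version.

```python
import timeit
from array import array

n = 1_000_000     # smaller than the 20,000,000 used above, to keep the run short
reps = 200_000    # number of a[i] += 1 operations timed per container

containers = {
    "list": [0] * n,
    "array.array('L')": array('L', [0]) * n,
}

rates = {}
for name, a in containers.items():
    # Time `reps` increments at one fixed index of this container.
    seconds = timeit.timeit("a[i] += 1", globals={"a": a, "i": n // 2}, number=reps)
    rates[name] = reps / seconds
    print(f"{name}: {rates[name]:,.0f} increments/sec")
```

A numpy array could be added to the containers dict in the same way if numpy is installed.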

梦谈多话

2020-12-08 20:43

In addition to the other excellent solutions, another way is to use a dict instead of an array (elements which exist are non-zero, otherwise they're zero). Lookup time is O(1).
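
A minimal sketch of the dict idea, assuming most entries stay zero. The name counts and the indices are hypothetical, picked only for illustration.

```python
from collections import defaultdict

# Hypothetical sparse counter: only indices that have been touched use memory.
counts = defaultdict(int)

counts[4975563] += 1
counts[4975563] += 1
counts[99_999_999] += 5

# .get() reads without inserting a key; reading counts[i] directly would add one.
print(counts.get(4975563, 0))  # 2
print(counts.get(12345, 0))    # 0
print(len(counts))             # 2
```

This only pays off when few entries are non-zero, since each dict entry costs far more than 4 bytes.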

You might also check whether your application stays resident in RAM rather than being swapped out. The array is only about 381 MB, but the system may not be giving you all of it for whatever reason.
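
For reference, the 381 MB figure follows from the sizes in the question (100 million entries at 4 bytes each), counting a megabyte as 2**20 bytes. The variable names below are just for illustration.

```python
n_entries = 100_000_000   # entries, from the question
bytes_per_entry = 4       # unsigned 4-byte integers, from the question

total_bytes = n_entries * bytes_per_entry
total_mib = total_bytes / 2**20  # 1 MiB = 1,048,576 bytes

print(f"{total_bytes:,} bytes = {total_mib:.1f} MiB")
```

This counts only the raw element storage, not any per-object overhead.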

There are also some really fast sparse matrix implementations (SciPy and ndsparse). They are implemented in low-level C and might also be a good fit.

盖世英雄少女心

2020-12-08 20:45
If
- access speed of array.array is acceptable for your application
- compact storage is most important
- you want to use standard modules (no NumPy dependency)
- you are on platforms that have /dev/zero
the following may be of interest to you. It initialises array.array about 27 times faster than array.array('L', [0]*size):
```
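import array

size = 100000000  # number of elements to create
myarray = array.array('L')
# /dev/zero supplies an endless stream of zero bytes
with open('/dev/zero', 'rb') as f:
    myarray.fromfile(f, size)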
```
I'm looking for an even better way in the question "How to initialise an integer array.array object with zeros in Python".
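
For platforms without /dev/zero, a similar effect can probably be had from an in-memory buffer of zero bytes. This is a sketch assuming Python 3 (frombytes was added in 3.2), not something I have benchmarked against the /dev/zero version, and the small size is only to keep the demo quick.

```python
import array

size = 1_000_000  # demo size (assumption); the question uses 100_000_000

myarray = array.array('L')
# bytes(k) is k zero bytes; frombytes interprets them as machine values.
myarray.frombytes(bytes(size * myarray.itemsize))

print(len(myarray), myarray[0], myarray[-1])
```

Note that this briefly holds a second copy of the data in memory while the bytes object exists.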