Fastest approach to read thousands of images into one big numpy array

無奈伤痛 2020-12-03 03:15

I'm trying to find the fastest approach to read a bunch of images from a directory into a numpy array. My end goal is to compute statistics such as the max, min, and nth percentile.

2 Answers
  •  臣服心动
    2020-12-03 03:35

    Part A : Accessing and assigning NumPy arrays

    Since NumPy arrays store their elements in row-major (C) order, you are doing the right thing by storing each image along the trailing axes per iteration. Each image then occupies contiguous memory locations, which are the most efficient to access and assign into. Thus initializations like np.ndarray((512*25,512), dtype='uint16') or np.ndarray((25,512,512), dtype='uint16') would work best, as also mentioned in the comments.
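    As a quick sanity check on that claim, a sketch like the following verifies that with the image index on the first axis, each image slot is one contiguous block of memory:

```python
import numpy as np

# Image index on the first axis: each imgs[i] is a single contiguous
# 512*512*2-byte block, so a whole-image assignment is one sequential write.
imgs = np.empty((25, 512, 512), dtype='uint16')
print(imgs.flags['C_CONTIGUOUS'])     # True: whole stack is C-contiguous
print(imgs[0].flags['C_CONTIGUOUS'])  # True: one image = one contiguous block
print(imgs[0].nbytes)                 # 524288 bytes = 512 * 512 * 2
```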

    After compiling those as functions for timing tests and feeding in a random array instead of actual images -

    N = 512
    n = 25
    a = np.random.randint(0,255,(N,N))
    
    def app1():
        imgs = np.empty((N,N,n), dtype='uint16')
        for i in range(n):
            imgs[:,:,i] = a
            # Storing along the first two axes
        return imgs
    
    def app2():
        imgs = np.empty((N*n,N), dtype='uint16')
        for num in range(n):    
            imgs[num*N:(num+1)*N, :] = a
        # Storing each image as a contiguous block of rows
        return imgs
    
    def app3():
        imgs = np.empty((n,N,N), dtype='uint16')
        for num in range(n):    
            imgs[num,:,:] = a
            # Storing along the last two axes
        return imgs
    
    def app4():
        imgs = np.empty((N,n,N), dtype='uint16')
        for num in range(n):    
            imgs[:,num,:] = a
            # Storing along the first and last axes
        return imgs
    

    Timings -

    In [45]: %timeit app1()
        ...: %timeit app2()
        ...: %timeit app3()
        ...: %timeit app4()
        ...: 
    10 loops, best of 3: 28.2 ms per loop
    100 loops, best of 3: 2.04 ms per loop
    100 loops, best of 3: 2.02 ms per loop
    100 loops, best of 3: 2.36 ms per loop
    

    Those timings confirm the performance theory proposed at the start. I did expect the timings for app4 to land between those of app3 and app1, but maybe the effect of going from the last to the first axis for accessing and assigning isn't linear. More investigation on this one could be interesting (follow-up question here).
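    The non-linearity shows up in the strides: a sketch comparing the memory step sizes of one image slot under the app1 and app3 layouts explains why one assignment is a scattered write while the other is a straight block copy:

```python
import numpy as np

N, n = 512, 25
a = np.empty((N, N, n), dtype='uint16')  # app1 layout: image index last
b = np.empty((n, N, N), dtype='uint16')  # app3 layout: image index first

# Strides = byte step per axis. In app1, consecutive elements of one image
# slot are 25*2 = 50 bytes apart (scattered writes); in app3 they are
# adjacent, so each row is a contiguous 1024-byte copy.
print(a[:, :, 0].strides)  # (25600, 50)
print(b[0].strides)        # (1024, 2)
```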

    To clarify schematically, suppose we are storing two image arrays, denoted by x (image 1) and o (image 2). We would have:

    App1 :

    [[[x o]
      [x o]
      [x o]
      [x o]
      [x o]]
    
     [[x o]
      [x o]
      [x o]
      [x o]
      [x o]]
    
     [[x o]
      [x o]
      [x o]
      [x o]
      [x o]]]
    

    Thus, in memory space, it would be : [x,o,x,o,x,o..] following row-major order.
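    A tiny sketch makes the interleaving concrete, using a 2x2 array of ones for x and zeros for o:

```python
import numpy as np

x = np.ones((2, 2), dtype='uint16')   # stands in for image 1
o = np.zeros((2, 2), dtype='uint16')  # stands in for image 2

imgs = np.empty((2, 2, 2), dtype='uint16')  # app1 layout: image index last
imgs[:, :, 0] = x
imgs[:, :, 1] = o

# Row-major flattening shows the two images interleaved element by element
print(imgs.ravel().tolist())  # [1, 0, 1, 0, 1, 0, 1, 0]
```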

    App2 :

    [[x x x x x]
     [x x x x x]
     [x x x x x]
     [o o o o o]
     [o o o o o]
     [o o o o o]]
    

    Thus, in memory space, it would be : [x,x,x,x,x,x...o,o,o,o,o..].

    App3 :

    [[[x x x x x]
      [x x x x x]
      [x x x x x]]
    
     [[o o o o o]
      [o o o o o]
      [o o o o o]]]
    

    Thus, in memory space, it would be the same as the previous one: [x,x,x,x,x,x...o,o,o,o,o..].
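    That equivalence between the app2 and app3 layouts can be confirmed directly by comparing the raw byte streams of the two filled arrays:

```python
import numpy as np

N, n = 4, 3
a = np.arange(N * N, dtype='uint16').reshape(N, N)  # a small stand-in image

stacked = np.empty((n, N, N), dtype='uint16')  # app3 layout
tall = np.empty((n * N, N), dtype='uint16')    # app2 layout
for i in range(n):
    stacked[i] = a
    tall[i * N:(i + 1) * N] = a

# Identical byte streams: the 3D stack is just a reshaped view of the 2D one
print(stacked.tobytes() == tall.tobytes())  # True
```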


    Part B : Reading image from disk as arrays

    Now, for the part on reading images from disk, I have seen OpenCV's cv2.imread to be noticeably faster than scikit-image's io.imread.

    As a test, I downloaded the Mona Lisa image from its Wikipedia page and timed the reads -

    from skimage import io  # scikit-image reader
    import cv2              # OpenCV reader
    
    In [521]: %timeit io.imread('monalisa.jpg')
    100 loops, best of 3: 3.24 ms per loop
    
    In [522]: %timeit cv2.imread('monalisa.jpg')
    100 loops, best of 3: 2.54 ms per loop
    
