I'm trying to find the fastest approach to read a bunch of images from a directory into a numpy array. My end goal is to compute statistics such as the max, min, and nth pe
In this case, most of the time will be spent reading the files from disk, and I wouldn't worry too much about the time to populate a list.
In any case, here is a script comparing four methods. To leave out the overhead of reading an actual image from disk, it just reads an object from memory.
import time
from functools import wraps

import numpy as np

x, y = 512, 512
img = np.random.randn(x, y)  # stand-in for an image already in memory
n = 1000

def timethis(func):
    """Print how long each call to func takes, in milliseconds."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        r = func(*args, **kwargs)
        end = time.perf_counter()
        print('{}.{} : {} milliseconds'.format(func.__module__, func.__name__, (end - start)*1e3))
        return r
    return wrapper

@timethis
def static_list(n):
    # Preallocate the list, then assign by index.
    imgs = [None]*n
    for i in range(n):
        imgs[i] = img
    return imgs

@timethis
def dynamic_list(n):
    # Grow the list one append at a time.
    imgs = []
    for i in range(n):
        imgs.append(img)
    return imgs

@timethis
def list_comprehension(n):
    return [img for i in range(n)]

@timethis
def numpy_flat(n):
    # Preallocate one (x*n, y) array and copy each image into its own block of rows.
    imgs = np.empty((x*n, y))
    for i in range(n):
        imgs[x*i:(i+1)*x, :] = img
    return imgs

static_list(n)
dynamic_list(n)
list_comprehension(n)
numpy_flat(n)
The results show:
__main__.static_list : 0.07004200006122119 milliseconds
__main__.dynamic_list : 0.10294799994881032 milliseconds
__main__.list_comprehension : 0.05021800006943522 milliseconds
__main__.numpy_flat : 309.80870099983804 milliseconds
The list comprehension is the fastest, but even populating a numpy array takes only about 310 ms for 1000 images (from memory). So again, the bottleneck will be the disk read.
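For the disk-bound case the question actually describes, a minimal sketch could look like the following. It is only an illustration under stated assumptions: `load_stack` is a hypothetical helper, the images are assumed to all have the same shape, and `.npy` files read with `np.load` stand in for real image files, so you would pass your own image reader as `loader`.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_stack(paths, loader=np.load):
    """Hypothetical helper: read every file with `loader` and stack the results."""
    # The list comprehension only collects references to the loaded arrays;
    # np.stack then copies each one once into a single (n, height, width) array.
    return np.stack([loader(p) for p in paths])


# Demo on synthetic data: .npy files stand in for real images (an assumption).
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    for i in range(5):
        np.save(Path(tmp) / f"img_{i}.npy", rng.standard_normal((64, 64)))

    paths = sorted(Path(tmp).glob("*.npy"))
    stack = load_stack(paths)

print(stack.shape)
print(stack.min(), stack.max(), np.percentile(stack, 95))
```

With everything in one array, the max, min, and percentiles are single numpy calls, and the time spent building the list stays negligible next to the reads.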
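Why is numpy slower?
The list-based functions only store n references to the same img object, while numpy_flat copies the pixel data into the array's own memory: 1000 images of 512 x 512 float64 values come to about 2.1 GB. If we modify the Python list functions to convert the list to a numpy array, the times become similar.
The modified functions return a numpy array: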
@timethis
def static_list(n):
    imgs = [None]*n
    for i in range(n):
        imgs[i] = img
    return np.array(imgs)

@timethis
def dynamic_list(n):
    imgs = []
    for i in range(n):
        imgs.append(img)
    return np.array(imgs)

@timethis
def list_comprehension(n):
    return np.array([img for i in range(n)])
and the timing results:
__main__.static_list : 303.32892100022946 milliseconds
__main__.dynamic_list : 301.86925499992867 milliseconds
__main__.list_comprehension : 300.76925699995627 milliseconds
__main__.numpy_flat : 305.9309459999895 milliseconds
So the extra time is simply the cost of copying the data into a numpy array. It is about the same for all four methods because it depends on the size of the array rather than on how the list was built.
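The difference between holding references and copying data can be checked directly. This is a small sketch with n reduced to 10 so it uses little memory; the names `as_list` and `as_array` are just for illustration.

```python
import numpy as np

img = np.random.randn(512, 512)   # float64, so 8 bytes per pixel
n = 10                            # kept small so the demo uses little memory

as_list = [img for _ in range(n)]
as_array = np.array(as_list)

# Every list slot refers to the same object; nothing was copied.
print(all(item is img for item in as_list))   # True

# The array owns a separate copy of the pixel data.
print(np.shares_memory(as_array[0], img))     # False

# n * 512 * 512 * 8 bytes
print(as_array.nbytes)                        # 20971520
```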