Question
I have a set of images in a folder that I want to preprocess using some OpenCV functions. The function
detectAndaligncrop
takes an image path, preprocesses the image with OpenCV, and returns the output image. I am able to do it using:
for image_path in files_list:
    cropped_image, _ = detectAndaligncrop(image_path)
    cv2.imwrite("output_folder/{}".format(os.path.basename(image_path)), cropped_image * 255.)
However, this is not working:
jobs = []
for im_no, im in enumerate(files_list):
    p = multiprocessing.Process(target=saveIm, args=[im])
    jobs.append(p)
    p.start()
for j in jobs:
    j.join()
where saveIm is:
def saveIm(im_path):
    im, lm = detectAndaligncrop(im_path)
    fname = "output_path/cropped2/{}".format(os.path.basename(im_path))
    cv2.imwrite(fname, im)
I have verified that it calls the detectAndaligncrop function, but processing stops at the line where
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
is called inside detectAndaligncrop: "before cvtcolor" is printed for every image, while "after cvtcolor" never is:
def detectAndaligncrop(impath):
    image = cv2.imread(impath)
    image_float = np.float32(image) / 255.0
    print("before cvtcolor")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    print("after cvtcolor")
    return gray, 1
Also, I tried:
with ThreadPoolExecutor(max_workers=32) as execr:
    res = execr.map(saveIm, files_list)
This works, but is no faster than simply running a for loop. Is it because of the GIL?
Answer 1:
After a few experiments I found the error: basically, the problem is in the method used to convert the read image to grayscale. If I use:
gray = cv2.imread(impath,0)
instead of
image = cv2.imread(impath)
image_float = np.float32(image)/255.0
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
the code works fine.
Perhaps there is some problem with using cv2.cvtColor under multiprocessing. Can someone shed light on the reasons? Is it about picklability?
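Put together, the fixed worker would read each file as grayscale directly in the child process. A sketch of that shape using multiprocessing.Pool, with the OpenCV calls left as comments so the skeleton runs standalone (the output directory and return value are placeholders, not from the question):

```python
import multiprocessing

def save_gray(im_path):
    # The fix: read the file as grayscale directly instead of
    # calling cv2.cvtColor inside the child process, e.g.:
    #   gray = cv2.imread(im_path, 0)   # 0 == cv2.IMREAD_GRAYSCALE
    #   cv2.imwrite("output_folder/" + os.path.basename(im_path), gray)
    return im_path  # placeholder so this sketch runs without OpenCV

if __name__ == '__main__':
    paths = ['a.jpg', 'b.jpg']
    # Pool.map replaces the manual Process/start/join bookkeeping
    # from the question.
    with multiprocessing.Pool(processes=2) as pool:
        done = pool.map(save_gray, paths)
    print(done)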
Answer 2:
I was in need of a multiprocessing approach to preprocess images before feeding them to neural networks. I came across a page called Embarrassingly parallel for loops, where mathematical tasks were run in parallel over the elements of an array/list. I wanted to know whether this could be extended to images (after all, images are nothing but arrays, big 3D arrays!)
I decided to apply the addWeighted operation from OpenCV to a collection of images. With this operation you apply different weights to two images and add them; it is used for blending images, as you can see here.
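As a quick sanity check on what addWeighted computes (dst = src1*alpha + src2*beta + gamma, saturated to the dtype's range), here is the same formula reimplemented in NumPy, not OpenCV itself:

```python
import numpy as np

# cv2.addWeighted(src1, alpha, src2, beta, gamma) computes
#   dst = src1*alpha + src2*beta + gamma, clipped to the dtype's range.
a = np.full((2, 2), 100, dtype=np.uint8)   # a uniform gray image
b = np.full((2, 2), 200, dtype=np.uint8)   # a brighter one
# alpha=0.7, beta=0.3, gamma=0: a 70/30 blend of the two images
blended = np.clip(a * 0.7 + b * 0.3, 0, 255).astype(np.uint8)
print(blended[0, 0])  # 100*0.7 + 200*0.3 = 130
```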
I ran this function with and without joblib on a set of images on my desktop and compared the performance. At the end I mention the number of images used and their collective size.
Code:
import os
import time
#--- Importing the required libraries ---
import cv2
from joblib import Parallel, delayed
#--- Choosing all available image formats of images from my desktop ---
path = r'C:\Users\Jackson\Desktop'
img_formats = ['.png', '.jpg', '.jpeg']
#--- Defining the addWeighted function from OpenCV ---
def weight(im):
    addweighted = cv2.addWeighted(im, 0.7, cv2.GaussianBlur(im, (15, 15), 0), 0.3, 0)
    return addweighted
#--- Using joblib library -----
start_time = time.time()
new_dir = os.path.join(path, 'add_Weighted_4_joblib')
if not os.path.exists(new_dir):
    os.makedirs(new_dir)
def joblib_loop():
    files = [f for f in os.listdir(path)
             if any(f.endswith(ext) for ext in img_formats)]
    imgs = [cv2.imread(os.path.join(path, f)) for f in files]
    #--- Run weight() over all images in parallel ---
    results = Parallel(n_jobs=-1)(delayed(weight)(img) for img in imgs)
    for f, r in zip(files, results):
        cv2.imwrite(os.path.join(new_dir, f + '_add_weighted_.jpg'), r)
joblib_loop()
elapsed_time = time.time() - start_time
print('Using Joblib : ', elapsed_time)
#--- Without joblib ---
start_time = time.time()
#--- Check whether directory exists, if not make one ---
new_dir = os.path.join(path, 'add_Weighted_4')
if not os.path.exists(new_dir):
    os.makedirs(new_dir)
for f in os.listdir(path):
    if any(f.endswith(ext) for ext in img_formats):
        img = cv2.imread(os.path.join(path, f))
        r = weight(img)
        cv2.imwrite(os.path.join(new_dir, f + '_add_weighted_.jpg'), r)
elapsed_time = time.time() - start_time
print('Without Joblib : ', elapsed_time)
Here is the result I got:
('Using Joblib : ', 0.09400010108947754)
('Without Joblib : ', 15.386000156402588)
As you can see, using joblib speeds up operations like crazy!!
Now let me show you how many images are present on my desktop and what their total size is:
overall_size = 0
count = 0
for f in os.listdir(path):
    if any(f.endswith(ext) for ext in img_formats):
        img = cv2.imread(os.path.join(path, f))
        overall_size += img.size
        count += 1
print('Collective Size of all {} images in the predefined path is {} MB'.format(count, overall_size/10**6))
and the result:
Collective Size of all 14 images in the predefined path is 58 MB
Source: https://stackoverflow.com/questions/50935330/how-can-i-process-images-with-opencv-in-parallel-using-multiprocessing