I extract page images from a PDF file in JPEG format, and I need to determine whether each image is mostly grayscale, color, or black and white (with a tolerance factor).
I have found a way to guess this with the PIL ImageStat module. Thanks to this post for the monochromatic determination of an image.
from PIL import Image, ImageStat

MONOCHROMATIC_MAX_VARIANCE = 0.005
COLOR = 1000
MAYBE_COLOR = 100

def detect_color_image(file):
    # Per-band variance of the pixel values (3 bands for RGB, 1 for a single-band image)
    v = ImageStat.Stat(Image.open(file)).var
    # Monochromatic: every band has (almost) no variance
    is_monochromatic = all(x < MONOCHROMATIC_MAX_VARIANCE for x in v)
    print(file, '-->\t', end='')
    if is_monochromatic:
        print("Monochromatic image")
    elif len(v) == 3:
        # Spread between the most and the least variable band
        maxmin = abs(max(v) - min(v))
        if maxmin > COLOR:
            label = "Color\t\t\t"
        elif maxmin > MAYBE_COLOR:
            label = "Maybe color\t"
        else:
            label = "grayscale\t\t"
        print(label, "(", maxmin, ")")
    elif len(v) == 1:
        print("Black and white")
    else:
        print("Don't know...")
The COLOR and MAYBE_COLOR constants are quick thresholds for telling color images from grayscale ones, but they are not reliable. For example, I have several JPEG images that are detected as color but are really grayscale with some color artifacts from the scanning process. That is why I have an intermediate level, to separate images that are definitely color from the rest.
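To make the "tolerance factor" idea more concrete, here is a rough sketch of a different measure from the variance test above: count the fraction of pixels whose R, G and B values differ noticeably, so that a few tinted pixels from a scan do not flip the result. The function names and both thresholds (channel_tolerance, color_fraction) are made up for illustration and would need tuning on real scans.

```python
from PIL import Image, ImageChops, ImageStat

def color_pixel_fraction(img, channel_tolerance=30):
    """Fraction of pixels whose channel spread exceeds channel_tolerance.

    channel_tolerance is an assumed value, not taken from the code above.
    """
    r, g, b = img.convert("RGB").split()
    # Per-pixel brightest and darkest channel
    brightest = ImageChops.lighter(ImageChops.lighter(r, g), b)
    darkest = ImageChops.darker(ImageChops.darker(r, g), b)
    # Spread is 0 for a pure gray pixel and large for a saturated one
    spread = ImageChops.difference(brightest, darkest)
    # Mark pixels whose spread is above the tolerance
    mask = spread.point(lambda p: 255 if p > channel_tolerance else 0)
    return ImageStat.Stat(mask).mean[0] / 255.0

def classify(img, channel_tolerance=30, color_fraction=0.01):
    """Return "color" if enough pixels are clearly colored, else "grayscale".

    color_fraction (share of colored pixels tolerated) is an assumed value.
    """
    fraction = color_pixel_fraction(img, channel_tolerance)
    return "color" if fraction > color_fraction else "grayscale"
```

It could be called as classify(Image.open(path)). A grayscale scan with a slight tint, for example pixels like (130, 128, 126), stays below the tolerance and is still reported as grayscale.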
If someone has a better approach, let me know.