keras image preprocessing unbalanced data

Submitted by 喜夏-厌秋 on 2019-12-22 04:04:22

Question


All,

I'm trying to use Keras to do image classification on two classes. For one class, I have a very limited number of images, say 500. For the other class, I have an almost unlimited number of images. How can I handle this with Keras image preprocessing? Ideally, I need something like this: for class one, I feed in the 500 images and use ImageDataGenerator to generate more; for class two, I take 500 images at a time, in sequence, from a dataset of 1,000,000 images, probably with no data augmentation. Looking at the example here and at the Keras documentation, I found that the training folder is assumed to contain an equal number of images for each class by default. Is there an existing API for doing this? If so, please point me to it. If not, is there a workaround?


Answer 1:


You have some options.

Option 1

Use the class_weight parameter of the fit() function, which is a dictionary mapping each class to a weight value. Let's say you have 500 samples of class 0 and 1500 samples of class 1; then you pass class_weight = {0: 3, 1: 1}. That gives class 0 three times the weight of class 1.

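train_generator.classes gives you the integer class label of every sample, which is what you need to compute the weights; train_generator.class_indices maps the class (folder) names to those labels.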

If you want to calculate this programmatically, then you could use scikit-learn's sklearn.utils.class_weight.compute_class_weight(): https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py

The function looks at the distribution of labels and produces weights that equally penalize under- or over-represented classes in the training set.
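As a rough illustration of what such a weighting computes, here is a minimal standard-library sketch. It uses the formula n_samples / (n_classes * class_count), which is the heuristic behind scikit-learn's "balanced" mode. The helper name balanced_class_weights and the toy label list are assumptions made for this example; in practice the labels would come from something like train_generator.classes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count).

    Hypothetical helper written for this example, not part of Keras or scikit-learn.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy labels standing in for train_generator.classes:
# 500 samples of class 0 and 1500 samples of class 1.
labels = [0] * 500 + [1] * 1500

class_weight = balanced_class_weights(labels)

# The rarer class 0 is weighted three times as heavily as class 1.
ratio = class_weight[0] / class_weight[1]
```

The resulting dictionary can be passed as class_weight=class_weight to fit(). The absolute values differ from {0: 3, 1: 1}, but the 3:1 ratio between the two classes is the same.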

See also this useful thread here: https://github.com/fchollet/keras/issues/1875

This thread might also be of help: Is it possible to automatically infer the class_weight from flow_from_directory in Keras?

Option 2

Do a dummy training run with a generator that applies your image augmentation (rotation, scaling, cropping, flipping, etc.) and save the augmented images for the real training later. That way you can create a bigger, or even balanced, dataset for your underrepresented class.

In this dummy run, set save_to_dir in the flow_from_directory function to a folder of your choosing, and afterwards keep only the images from the class that you need more samples of. You discard any training results, since this run only serves to generate more data.
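A possible sketch of this idea follows. Note one deliberate change: instead of a full dummy training run, it simply iterates the generator, which triggers the same save_to_dir side effect without needing a model. The function names, directory paths, class folder name, and augmentation settings are all hypothetical, and the import assumes a TensorFlow/Keras version that still ships ImageDataGenerator.

```python
import math
import os


def batches_needed(current_count, target_count, batch_size):
    """Number of augmented batches to draw to grow a class to about target_count."""
    missing = max(0, target_count - current_count)
    return math.ceil(missing / batch_size)


def save_augmented_images(source_dir, class_name, output_dir, n_batches,
                          batch_size=32, target_size=(150, 150)):
    """Write augmented copies of one class folder to output_dir via save_to_dir."""
    # Imported lazily so batches_needed() stays usable without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    os.makedirs(output_dir, exist_ok=True)  # make sure the output folder exists

    # Example augmentation settings; tune these for your own data.
    datagen = ImageDataGenerator(
        rotation_range=20,
        zoom_range=0.2,
        horizontal_flip=True,
    )
    generator = datagen.flow_from_directory(
        source_dir,
        classes=[class_name],    # read only the underrepresented class folder
        class_mode=None,         # yield images only, no labels
        target_size=target_size,
        batch_size=batch_size,
        save_to_dir=output_dir,  # every generated image is also written here
        save_prefix="aug",
        save_format="png",
    )
    for _ in range(n_batches):
        next(generator)  # drawing a batch saves its images as a side effect


# Growing a class from 500 to roughly 1500 images in batches of 32.
n_batches = batches_needed(current_count=500, target_count=1500, batch_size=32)

# Hypothetical paths; uncomment once the folders exist on disk.
# save_augmented_images("data/train", "class_0", "data/augmented/class_0", n_batches)
```

The number of saved images is approximate, because the last batch of each pass over the folder can be smaller than batch_size. The saved files can then be copied into the training folder of the underrepresented class before the real training run.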



Source: https://stackoverflow.com/questions/44666910/keras-image-preprocessing-unbalanced-data
