Keras flowFromDirectory get file names as they are being generated

Question

Is it possible to get the file names that were loaded using flow_from_directory? I have:

datagen = ImageDataGenerator(
    rotation_range=3,
#     featurewise_std_normalization=True,
    fill_mode='nearest',
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_generator = datagen.flow_from_directory(
        path+'/train',
        target_size=(224, 224),
        batch_size=batch_size,)

I have a custom generator for my multi-output model, like this:

a = np.arange(8).reshape(2, 4)
# print(a)

print(train_generator.filenames)

def generate():
    while 1:
        x,y = train_generator.next()
        yield [x] ,[a,y]

Note that at the moment I am generating dummy values for a, but for real training I want to load a JSON file that contains the bounding box coordinates for my images. For that I need the file names of the images returned by each train_generator.next() call. Once I have them, I can load the file, parse the JSON, and pass the result instead of a. The ordering of the x variable and the list of file names must also match.
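To make the goal concrete, here is a minimal sketch of the lookup step, assuming the file names of a batch are already known. The JSON layout (relative file name mapped to [x, y, width, height]) and the helper name boxes_for_batch are hypothetical and not taken from the question; the real annotation file may be structured differently.

```python
import json

# Hypothetical annotation layout: relative file name -> [x, y, width, height].
ANNOTATIONS_JSON = """
{
  "cats/001.jpg": [10, 20, 100, 80],
  "dogs/007.jpg": [5, 15, 60, 90]
}
"""

def boxes_for_batch(batch_filenames, annotations):
    """Return one bounding box per file name, in the same order as the batch."""
    return [annotations[name] for name in batch_filenames]

annotations = json.loads(ANNOTATIONS_JSON)
# The batch order is preserved, so the boxes line up with the images in x.
batch_boxes = boxes_for_batch(["dogs/007.jpg", "cats/001.jpg"], annotations)
print(batch_boxes)
```

Because the list comprehension follows the order of batch_filenames, the boxes stay aligned with the images as long as the file names themselves are in batch order.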

Answer 1:

Yes, it is possible, at least with version 2.0.4 (I don't know about earlier versions).

The iterator returned by ImageDataGenerator().flow_from_directory(...) has a filenames attribute, which is a list of all the files, and a batch_index attribute. The list matches the order in which the generator yields the files only when shuffle=False (flow_from_directory shuffles by default). So you can do it like this:

datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)

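On every iteration of the generator, you can get the corresponding file names like this:

for i in gen:
    # batch_index has already advanced past the batch that was just returned
    # caveat: it resets to 0 after the last batch of an epoch, which makes idx negative there
    idx = (gen.batch_index - 1) * gen.batch_size
    print(gen.filenames[idx : idx + gen.batch_size])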

This will give you the filenames of the images in the current batch.
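The slicing that this answer relies on can be sketched without Keras. The example below only illustrates the arithmetic for one unshuffled pass; the helper batch_slices and the file names are made up for illustration.

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, stop) bounds of each batch in one unshuffled pass."""
    for start in range(0, n_samples, batch_size):
        # The last batch is shorter when n_samples is not a multiple of batch_size.
        yield start, min(start + batch_size, n_samples)

filenames = ["a/0.jpg", "a/1.jpg", "a/2.jpg", "b/3.jpg", "b/4.jpg"]
batches = [filenames[start:stop]
           for start, stop in batch_slices(len(filenames), 2)]
print(batches)
```

With five files and a batch size of two, the last slice holds a single file, which is the case the batch_index arithmetic has to treat carefully.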

Answer 2:

You can make a pretty minimal subclass of DirectoryIterator that returns an (image batch, file paths) tuple:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator, DirectoryIterator

class ImageWithNames(DirectoryIterator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.filenames_np = np.array(self.filepaths)
        self.class_mode = None # so that we only get the images back

    def _get_batches_of_transformed_samples(self, index_array):
        return (super()._get_batches_of_transformed_samples(index_array),
                self.filenames_np[index_array])

In __init__, I added an attribute that is a NumPy array version of self.filepaths, so that we can index into it with the batch's index array and get the paths for each generated batch.

The only other change to the base class is to return a tuple of the image batch, super()._get_batches_of_transformed_samples(index_array), and the file paths, self.filenames_np[index_array].

With that, you can make your generator like so:

imagegen = ImageDataGenerator()
datagen = ImageWithNames('/data/path', imagegen, target_size=(224,224))

And then check with

next(datagen)

Answer 3:

Here is an example that also works with shuffle=True and handles the last batch properly. To make one pass:

datagen = ImageDataGenerator().flow_from_directory(...)    
batches_per_epoch = datagen.samples // datagen.batch_size + (datagen.samples % datagen.batch_size > 0)
for i in range(batches_per_epoch):
    batch = next(datagen)
    current_index = ((datagen.batch_index-1) * datagen.batch_size)
    if current_index < 0:
        if datagen.samples % datagen.batch_size > 0:
            current_index = max(0,datagen.samples - datagen.samples % datagen.batch_size)
        else:
            current_index = max(0,datagen.samples - datagen.batch_size)
    index_array = datagen.index_array[current_index:current_index + datagen.batch_size].tolist()
    img_paths = [datagen.filepaths[idx] for idx in index_array]
    #batch[0] - x, batch[1] - y, img_paths - absolute path
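The idea behind this answer, mapping each batch back to paths through the shuffled index array rather than through a plain slice of the file list, can be sketched in plain Python. This is not Keras code: random.Random stands in for the iterator's own permutation, and shuffled_batches and the paths are hypothetical.

```python
import random

def shuffled_batches(filepaths, batch_size, seed=0):
    """Yield lists of paths, batch by batch, following a shuffled index array."""
    index_array = list(range(len(filepaths)))
    # Stand-in for the permutation the iterator draws at the start of an epoch.
    random.Random(seed).shuffle(index_array)
    for start in range(0, len(index_array), batch_size):
        batch_indices = index_array[start:start + batch_size]
        # Look each path up by its shuffled index so it stays aligned with the batch.
        yield [filepaths[i] for i in batch_indices]

filepaths = [f"/data/img_{i}.jpg" for i in range(7)]
batches = list(shuffled_batches(filepaths, batch_size=3))
for batch in batches:
    print(batch)
```

Every path appears exactly once per pass, and the final batch is shorter when the sample count is not a multiple of the batch size.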

Answer 4:

At least with version 2.2.4, you can do it like this:

datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)
for file in gen.filenames:
    print(file)

Or get the file paths:

for filepath in gen.filepaths:
    print(filepath)
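As far as I can tell, filenames holds paths relative to the directory passed to flow_from_directory, while filepaths has that directory joined in front. A minimal sketch of that relationship, using a hypothetical root /data/train and a hypothetical helper to_filepaths (posixpath is used only to keep the separators predictable):

```python
import posixpath

def to_filepaths(directory, filenames):
    """Join each directory-relative file name onto the dataset root."""
    return [posixpath.join(directory, name) for name in filenames]

filenames = ["cats/001.jpg", "dogs/007.jpg"]
filepaths = to_filepaths("/data/train", filenames)
print(filepaths)
```

If the annotation file is keyed by relative names, filenames is likely the more convenient attribute; filepaths is handy when the files need to be opened directly.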

Source: https://stackoverflow.com/questions/41715025/keras-flowfromdirectory-get-file-names-as-they-are-being-generated

Tags

python

machine-learning

neural-network

keras