Question
I need to train a neural net fed with raw images that I store on GCloud Storage. To do that, I'm using the flow_from_directory method of my Keras image generator to find all the images and their related labels in the storage bucket.
training_data_directory = args.train_dir
testing_data_directory = args.eval_dir

training_gen = datagenerator.flow_from_directory(
    training_data_directory,
    target_size=(img_width, img_height),
    batch_size=32)

validation_gen = basic_datagen.flow_from_directory(
    testing_data_directory,
    target_size=(img_width, img_height),
    batch_size=32)
My GCloud Storage structure is the following:
brad-bucket / data / train
brad-bucket / data / eval
The gsutil command confirms that my folders exist:
brad$ gsutil ls gs://brad-bucket/data/
gs://brad-bucket/data/eval/
gs://brad-bucket/data/train/
So here is the script I'm running to launch the training on ML Engine, together with the strings I use as paths for my directories (train_dir, eval_dir):
BUCKET="gs://brad-bucket"
JOB_ID="training_"$(date +%s)
JOB_DIR="gs://brad-bucket/jobs/train_keras_"$(date +%s)
TRAIN_DIR="gs://brad-bucket/data/train/"
EVAL_DIR="gs://brad-bucket/data/eval/"
CONFIG_PATH="config/config.yaml"
PACKAGE="trainer"
gcloud ml-engine jobs submit training $JOB_ID \
--stream-logs \
--verbosity debug \
--module-name trainer.task \
--staging-bucket $BUCKET \
--package-path $PACKAGE \
--config $CONFIG_PATH \
--region europe-west1 \
-- \
--job_dir $JOB_DIR \
--train_dir $TRAIN_DIR \
--eval_dir $EVAL_DIR \
--dropout_one 0.2 \
--dropout_two 0.2
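(For reference, here is a minimal argparse sketch of how flags such as --train_dir and --eval_dir can be picked up inside trainer.task; it is only illustrative and simplified, not my exact trainer code.)

# Illustrative argparse sketch (not the exact trainer code): the flags passed
# after the bare "--" separator above end up as args.train_dir, args.eval_dir, etc.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--job_dir', required=True)
parser.add_argument('--train_dir', required=True)
parser.add_argument('--eval_dir', required=True)
parser.add_argument('--dropout_one', type=float, default=0.2)
parser.add_argument('--dropout_two', type=float, default=0.2)
args = parser.parse_args()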
However, what I'm doing throws an OSError.
ERROR 2018-01-10 09:41:47 +0100 service File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/preprocessing/image.py", line 1086, in __init__
ERROR 2018-01-10 09:41:47 +0100 service for subdir in sorted(os.listdir(directory)):
ERROR 2018-01-10 09:41:47 +0100 service OSError: [Errno 2] No such file or directory: 'gs://brad-bucket/data/train/'
When I use another data structure (reading the data in another way), everything works fine, but when I use flow_from_directory to read from directories and subdirectories, I always get this same error. Is it possible to use this method to retrieve data from Cloud Storage, or do I have to feed the data in a different way?
Answer 1:
If you check the source code, you see that the error arises when Keras (or TF) tries to construct the classes from your directories. Since you are giving it a GCS directory (gs://), this will not work. You can bypass this error by providing the classes argument yourself, e.g. in the following way:
import os
from google.cloud import storage

def get_classes(file_dir):
    if not file_dir.startswith("gs://"):
        # Local directory: each subdirectory is a class.
        classes = [c.replace('/', '') for c in os.listdir(file_dir)]
    else:
        # GCS directory: list the "subdirectory" prefixes under the given path.
        bucket_name = file_dir.replace('gs://', '').split('/')[0]
        prefix = file_dir.replace("gs://" + bucket_name + '/', '')
        if not prefix.endswith("/"):
            prefix += "/"
        client = storage.Client()
        bucket = client.get_bucket(bucket_name)
        iterator = bucket.list_blobs(delimiter="/", prefix=prefix)
        response = iterator.get_next_page_response()
        classes = [c.replace('/', '') for c in response['prefixes']]
    return classes
Passing these classes to flow_from_directory will solve your error, but it will not recognize the files themselves (I now get e.g. Found 0 images belonging to 2 classes.).
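As an illustration, passing the classes explicitly could look like the following (a sketch reusing the variable names from the question, not code from the original post):

# Sketch: pass the class list explicitly so Keras does not call
# os.listdir() on the gs:// path (variable names taken from the question).
train_classes = get_classes(training_data_directory)
training_gen = datagenerator.flow_from_directory(
    training_data_directory,
    target_size=(img_width, img_height),
    batch_size=32,
    classes=train_classes)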
The only 'direct' workaround I found is to copy your files to local disk and read them from there. It would be great to have another solution (since, e.g. in the case of images, copying can take long).
Other resources also suggest using TensorFlow's file_io module when interacting with GCS from Cloud ML Engine, but in this case that would require you to fully rewrite flow_from_directory yourself.
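For instance, a minimal sketch of the copy-to-local-disk approach using file_io could look like this (the helper name copy_gcs_dir_to_local and the assumption of a <class>/<file> layout are mine, not from the original answer):

# Sketch (assumption: one level of class subdirectories under gcs_dir).
# Copies every file under a gs:// prefix to a local directory so that
# flow_from_directory can then read it with os.listdir().
import os
from tensorflow.python.lib.io import file_io

def copy_gcs_dir_to_local(gcs_dir, local_dir):
    pattern = os.path.join(gcs_dir, '*', '*')
    for path in file_io.get_matching_files(pattern):
        # Keep the <class>/<file> structure relative to gcs_dir.
        rel = path.replace(gcs_dir.rstrip('/') + '/', '')
        dest = os.path.join(local_dir, rel)
        dest_dir = os.path.dirname(dest)
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        file_io.copy(path, dest, overwrite=True)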
Answer 2:
In addition to dumkar's solution, one can try to work with an HDF5 dataset using TensorFlow's file_io.
import os
import h5py
from tensorflow.python.lib.io import file_io

with file_io.FileIO(os.path.join(data_dir, data_file_name), mode='r') as input_f:
    with file_io.FileIO('dataset.hdf5', mode='w+') as output_f:
        output_f.write(input_f.read())
dataset = h5py.File('dataset.hdf5', 'r')
This gives you a temporary local copy of a file stored in a GCS bucket.
Here is a good gist by aloisg that demonstrates how you can create the h5 file from your image dataset: https://gist.github.com/aloisg/ac83160edf8a543b5ee6.
You can then retrieve X_train, y_train, X_eval and y_eval from the dataset and feed them to the Keras model easily.
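For illustration, reading the arrays back and feeding them to the model could look like this (a sketch; the key names and the model variable are assumptions based on the gist, not part of the original answer):

# Sketch: key names (X_train, y_train, X_eval, y_eval) are assumed to match
# the datasets written into dataset.hdf5; 'model' is a compiled Keras model.
X_train = dataset['X_train'][:]
y_train = dataset['y_train'][:]
X_eval = dataset['X_eval'][:]
y_eval = dataset['y_eval'][:]

model.fit(X_train, y_train,
          validation_data=(X_eval, y_eval),
          batch_size=32,
          epochs=10)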
Answer 3:
It is hard to help you with the post as it currently stands. However, looking at the error you get, we can see it is thrown by os.listdir(), so it is not a Keras problem per se.
This is probably because your directory is not an absolute path, or because it does not exist (maybe a typo or similar). If you update your question with more information, I can help you dig deeper into this.
Source: https://stackoverflow.com/questions/48174145/keras-imagedatagenerator-for-cloud-ml-engine