Keras: Out of memory when doing hyper parameter grid search


Question


I'm running multiple nested loops to do a hyperparameter grid search. Each nested loop runs through a list of hyperparameter values, and inside the innermost loop a Keras Sequential model is built and evaluated each time using a generator. (I'm not doing any training; I'm just randomly initializing the model, evaluating it several times, and then recording the average loss.)

My problem is that during this process, Keras seems to be filling up my GPU memory, so that I eventually get an OOM error.

Does anybody know how to solve this and free up the GPU memory each time after a model is evaluated?

I do not need the model at all once it has been evaluated; I can throw it away entirely before building a new one in the next pass of the inner loop.

I'm using the TensorFlow backend.

Here is the code, although much of it isn't relevant to the general problem. The model is built inside the fourth loop:

for fsize in fsizes:

I guess the details of how the model is built don't matter much, but here is all of it anyway:

model_losses = []
model_names = []

for activation in activations:
    for i in range(len(layer_structures)):
        for width in layer_widths[i]:
            for fsize in fsizes:

                model_name = "test_{}_struc-{}_width-{}_fsize-{}".format(activation,i,np.array_str(np.array(width)),fsize)
                model_names.append(model_name)
                print("Testing new model: ", model_name)

                #Structure for this network
                structure = layer_structures[i]

                row, col, ch = 80, 160, 3  # Input image format

                model = Sequential()

                model.add(Lambda(lambda x: x/127.5 - 1.,
                          input_shape=(row, col, ch),
                          output_shape=(row, col, ch)))

                for j in range(len(structure)):
                    if structure[j] == 'conv':
                        model.add(Convolution2D(width[j], fsize, fsize))
                        model.add(BatchNormalization(axis=3, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        elif activation == 'elu':
                            model.add(ELU())
                        model.add(MaxPooling2D())
                    elif structure[j] == 'dense':
                        if structure[j-1] == 'dense':
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())
                        else:
                            model.add(Flatten())
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())

                model.add(Dense(1))

                average_loss = 0
                for k in range(5):
                    model.compile(optimizer="adam", loss="mse")
                    val_generator = generate_batch(X_val, y_val, resize=(160,80))
                    loss = model.evaluate_generator(val_generator, len(y_val))
                    average_loss += loss

                average_loss /= 5

                model_losses.append(average_loss)

                print("Average loss after 5 initializations: {:.3f}".format(average_loss))
                print()

Answer 1:


As you indicated, you are using the TensorFlow backend. With the TensorFlow backend a model is not destroyed when you discard it; its graph keeps accumulating in the session, filling up GPU memory, so you need to clear the session yourself.

After you are done with a model, just put:

if K.backend() == 'tensorflow':
    K.clear_session()

Include the backend import at the top of your script:

from keras import backend as K
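
For illustration, here is a minimal self-contained sketch (the toy model, dummy data, and loop variable are hypothetical stand-ins for the question's nested loops) showing where the call goes: at the end of each iteration, once the loss has been recorded:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K

# Dummy validation data, only to make the sketch runnable on its own.
X_val = np.random.rand(64, 10)
y_val = np.random.rand(64)

for width in [16, 32, 64]:  # stand-in for the nested hyperparameter loops
    model = Sequential()
    model.add(Dense(width, activation='relu', input_shape=(10,)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    loss = model.evaluate(X_val, y_val, verbose=0)
    print("width={}: loss={:.3f}".format(width, loss))

    # Release the memory held by this model's graph before the next pass.
    if K.backend() == 'tensorflow':
        K.clear_session()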

You can also use the scikit-learn wrapper for Keras to do the grid search; check this example: here. For more advanced hyperparameter search, you can use hyperas.




Answer 2:


Using the tip given by indraforyou, I added the code to clear the TensorFlow session inside the function I pass to GridSearchCV, like this:

from keras.models import Model
from keras.layers import Input, Dense
from keras import backend as K

def create_model():
    # Clear the previous model's graph so GPU memory does not
    # accumulate across grid search iterations.
    K.clear_session()

    inputs = Input(shape=(4096,))
    x = Dense(2048, activation='relu')(inputs)
    p = Dense(2, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=p)
    model.compile(optimizer='SGD',
                  loss='mse',
                  metrics=['accuracy'])
    return model

And then I can invoke the grid search:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

model = KerasClassifier(build_fn=create_model)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
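
The answer never defines param_grid. As a hypothetical illustration (the keys below are fit-time parameters that KerasClassifier forwards to model.fit(); define the dictionary before the GridSearchCV call, and adjust it to your own search):

# Hypothetical search space; the keys and values are illustrative only.
param_grid = dict(epochs=[10, 20],
                  batch_size=[32, 64])

grid_result = grid.fit(X_train, y_train)  # X_train/y_train: your training data
print("Best: {:.3f} using {}".format(grid_result.best_score_,
                                     grid_result.best_params_))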

It should work.

Cheers!



Source: https://stackoverflow.com/questions/42047497/keras-out-of-memory-when-doing-hyper-parameter-grid-search
