问题
I wanna make a model with multiple inputs. So, I try to build a model like this.
# define two sets of inputs
inputA = Input(shape=(32,64,1))
inputB = Input(shape=(32,1024))
# CNN
x = layers.Conv2D(32, kernel_size = (3, 3), activation = 'relu')(inputA)
x = layers.Conv2D(32, (3,3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2,2))(x)
x = layers.Dropout(0.2)(x)
x = layers.Flatten()(x)
x = layers.Dense(500, activation = 'relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(500, activation='relu')(x)
x = Model(inputs=inputA, outputs=x)
# DNN
y = layers.Flatten()(inputB)
y = Dense(64, activation="relu")(y)
y = Dense(250, activation="relu")(y)
y = Dense(500, activation="relu")(y)
y = Model(inputs=inputB, outputs=y)
# Combine the output of the two models
combined = concatenate([x.output, y.output])
# combined outputs
z = Dense(300, activation="relu")(combined)
z = Dense(100, activation="relu")(combined)
z = Dense(1, activation="softmax")(combined)
model = Model(inputs=[x.input, y.input], outputs=z)
model.summary()
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = opt,
metrics = ['accuracy'])
and the summary : _
But, when i try to train this model,
history = model.fit([trainimage, train_product_embd],train_label,
validation_data=([validimage,valid_product_embd],valid_label), epochs=10,
steps_per_epoch=100, validation_steps=10)
the problem happens.... :
--------------------------------------------------------------------------- ResourceExhaustedError Traceback (most recent call last) in () ----> 1 history = model.fit([trainimage, train_product_embd],train_label, validation_data=([validimage,valid_product_embd],valid_label), epochs=10, steps_per_epoch=100, validation_steps=10)
4 frames /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in call(self, *args, **kwargs) 1470 ret = tf_session.TF_SessionRunCallable(self._session._session, 1471
self._handle, args, -> 1472 run_metadata_ptr) 1473 if run_metadata: 1474
proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)ResourceExhaustedError: 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[800000,32,30,62] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node conv2d_1/convolution}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[metrics/acc/Mean_1/_185]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[800000,32,30,62] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv2d_1/convolution}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.0 successful operations. 0 derived errors ignored.
Thanks for reading and hopefully helping me :)
回答1:
OOM stands for "out of memory". Your GPU is running out of memory, so it can't allocate memory for this tensor. There are a few things you can do:
- Decrease the number of neurons in your
Dense,Conv2Dlayers - Use a smaller
batch_size(or increasesteps_per_epoch) - Use grayscale images (there will be one channel instead of three)
- Reduce the number of layers
- Use more
MaxPooling2Dlayers, and increase their pool size - Use larger
stridesin yourConv2Dlayers - Reduce the size your images (you can use
PILorcv2for that) - Apply dropout
- Use smaller
floatprecision, namelynp.float32if you accidentally usednp.float64 - If you're using a pre-trained model, freeze the first layers
There is more useful information in this error:
OOM when allocating tensor with shape[800000,32,30,62]
This is a weird shape. If you're working with images you should normally have 3 or 1 channels. On top of that, it seems like you are passing your entire dataset at once, you should instead pass it in batches.
回答2:
From [800000,32,30,62] it seems your model put all the data in one batch.
Try specified batch size like
history = model.fit([trainimage, train_product_embd],train_label, validation_data=([validimage,valid_product_embd],valid_label), epochs=10, steps_per_epoch=100, validation_steps=10, batch_size=32)
If it still OOM then try reduce the batch_size
回答3:
Happened to me as well.
You can try reducing trainable parameters by using some form of Transfer Learning - try freezing the initial few layers and use lower batch sizes.
来源:https://stackoverflow.com/questions/59394947/how-to-fix-resourceexhaustederror-oom-when-allocating-tensor