GPU only being used 1-5% Tensorflow-gpu and Keras

Posted by 夙愿已清 on 2020-12-29 04:08:58

Question


I just installed TensorFlow for GPU and am using Keras for my CNN. During training my GPU is only used at about 5%, but 5 of the 6 GB of VRAM is in use. Sometimes it glitches, prints 0.000000e+00 in the console and the GPU goes to 100%, but after a few seconds the training slows back down to 5%. My GPU is a Zotac GTX 1060 Mini and my CPU is a Ryzen 5 1600X.

Epoch 1/25
 121/3860 [..............................] - ETA: 31:42 - loss: 3.0575 - acc: 0.0877 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 2/25
 121/3860 [..............................] - ETA: 29:48 - loss: 3.0005 - acc: 0.0994 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 3/25
  36/3860 [..............................] - ETA: 24:47 - loss: 2.9863 - acc: 0.1024

Answer 1:


Usually, we want the bottleneck to be on the GPU (hence 100% utilization). If that's not happening, some other part of your code is taking a long time during each batch. It's hard to say what it is (especially since you didn't include any code), but there are a few things you can try:

1. Input data

Make sure the input data for your network is always available. Reading images from disk takes a long time, so use multiple workers and the multiprocessing interface:

model.fit(..., use_multiprocessing=True, workers=8)
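
For illustration, here is a minimal sketch of such a setup using a Keras Sequence; the load_image helper and the train_paths/train_labels variables are placeholders (not from the question), and with older standalone Keras you would pass the sequence to model.fit_generator instead of model.fit:

import numpy as np
from tensorflow.keras.utils import Sequence

class ImageBatchLoader(Sequence):
    def __init__(self, image_paths, labels, batch_size=32):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        start = idx * self.batch_size
        end = start + self.batch_size
        # load_image is a placeholder for your own decoding/resizing code
        batch_x = np.stack([load_image(p) for p in self.image_paths[start:end]])
        batch_y = np.array(self.labels[start:end])
        return batch_x, batch_y

train_gen = ImageBatchLoader(train_paths, train_labels, batch_size=32)
model.fit(train_gen, use_multiprocessing=True, workers=8)

The worker processes keep reading and decoding images while the GPU trains on the current batch, so the GPU is not left waiting on disk I/O.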

2. Force the model into the GPU

This is rarely the problem, because /gpu:0 is the default device, but it's worth making sure you are executing the model on the intended device:

with tf.device('/gpu:0'):
    x = Input(...)
    y = Conv2D(...)(x)
    model = Model(x, y)

3. Check the model's size

If your batch size is large and soft placement is allowed, parts of your network that did not fit into the GPU's memory might be placed on the CPU. This slows down the process considerably.

If soft placement is on, try disabling it and check whether a memory error is thrown:

# make sure soft placement is off (TF 1.x / standalone Keras style)
import tensorflow as tf
from keras import backend as K

tf_config = tf.ConfigProto(allow_soft_placement=False)
tf_config.gpu_options.allow_growth = True
s = tf.Session(config=tf_config)
K.set_session(s)

with tf.device(...):
    ...

model.fit(...)

If that's the case, try reducing the batch size until the model fits and gives you good GPU usage, then turn soft placement back on.




Answer 2:


Some directions you can try:

  1. Double-check your input pipeline and make sure it is not the performance bottleneck (see the pipeline sketch below).
  2. Increase your batch size or layer width to make sure the GPU gets enough data to consume.
  3. The most effective method is to dump a profiling JSON and have a look (see the profiling sketch below).

In my experience, most of the time low utilization is caused by a lack of data for the GPU to consume.
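
As a rough illustration of point 1, a minimal tf.data input pipeline along the lines of the dataset performance guide linked below might look like this (parse_example and the TFRecord filename are placeholders for your own decoding code and data):

import tensorflow as tf

dataset = tf.data.TFRecordDataset(["train.tfrecords"])       # placeholder filename
dataset = dataset.map(parse_example, num_parallel_calls=4)   # decode records on several CPU threads
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.batch(32)
dataset = dataset.prefetch(1)  # prepare the next batch while the GPU works on the current one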

Some useful links:

* https://www.tensorflow.org/guide/performance/datasets
* https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d
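
For point 3, here is a minimal TF 1.x sketch of dumping a Chrome-trace profile, roughly what the profiling article above walks through (train_op stands in for one training step of your own graph):

import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, options=run_options, run_metadata=run_metadata)

    # write a Chrome trace that can be opened at chrome://tracing
    trace = timeline.Timeline(step_stats=run_metadata.step_stats)
    with open("timeline.json", "w") as f:
        f.write(trace.generate_chrome_trace_format())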



Source: https://stackoverflow.com/questions/47462114/gpu-only-being-used-1-5-tensorflow-gpu-and-keras
