Question
In my TensorFlow 2.0b program I get an error like this:
ResourceExhaustedError: OOM when allocating tensor with shape[727272703] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:TopKV2]
The error occurs after a number of GPU-based operations in the program have already executed successfully.
I would like to release all GPU memory associated with these past operations in order to avoid the above error. How can I do this in TensorFlow 2.0b, and how can I check memory usage from within my program?
I was only able to find related information that relies on tf.session(), which is no longer available in TensorFlow 2.0.
Answer 1:
You might be interested in the Python 3 bindings for the NVIDIA Management Library (the nvidia_smi module).
I would try something like this:
import nvidia_smi
nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
# card id 0 hardcoded here, there is also a call to get all available card ids, so we could iterate
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
# NVML reports these memory values in bytes
print("Total memory:", info.total)
print("Free memory:", info.free)
print("Used memory:", info.used)
nvidia_smi.nvmlShutdown()
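For convenience, the same NVML calls can be wrapped in a small helper that you call before and after the memory-heavy TensorFlow operations to see how usage evolves. The sketch below is only illustrative; the helper name print_gpu_memory and the device_index parameter are my own choices, not part of the nvidia_smi package:

import nvidia_smi

def print_gpu_memory(device_index=0):
    # Query the NVML memory counters for one GPU and print them in MiB.
    nvidia_smi.nvmlInit()
    try:
        handle = nvidia_smi.nvmlDeviceGetHandleByIndex(device_index)
        info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
        mib = 1024 ** 2
        print(f"GPU {device_index}: used {info.used / mib:.0f} MiB "
              f"of {info.total / mib:.0f} MiB ({info.free / mib:.0f} MiB free)")
    finally:
        nvidia_smi.nvmlShutdown()

# Example: call it around the operations you suspect of exhausting memory.
print_gpu_memory()
# ... run your TensorFlow ops here ...
print_gpu_memory()

Keep in mind that TensorFlow's BFC allocator typically reserves a large pool of GPU memory up front, so the NVML numbers reflect what the allocator has reserved rather than what your tensors currently occupy.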
Source: https://stackoverflow.com/questions/57236448/how-can-i-check-release-gpu-memory-in-tensorflow-2-0b