How can I solve 'ran out of gpu memory' in TensorFlow

你的背包 2020-12-04 10:46

I ran the MNIST demo in TensorFlow with 2 conv layers and a fully-connected layer. I got a message that 'ran out of memory trying to allocate 2.59GiB', but it shows that…

8 Answers
  • 2020-12-04 11:24

    Before delving into other possible explanations like the ones mentioned above, check that there is no hung process reserving GPU memory. This just happened to me: my TensorFlow script hung on some error, but I did not notice it even though I was monitoring running processes with nvidia-smi, because the hung script did not show up in nvidia-smi's output while still reserving GPU memory. Killing the hung scripts (TensorFlow typically spawns as many processes as there are GPUs in the system) completely solved a similar problem, after I had exhausted all the TF wizardry.

  • 2020-12-04 11:27

    For TensorFlow 2 or Keras:

    import tensorflow as tf

    # Turn on on-demand GPU memory allocation for every visible GPU.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as e:
            # Memory growth must be set before the GPUs have been initialized.
            print(e)
    
  • 2020-12-04 11:32

    TensorFlow 2

    As we don't have sessions anymore, the old session-based solution is no longer viable.

    By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.
    In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two methods to control this. One of them is tf.config.experimental.set_memory_growth, sketched below.

    For a full understanding, I recommend this link: Limiting GPU memory growth
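
    A minimal sketch of the memory-growth approach (assuming TensorFlow 2.x and its public tf.config.experimental API):

        import tensorflow as tf

        # Allocate GPU memory on demand instead of reserving almost all of it
        # up front; this must be called before the GPUs are initialized.
        for gpu in tf.config.experimental.list_physical_devices('GPU'):
            tf.config.experimental.set_memory_growth(gpu, True)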

  • 2020-12-04 11:32

    From the TensorFlow guide:

    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
        try:
            tf.config.experimental.set_virtual_device_configuration(
                gpus[0],
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Virtual devices must be set before GPUs have been initialized
            print(e)
    

    Adjust memory_limit=*value* to something reasonable for your GPU. For example, with a 1070 Ti accessed from an NVIDIA Docker container and remote screen sessions, memory_limit=7168 gave no further errors. Just make sure sessions on the GPU are cleared occasionally (e.g. via Jupyter kernel restarts).

  • 2020-12-04 11:33

    It's not about that. First of all, you can see how much memory TensorFlow gets at runtime by monitoring your GPU; for example, on an NVIDIA GPU you can check with the watch -n 1 nvidia-smi command. In most cases, if you didn't set a maximum fraction of GPU memory, TensorFlow allocates almost all of the free memory, so your problem is simply a lack of GPU memory: CNNs are heavy. When feeding your network, DO NOT feed it your whole dataset at once; feed it in small batches instead, as in the sketch below.
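
    A minimal sketch of small-batch feeding with Keras (the toy data and model here are placeholders, not the asker's actual network):

        import numpy as np
        import tensorflow as tf

        # Toy stand-ins for real training data; shapes are illustrative only.
        x_train = np.random.rand(1000, 28, 28, 1).astype("float32")
        y_train = np.random.randint(0, 10, size=(1000,))

        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

        # Feed the data in small batches rather than all at once; halve
        # batch_size again if you still hit out-of-memory errors.
        model.fit(x_train, y_train, batch_size=32, epochs=1)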

  • 2020-12-04 11:45

    I was encountering out of memory errors when training a small CNN on a GTX 970. Through somewhat of a fluke, I discovered that telling TensorFlow to allocate memory on the GPU as needed (instead of up front) resolved all my issues. This can be accomplished using the following Python code:

        import tensorflow as tf  # TensorFlow 1.x API

        # Allocate GPU memory on demand instead of pre-allocating ~90% up front.
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
    

    Previously, TensorFlow would pre-allocate ~90% of GPU memory. For some unknown reason, this would later result in out-of-memory errors even though the model could fit entirely in GPU memory. By using the above code, I no longer have OOM errors.

    Note: If the model is too big to fit in GPU memory, this probably won't help!
