How can I flush GPU memory using CUDA (physical reset is unavailable)

不知归路 2020-12-13 01:39

My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.

7 Answers
  • 2020-12-13 02:14

    Check what is using your GPU memory with

    sudo fuser -v /dev/nvidia*
    

    Your output will look something like this:

                         USER        PID  ACCESS COMMAND
    /dev/nvidia0:        root       1256  F...m  Xorg
                         username   2057  F...m  compiz
                         username   2759  F...m  chrome
                         username   2777  F...m  chrome
                         username   20450 F...m  python
                         username   20699 F...m  python
    

    Then kill the PIDs that you no longer need, either in htop or with

    sudo kill -9 PID
    

    In the example above, PyCharm was eating a lot of memory, so I killed PIDs 20450 and 20699.

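    If the machine is a headless compute node with nothing on the GPU worth keeping, a common shortcut is to kill everything fuser reports in one go. A sketch for Linux with psmisc installed (do not run this on a desktop, where it would also kill Xorg and end your session):

    # fuser writes only the PIDs to stdout (paths and access flags go to stderr),
    # so the command substitution receives a clean list of PIDs to kill.
    sudo kill -9 $(sudo fuser /dev/nvidia* 2>/dev/null)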
  • 2020-12-13 02:15

    I also had the same problem, and saw a good solution on Quora: kill the offending process with

    sudo kill -9 PID
    

    See https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi

  • 2020-12-13 02:18

    On macOS (/ OS X), if someone else is having trouble with the OS apparently leaking GPU memory:

    • https://github.com/phvu/cuda-smi is useful for quickly checking free memory
    • Quitting applications seems to free the memory they use. Quit everything you don't need, or quit applications one-by-one to see how much memory they used.
    • If that doesn't cut it (quitting about 10 applications freed about 500 MB / 15% for me), the biggest consumer by far is WindowServer. You can force-quit it, which will also kill all applications you have running and log you out. But it's a bit faster than a restart and got me back to 90% free memory on the CUDA device.
  • 2020-12-13 02:18

    For those using Python with PyTorch:

    import torch, gc
    
    # Drop references to large tensors first (e.g. del my_tensor); gc.collect()
    # reclaims them, and empty_cache() returns the freed cache to the driver.
    gc.collect()
    torch.cuda.empty_cache()
    
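    Note that empty_cache() only hands back cached blocks that are no longer referenced; live tensors stay allocated. To confirm the memory actually dropped, watch the device from a second terminal:

    # Redraw the nvidia-smi table every second while the snippet above runs.
    watch -n 1 nvidia-smi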
  • 2020-12-13 02:26

    Although it should be unnecessary in anything other than exceptional circumstances, the recommended way to do this on Linux hosts is to unload the nvidia driver by doing

    $ rmmod nvidia 
    

    with suitable root privileges and then reloading it with

    $ modprobe nvidia
    

    If the machine is running X11, you will need to stop it manually beforehand and restart it afterwards. The driver initialisation process should eliminate any prior state on the device.
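    As a sketch of the full sequence on a systemd host (the gdm unit name is an assumption; your display manager may be lightdm, sddm, etc., and recent driver stacks also ship dependent modules that must be unloaded first):

    # Stop X11 first, otherwise the nvidia module is in use and rmmod fails.
    sudo systemctl stop gdm
    # Unload the dependent modules before the core module, then reload it.
    sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
    sudo modprobe nvidia
    sudo systemctl start gdm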

    This answer has been assembled from comments and posted as a community wiki to get this question off the unanswered list for the CUDA tag.

  • 2020-12-13 02:26

    First type

    nvidia-smi
    

    then find the PID of the process you want to kill and run

    sudo kill -9 PID
    
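    If you do this often, nvidia-smi can also list just the compute processes in a script-friendly form, which saves parsing the full table:

    # PID, process name and GPU memory of compute apps only.
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader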