How to fix this strange error: “RuntimeError: CUDA error: out of memory”

时光取名叫无心 · 2021-02-12 22:59

I ran code for a deep learning network. First I trained the network, and it worked well, but this error occurs when the code reaches the validation step.

I train for five epochs,

6 answers
  •  独厮守ぢ
    2021-02-12 23:49

    1. When you only perform validation, not training,
    you don't need to calculate gradients for the forward and backward passes.
    In that situation, your code can be placed under

    with torch.no_grad():
        ...
        net = Net()
        pred_for_validation = net(input)
        ...
    

    Code under this context manager does not build a computation graph or store intermediate activations for backpropagation, so it uses far less GPU memory.
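
    In practice this is usually combined with net.eval(), which switches layers like dropout and batch norm to inference behavior. Here is a minimal sketch of a full validation pass, assuming a model net, a DataLoader val_loader, a loss_function, and a device (illustrative names, not from the question):

    import torch

    def validate(net, val_loader, loss_function, device):
        net.eval()                        # inference behavior for dropout/batch norm
        total_loss = 0.0
        with torch.no_grad():             # no graph is built, activations are freed
            for inputs, labels in val_loader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = net(inputs)
                total_loss += loss_function(outputs, labels).item()  # plain float
        return total_loss / len(val_loader)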

    2. If you use the += operator on a loss tensor,
    the entire computation graph behind it is kept alive and grows with every iteration,
    which steadily consumes memory. In that case, convert the value with float(), as described in the PyTorch FAQ:
    https://pytorch.org/docs/stable/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory

    Although the docs recommend float(), in my case item() also worked:

    entire_loss = 0.0
    for i in range(100):
        one_loss = loss_function(prediction, label)
        entire_loss += one_loss.item()   # .item() returns a detached Python float
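
    For contrast, here is a sketch of the anti-pattern the FAQ warns about, with the one-line fix (batches and loss_function are illustrative names):

    # Anti-pattern: summing the loss tensor keeps every iteration's
    # computation graph alive, so GPU memory grows with each step.
    entire_loss = 0.0
    for prediction, label in batches:
        one_loss = loss_function(prediction, label)
        entire_loss += one_loss          # tensor += tensor: graph is retained

    # Fix: detach to a plain Python number before accumulating.
    entire_loss = 0.0
    for prediction, label in batches:
        one_loss = loss_function(prediction, label)
        entire_loss += float(one_loss)   # equivalent: one_loss.item()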
    

    3. If you use a for loop in your training code,
    Python keeps the variables assigned inside it alive until they are reassigned or the enclosing scope ends,
    so the tensors they reference (and their graphs) keep occupying GPU memory.
    In that case, you can explicitly delete intermediate variables after performing optimizer.step():

    for one_epoch in range(100):
        ...
        optimizer.step()
        del intermediate_variable1, intermediate_variable2, ...   # drop references so tensors can be reclaimed
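
    A minimal sketch of one training epoch applying this, assuming net, optimizer, loss_function, train_loader, and device (illustrative names). torch.cuda.empty_cache() is optional; it only returns cached, unused blocks to the driver:

    import torch

    for epoch in range(5):
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()
            last_loss = float(loss)   # detach before logging
            del outputs, loss         # drop references so the graph can be freed
        torch.cuda.empty_cache()      # optional: release cached, unused GPU blocks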
    
