How to fix this strange error: “RuntimeError: CUDA error: out of memory”

时光取名叫无心 · 2021-02-12 22:59

I ran code for a deep learning network. First I trained the network, and it worked well, but this error occurs when the code reaches the validation step.

I train for five epochs,

6 answers
  •  独厮守ぢ
    2021-02-12 23:49

    1. When you only perform validation, not training,
    you don't need to calculate gradients for the forward and backward passes.
    In that situation, your code can be placed under

    with torch.no_grad():
        ...
        net = Net()
        pred_for_validation = net(input)
        ...
    

    Code under this context manager does not build a computation graph or store intermediate activations for backpropagation, so it uses far less GPU memory.
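
    In practice this is usually combined with net.eval(), which switches layers like dropout and batch norm to inference behavior. Here is a minimal sketch of a full validation pass, assuming a model net, a DataLoader val_loader, a loss_function, and a device (illustrative names, not from the question):

    import torch

    def validate(net, val_loader, loss_function, device):
        net.eval()                        # inference behavior for dropout/batch norm
        total_loss = 0.0
        with torch.no_grad():             # no graph is built, activations are freed
            for inputs, labels in val_loader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = net(inputs)
                total_loss += loss_function(outputs, labels).item()  # plain float
        return total_loss / len(val_loader)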

    2. If you use the += operator on a loss tensor,
    the entire computation graph behind it is kept alive and grows with every iteration,
    which steadily consumes memory. In that case, convert the value with float(), as described in the PyTorch FAQ:
    https://pytorch.org/docs/stable/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory

    Although the docs recommend float(), in my case item() also worked:

    entire_loss = 0.0
    for i in range(100):
        one_loss = loss_function(prediction, label)
        entire_loss += one_loss.item()   # .item() returns a detached Python float
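
    For contrast, here is a sketch of the anti-pattern the FAQ warns about, with the one-line fix (batches and loss_function are illustrative names):

    # Anti-pattern: summing the loss tensor keeps every iteration's
    # computation graph alive, so GPU memory grows with each step.
    entire_loss = 0.0
    for prediction, label in batches:
        one_loss = loss_function(prediction, label)
        entire_loss += one_loss          # tensor += tensor: graph is retained

    # Fix: detach to a plain Python number before accumulating.
    entire_loss = 0.0
    for prediction, label in batches:
        one_loss = loss_function(prediction, label)
        entire_loss += float(one_loss)   # equivalent: one_loss.item()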
    

    3. If you use a for loop in your training code,
    Python keeps the variables assigned inside it alive until they are reassigned or the enclosing scope ends,
    so the tensors they reference (and their graphs) keep occupying GPU memory.
    In that case, you can explicitly delete intermediate variables after performing optimizer.step():

    for one_epoch in range(100):
        ...
        optimizer.step()
        del intermediate_variable1, intermediate_variable2, ...   # drop references so tensors can be reclaimed
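
    A minimal sketch of one training epoch applying this, assuming net, optimizer, loss_function, train_loader, and device (illustrative names). torch.cuda.empty_cache() is optional; it only returns cached, unused blocks to the driver:

    import torch

    for epoch in range(5):
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()
            last_loss = float(loss)   # detach before logging
            del outputs, loss         # drop references so the graph can be freed
        torch.cuda.empty_cache()      # optional: release cached, unused GPU blocks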
    
