Question
I am going through Andrew Ng’s tutorial from the Stanford CS230 course, and in every epoch of training, evaluation is performed by calculating the metrics.
But before calculating the metrics, they send the batches to the CPU and convert them to numpy arrays (code here).
# extract data from torch Variable, move to cpu, convert to numpy arrays
output_batch = output_batch.data.cpu().numpy()
labels_batch = labels_batch.data.cpu().numpy()
# compute all metrics on this batch
summary_batch = {metric: metrics[metric](output_batch, labels_batch) for metric in metrics}
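For context, each entry in metrics is just a function over numpy arrays. A minimal hypothetical example of such a function (the tutorial’s actual metric definitions live elsewhere; the accuracy metric here is an assumption for illustration):
import numpy as np

def accuracy(outputs, labels):
    # outputs: (batch_size, num_classes) raw scores; labels: (batch_size,) class indices
    predictions = np.argmax(outputs, axis=1)
    return np.mean(predictions == labels)

metrics = {'accuracy': accuracy}  # hypothetical registry mirroring the tutorial's pattern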
My question is: why do they do that? Why don’t they just calculate the metrics (which is done here) on the GPU using torch methods (e.g. torch.sum as opposed to np.sum)?
I would think that GPU-to-CPU transfers would slow things down, so there must be a very good reason for doing them.
I am new to PyTorch so I might be missing something.
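(For reference, a torch-only version of the same metric, staying on whatever device the tensors live on, might look like the sketch below; the function and variable names are mine, not the tutorial’s.)
import torch

def accuracy_torch(outputs, labels):
    # same metric as above, computed entirely with torch ops, no device transfer
    with torch.no_grad():
        predictions = torch.argmax(outputs, dim=1)
        return (predictions == labels).float().mean()

# summary_batch would then hold 0-dim tensors instead of Python floats:
# summary_batch = {name: fn(output_batch, labels_batch) for name, fn in torch_metrics.items()}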
Answer 1:
Correct me if I'm wrong. Sending the data back to the CPU reduces GPU load, even though that memory is reused on the next loop iteration. Furthermore, I believe converting to numpy has the advantage of freeing memory, since you are detaching your data from the computation graph. You end up manipulating labels_batch.cpu().numpy(), a plain fixed array, rather than labels_batch, a tensor attached to the entire network through linked grad_fn callbacks.
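A small sketch illustrating that point (the toy model and shapes are assumptions, not from the tutorial):
import torch

model = torch.nn.Linear(10, 2)      # toy model standing in for the tutorial's network
x = torch.randn(4, 10)
output_batch = model(x)

print(output_batch.grad_fn)         # e.g. <AddmmBackward0 ...>: still attached to the graph
detached = output_batch.detach()    # breaks the link; no graph is kept alive through it
print(detached.grad_fn)             # None
arr = detached.cpu().numpy()        # plain numpy array, safe to keep around across batches
Note also that calling .numpy() directly on a tensor that requires grad raises an error, which is why the tutorial goes through .data (or, in modern PyTorch, .detach()) first.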
Source: https://stackoverflow.com/questions/65179954/should-a-data-batch-be-moved-to-cpu-and-converted-from-torch-tensor-to-a-numpy