Question
I am going through Andrew Ng’s tutorial from the Stanford CS230 course, and in every epoch of training, evaluation is performed by calculating the metrics.
But before calculating the metrics, they send the batches to the CPU and convert them to numpy arrays (code here).
# extract data from torch Variable, move to cpu, convert to numpy arrays
output_batch = output_batch.data.cpu().numpy()
labels_batch = labels_batch.data.cpu().numpy()
# compute all metrics on this batch
summary_batch = {metric: metrics[metric](output_batch, labels_batch) for metric in metrics}
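For context, each entry in metrics is just a function over numpy arrays. A minimal hypothetical example of such a function (the tutorial’s actual metric definitions live elsewhere; the accuracy metric here is an assumption for illustration):
import numpy as np

def accuracy(outputs, labels):
    # outputs: (batch_size, num_classes) raw scores; labels: (batch_size,) class indices
    predictions = np.argmax(outputs, axis=1)
    return np.mean(predictions == labels)

metrics = {'accuracy': accuracy}  # hypothetical registry mirroring the tutorial's pattern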
My question is: why do they do that? Why don’t they just calculate the metrics (which is done here) on the GPU using torch methods (e.g. torch.sum as opposed to np.sum)?
I would think that GPU-to-CPU transfers would slow things down, so there must be a very good reason for doing them.
I am new to PyTorch so I might be missing something.
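(For reference, a torch-only version of the same metric, staying on whatever device the tensors live on, might look like the sketch below; the function and variable names are mine, not the tutorial’s.)
import torch

def accuracy_torch(outputs, labels):
    # same metric as above, computed entirely with torch ops, no device transfer
    with torch.no_grad():
        predictions = torch.argmax(outputs, dim=1)
        return (predictions == labels).float().mean()

# summary_batch would then hold 0-dim tensors instead of Python floats:
# summary_batch = {name: fn(output_batch, labels_batch) for name, fn in torch_metrics.items()}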
Answer 1:
Correct me if I'm wrong. Sending the data back to the CPU reduces GPU load, even though that memory is reused on the next loop iteration. Furthermore, I believe converting to numpy has the advantage of freeing memory, since you are detaching your data from the computation graph. You end up manipulating labels_batch.cpu().numpy(), a plain fixed array, rather than labels_batch, a tensor attached to the entire network through linked grad_fn callbacks.
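A small sketch illustrating that point (the toy model and shapes are assumptions, not from the tutorial):
import torch

model = torch.nn.Linear(10, 2)      # toy model standing in for the tutorial's network
x = torch.randn(4, 10)
output_batch = model(x)

print(output_batch.grad_fn)         # e.g. <AddmmBackward0 ...>: still attached to the graph
detached = output_batch.detach()    # breaks the link; no graph is kept alive through it
print(detached.grad_fn)             # None
arr = detached.cpu().numpy()        # plain numpy array, safe to keep around across batches
Note also that calling .numpy() directly on a tensor that requires grad raises an error, which is why the tutorial goes through .data (or, in modern PyTorch, .detach()) first.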
Source: https://stackoverflow.com/questions/65179954/should-a-data-batch-be-moved-to-cpu-and-converted-from-torch-tensor-to-a-numpy