Question
I used a temporary tensor to store data in my customized GPU-based op. For debugging purposes, I want to print the data of this tensor with a traditional printf inside C++. How can I pull this GPU-based tensor to the CPU and then print its contents? Thank you very much.
Answer 1:
If by temporary you mean `allocate_temp` instead of `allocate_output`, there is no way to fetch the data on the Python side. During debugging I usually return the tensor itself as an output, so that a simple `sess.run` fetches the result. Otherwise, the only way to display the data is the traditional printf inside C++. If your tensor is an output of your custom operation, a `tf.Print` eases further debugging.
Example:
// Allocate a temporary float tensor with the same shape as the input `some`.
Tensor temp_tensor;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, some.shape(), &temp_tensor));

// ... your kernel fills temp_tensor on the GPU ...

// Copy the device data back to host memory and print it.
float* host_memory = new float[some.NumElements()];
cudaMemcpy(host_memory, temp_tensor.flat<float>().data(),
           some.NumElements() * sizeof(float), cudaMemcpyDeviceToHost);
std::cout << host_memory[0] << std::endl;
std::cout << host_memory[1] << std::endl;
std::cout << host_memory[2] << std::endl;
delete[] host_memory;
Source: https://stackoverflow.com/questions/51353605/how-can-i-pull-push-data-between-gpu-and-cpu-in-tensorflow