I have a tensor A with shape [a,n] and I need to perform an op my_op
with another tensor B of shape [b,n] such that the result
The tf.map_fn() construct can be used with a function that runs ops on the GPU. By default, TensorFlow will try to run as much of the function as possible on the GPU, and any GPU-incompatible ops will run on the CPU. In your program, the entire elementwise_op() function is built from GPU-compatible ops, so there should be no additional copying between CPU and GPU at each iteration.
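
To make this concrete, here is a minimal sketch assuming TensorFlow 2.x with eager execution. The body of elementwise_op() is not shown in the question, so the squared-distance version below is a hypothetical stand-in, and the shapes (a=4, b=3, n=8) are arbitrary. It uses only subtract, square, and reduce_sum, all of which have GPU kernels.

```python
import tensorflow as tf

# Hypothetical stand-in for the question's elementwise_op(); the real body is not shown.
def elementwise_op(a_row, B):
    # a_row has shape [n], B has shape [b, n]; broadcasting gives [b, n],
    # and the reduction over axis 1 leaves one value per row of B: shape [b].
    return tf.reduce_sum(tf.square(B - a_row), axis=1)

A = tf.random.normal([4, 8])  # shape [a, n], arbitrary sizes for illustration
B = tf.random.normal([3, 8])  # shape [b, n]

# map_fn slices A along axis 0 and applies the function to each row in turn.
result = tf.map_fn(lambda a_row: elementwise_op(a_row, B), A)

print(result.shape)   # one row of length b per row of A
print(result.device)  # device holding the result (a GPU if one is visible)
```

If you want to see where each individual op is placed, calling tf.debugging.set_log_device_placement(True) before building the tensors should log the device chosen for every op.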
The cause of low GPU utilization is difficult to determine from a program fragment. For example, if A and B are relatively small, and you are feeding them from Python and then immediately fetching back the result, it is likely that the overhead of copying the initial data to and from the GPU would dominate. The best way to track this down is to use a GPU profiler, such as tfprof or the NVIDIA Visual Profiler.
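
Before reaching for a profiler, a rough wall-clock check can hint at whether fixed per-call overhead is the problem. The sketch below assumes TensorFlow 2.x with eager execution; time_round_trip() is a hypothetical helper, and the matmul is only a stand-in for your real op. It times a full feed, compute, and fetch round trip for a small and a larger input.

```python
import time
import tensorflow as tf

def time_round_trip(rows, n=64, repeats=20):
    """Average seconds per feed-compute-fetch round trip for [rows, n] inputs."""
    # Host-side NumPy arrays, standing in for data fed from Python.
    A = tf.random.normal([rows, n]).numpy()
    B = tf.random.normal([rows, n]).numpy()
    start = time.perf_counter()
    for _ in range(repeats):
        # Stand-in computation; replace with the op you actually care about.
        out = tf.matmul(tf.constant(A), tf.constant(B), transpose_b=True)
        out.numpy()  # fetching the value waits for the result and copies it back
    return (time.perf_counter() - start) / repeats

for rows in (8, 512):
    print(rows, time_round_trip(rows))
```

If the per-call time barely changes between the two sizes, the cost is probably dominated by launch and copy overhead rather than by the computation itself. This is only a coarse signal; a profiler is still needed to see the actual breakdown.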