First tf.Session.run() performs dramatically differently from later runs. Why?

让人想犯罪 __ Submitted on 2019-12-20 12:39:10

Question


Here's an example to clarify what I mean:
First session.run(): [screenshot: timeline of the first run of a TensorFlow session]

Later session.run(): [screenshot: timeline of later runs of a TensorFlow session]

I understand TensorFlow is doing some initialization here, but I'd like to know where in the source this manifests. This occurs on CPU as well as GPU, but the effect is more prominent on GPU. For example, in the case of an explicit Conv2D operation, the first run has a much larger number of Conv2D operations in the GPU stream. In fact, if I change the input size of the Conv2D, the count can go from tens to hundreds of stream Conv2D operations. In later runs, however, there are always only five Conv2D operations in the GPU stream (regardless of input size). On CPU, the operation list is identical between the first run and later runs, but the same time discrepancy is still observed.
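Here is a minimal sketch of the kind of measurement I mean (illustrative only, not my actual code; the shapes and the Conv2D setup are arbitrary, and it uses the TF 1.x API to match tf.Session above):

    import time
    import numpy as np
    import tensorflow as tf  # TF 1.x API

    # A single Conv2D on a small random input.
    x = tf.placeholder(tf.float32, shape=[1, 224, 224, 3])
    w = tf.Variable(tf.random_normal([3, 3, 3, 64]))
    y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        feed = {x: np.random.rand(1, 224, 224, 3).astype(np.float32)}
        for i in range(5):
            start = time.time()
            sess.run(y, feed_dict=feed)  # the first iteration is far slower than the rest
            print("run %d: %.1f ms" % (i, (time.time() - start) * 1e3))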

What portion of TensorFlow source is responsible for this behavior? Where are GPU operations "split?"

Thanks for the help!


Answer 1:


The tf.nn.conv2d() op takes much longer to run on the first tf.Session.run() invocation because, by default, TensorFlow uses cuDNN's autotune facility to choose how to run subsequent convolutions as fast as possible. You can see the autotune invocation in the TensorFlow source.

There is an undocumented environment variable that you can use to disable autotune. Set TF_CUDNN_USE_AUTOTUNE=0 when you start the process running TensorFlow (e.g. the python interpreter) to disable its use.
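For example (a minimal sketch; the script name is hypothetical, and the variable must be set before TensorFlow initializes cuDNN on the GPU), you could disable autotune either from the shell or from within Python:

    # From the shell, before launching the Python interpreter:
    #   TF_CUDNN_USE_AUTOTUNE=0 python train.py
    #
    # Or from Python, before TensorFlow touches the GPU:
    import os
    os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"  # must be set before the first GPU op runs

    import tensorflow as tf  # imported only after the environment variable is set

With autotune disabled, the first run no longer sweeps over candidate convolution algorithms, which should make the first and later Conv2D timings much closer (at the possible cost of slower steady-state convolutions).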



Source: https://stackoverflow.com/questions/45063489/first-tf-session-run-performs-dramatically-different-from-later-runs-why
