Training Multi-GPU on TensorFlow: a simpler way?

Question

I have been using the training method proposed in the cifar10_multi_gpu_train example for (local) multi-GPU training, i.e., creating several towers and then averaging the gradients. However, I was wondering the following: what happens if I just take the losses coming from the different GPUs, sum them up, and then apply gradient descent to that new loss?

Would that work? This is probably a silly question, and there must be a limitation somewhere, so I would be happy if you could comment on this.

Thanks and best regards, G.
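For reference, the tower approach mentioned in the question can be sketched without TensorFlow. This is a minimal, framework-free illustration, not the code of the cifar10_multi_gpu_train example: it assumes a toy scalar model `y_hat = w * x` with mean squared error, and the data and the helpers `tower_gradient`, `average_gradients`, and `train_step` are hypothetical. Each "tower" computes a gradient on its own shard, the gradients are averaged, and one shared update is applied.

```python
# Hypothetical, framework-free sketch of synchronous tower training:
# one gradient per tower (GPU), averaged, then a single shared SGD update.
# Toy model: y_hat = w * x, loss = mean squared error over the shard.

def tower_gradient(w, shard):
    """d/dw of the mean squared error on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def average_gradients(grads):
    """Average the per-tower gradients."""
    return sum(grads) / len(grads)

def train_step(w, shards, lr):
    grads = [tower_gradient(w, shard) for shard in shards]  # one per tower
    return w - lr * average_gradients(grads)                # one shared update

# Made-up data generated from y = 3x, split across 2 towers.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(100):
    w = train_step(w, shards, lr=0.1)
print(w)
```

With equally sized shards, as here, the average of the per-tower gradients equals the gradient of the mean loss over the whole batch.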

Answer 1:

It would not work with the sum. You would get a bigger loss and consequently bigger, and probably erroneous, gradients. When you average the gradients, you get the average of the directions the weights have to move in to minimize the loss, and each individual direction is the one computed for the exact loss value of its tower.
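To make the "bigger gradients" point concrete: by linearity of differentiation, the gradient of a sum of tower losses is the sum of the per-tower gradients, i.e. N times their average, where N is the number of towers. A minimal sketch, assuming a toy scalar model `y_hat = w * x` and made-up shards (all names are hypothetical), checks this numerically, using a central finite difference as a stand-in for what automatic differentiation would compute on the summed loss.

```python
# Hypothetical check: gradient of the summed tower losses vs. averaged tower gradients.
# Toy model: y_hat = w * x, per-tower loss = mean squared error on that tower's shard.

def tower_loss(w, shard):
    return sum((w * x - y) ** 2 for x, y in shard) / len(shard)

def tower_gradient(w, shard):
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def summed_loss(w, shards):
    # The "sum the losses from all GPUs" objective from the question.
    return sum(tower_loss(w, s) for s in shards)

def numerical_gradient(f, w, eps=1e-5):
    # Central finite difference, standing in for autodiff on the summed loss.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

# Made-up shards (data from y = 3x), one per hypothetical tower.
shards = [[(0.5, 1.5), (1.0, 3.0)], [(1.5, 4.5), (2.0, 6.0)], [(2.5, 7.5), (3.0, 9.0)]]
w = 1.0
n = len(shards)

grad_of_sum = numerical_gradient(lambda v: summed_loss(v, shards), w)
avg_grad = sum(tower_gradient(w, s) for s in shards) / n

print(grad_of_sum, n * avg_grad)  # equal up to floating-point error
print(grad_of_sum / n, avg_grad)  # dividing by N recovers the averaged gradient
```

In this toy setting, a gradient step on the summed loss equals a step with the averaged gradient and a learning rate multiplied by N, so the step size (or the loss scale) has to be adjusted to match the averaged update.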

One thing you can try is to run the towers independently and average their weights from time to time: convergence is slower, but each node processes its data faster.
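The weight-averaging idea in this answer is sometimes called model averaging or local SGD. A minimal sketch, again assuming a toy scalar model `y_hat = w * x` and hypothetical data and names (`train_with_weight_averaging`, `sync_every`), lets each replica take unsynchronized SGD steps on its own shard and replaces every replica with the mean weight every `sync_every` steps.

```python
# Hypothetical sketch of periodic weight averaging across independently trained replicas.
# Toy model: y_hat = w * x, per-replica loss = mean squared error on its own shard.

def shard_gradient(w, shard):
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def train_with_weight_averaging(shards, lr=0.05, steps=200, sync_every=10):
    weights = [0.0 for _ in shards]  # one independent copy of w per replica (GPU)
    for step in range(1, steps + 1):
        # Local updates: each replica only sees its own shard, no communication.
        weights = [w - lr * shard_gradient(w, s) for w, s in zip(weights, shards)]
        if step % sync_every == 0:
            # Synchronization point: every replica takes the average weight.
            mean_w = sum(weights) / len(weights)
            weights = [mean_w for _ in weights]
    return sum(weights) / len(weights)

# Made-up shards (data from y = 3x), one per hypothetical replica.
shards = [[(0.5, 1.5), (1.0, 3.0)], [(1.5, 4.5), (2.0, 6.0)]]
print(train_with_weight_averaging(shards))
```

Larger `sync_every` values mean less communication between devices but let the replicas drift further apart between synchronizations.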

Source: https://stackoverflow.com/questions/41029037/training-multi-gpu-on-tensorflow-a-simpler-way

Tags

machine-learning

tensorflow

gpu
