What is the right way to do model parallelism in TensorFlow?

Submitted by 微笑、不失礼 on 2019-12-14 00:27:14

Question


I have multiple nodes with 4 GB GPUs, and I want them to run a huge model in parallel. I hoped that splitting the layers into several pieces with appropriate device scopes would be enough to enable model parallelism, but it turns out that this doesn't reduce the memory footprint on the master node (task 0). (10-node configuration: master 20 GB, followers 2 GB each; 1-node configuration: master 6~7 GB.)

My suspicion is that the gradients are not being distributed because I didn't set up the right device scope for them.

My model is available on GitHub: https://github.com/nakosung/tensorflow-wavenet/tree/model_parallel_2

The device placement log is here: https://gist.github.com/nakosung/a38d4610fff09992f7e5569f19eefa57


Answer 1:


So the good news is that you are using colocate_gradients_with_ops, which means that you are ensuring that the gradients are computed on the same device as the ops they belong to. (https://github.com/nakosung/tensorflow-wavenet/blob/model_parallel_2/train.py#L242)

Reading the device placement log is a little difficult, so I would suggest using TensorBoard to visualize the graph. It has an option to show how nodes are placed on devices.
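
If the raw placement log is all you have at hand, a small script can condense it into a per-device op count. The following is only a minimal sketch and not part of the original answer: it assumes each placement line has the shape "node name: (OpType): device string", and the sample lines and the helper name count_ops_per_device are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "<node name>: (<op type>): <device string>".
# Adjust the pattern if your log lines look different.
PLACEMENT_RE = re.compile(r"^(?P<node>\S+): \((?P<op>\w+)\): (?P<device>/\S+)$")


def count_ops_per_device(lines):
    """Count how many ops were placed on each device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PLACEMENT_RE.match(line.strip())
        if match:  # silently skip lines that are not placement records
            counts[match.group("device")] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_log = [
    "wavenet/layer_0/conv: (Conv2D): /job:worker/replica:0/task:0/gpu:0",
    "wavenet/layer_1/conv: (Conv2D): /job:worker/replica:0/task:1/gpu:0",
    "gradients/layer_1/conv_grad: (Conv2DBackpropInput): /job:worker/replica:0/task:0/gpu:0",
    "some unrelated log line",
]

placement_counts = count_ops_per_device(sample_log)
for device, n_ops in placement_counts.most_common():
    print(device, n_ops)
```

A gradient op that shows up on task 0 while its forward op lives on another task, as in the third sample line, is the kind of mismatch worth looking for.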

Secondly, you can check how the sizes of your operations map onto devices: it is possible that the largest layers (largest activations or largest weights) are placed disproportionately on some nodes. You might try using https://github.com/tensorflow/tensorflow/blob/6b1d4fd8090d44d20fdadabf06f1a9b178c3d80c/tensorflow/python/tools/graph_metrics.py to analyze your graph and get a better picture of where its resources are required.
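
To make the idea of a size imbalance concrete, here is a rough back-of-the-envelope sketch. It is not the graph_metrics.py tool: the tensor names, shapes, and device strings are invented for illustration, and it assumes float32 tensors at 4 bytes per element.

```python
from collections import defaultdict
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: every tensor is float32


def bytes_per_device(tensors):
    """Sum the tensor sizes placed on each device (hypothetical helper).

    `tensors` is an iterable of (name, device, shape) tuples.
    """
    totals = defaultdict(int)
    for _name, device, shape in tensors:
        totals[device] += prod(shape) * BYTES_PER_FLOAT32
    return dict(totals)


# Invented inventory: one large activation sits on task 0.
tensors = [
    ("layer_0/weights", "/job:worker/task:0/gpu:0", (2, 512, 512)),
    ("layer_0/activations", "/job:worker/task:0/gpu:0", (1, 100000, 512)),
    ("layer_1/weights", "/job:worker/task:1/gpu:0", (2, 512, 512)),
]

totals = bytes_per_device(tensors)
for device, n_bytes in sorted(totals.items()):
    print(f"{device}: {n_bytes / 2**20:.1f} MiB")
```

In this made-up inventory the weights are split evenly, yet task 0 still carries almost all of the memory because of a single large activation, which is why activations should be counted alongside weights.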

Longer term, we'd like to solve some of these placement problems automatically, but so far model parallelism requires a bit of care to place things precisely.



Source: https://stackoverflow.com/questions/40216948/what-is-the-right-way-to-do-model-parallelism-in-tensorflow
