Question:
It seems that tf.train.replica_device_setter doesn't allow specifying which GPU to work with.
What I want to do is like below:
with tf.device(
    tf.train.replica_device_setter(
        worker_device='/job:worker/task:%d/gpu:%d' % (deviceindex, gpuindex))):
    <build-some-tf-graph>
Answer 1:
If your parameters are not sharded, you could do it with a simplified version of replica_device_setter like below:
import tensorflow as tf

def assign_to_device(worker=0, gpu=0, ps_device="/job:ps/task:0/cpu:0"):
    """Returns a device function: variables go to ps_device, everything
    else goes to the given worker's GPU."""
    def _assign(op):
        node_def = op if isinstance(op, tf.NodeDef) else op.node_def
        if node_def.op == "Variable":
            return ps_device
        else:
            return "/job:worker/task:%d/gpu:%d" % (worker, gpu)
    return _assign

with tf.device(assign_to_device(1, 2)):
    # this op goes on worker 1, gpu 2
    my_op = tf.ones(())
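The device function mirrors what replica_device_setter does in the unsharded case: any node whose op type is "Variable" is pinned to the single ps_device, and every other op is placed on the requested worker GPU.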
Answer 2:
I didn't check previous versions, but in TensorFlow 1.4/1.5 you can specify the device directly in replica_device_setter(worker_device='/job:worker/task:%d/gpu:%d' % (FLAGS.task_index, i), cluster=self.cluster).
See tensorflow/python/training/device_setter.py, lines 199-202:
if ps_ops is None:
    # TODO(sherrym): Variables in the LOCAL_VARIABLES collection should not be
    # placed in the parameter server.
    ps_ops = ["Variable", "VariableV2", "VarHandleOp"]
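For completeness, here is a minimal sketch of how Answer 2 fits together; the cluster addresses, task_index, and GPU index i below are placeholders, not values from the original post:

import tensorflow as tf

# Hypothetical cluster spec; replace the addresses with your own hosts.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
task_index, i = 0, 1  # placeholder worker task index and GPU index

with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:%d/gpu:%d" % (task_index, i),
        cluster=cluster)):
    # Variable ops ("Variable", "VariableV2", "VarHandleOp") are placed on
    # the ps job; all other ops land on the chosen worker GPU.
    w = tf.get_variable("w", shape=[10, 10])
    y = tf.matmul(tf.ones([1, 10]), w)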
Thanks to @Yaroslav Bulatov for the code he provided, but his approach differs from replica_device_setter and may fail in some cases.
Source: https://stackoverflow.com/questions/39991238/distributed-tensorflow-with-multiple-gpu