Question
When starting a TensorFlow server (tf.distribute.Server), you must pass a ClusterSpec which specifies all tasks/workers in the cluster.
Why? What does it use that information for?
Is it only so that device names (for tf.device) can uniquely identify which worker in the cluster they refer to? E.g. for with tf.device("/job:ps/task:0") or with tf.device("/job:worker/task:7")?
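
To make the guess concrete, here is a minimal sketch in plain Python of the kind of lookup I have in mind. It is not TensorFlow's actual implementation; the host names, the CLUSTER dict, and the resolve_device helper are all hypothetical. The only assumption carried over from TensorFlow is that a cluster description maps each job name to a list of addresses, with the task index being the position in that list.

```python
import re

# Hypothetical cluster description: job name -> list of task addresses.
# The position in each list plays the role of the task index.
CLUSTER = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker%d.example.com:2222" % i for i in range(8)],
}

# Only handles the simple "/job:<name>/task:<index>" form used in the question.
_DEVICE_RE = re.compile(r"^/job:(?P<job>[^/]+)/task:(?P<task>\d+)$")


def resolve_device(device, cluster=CLUSTER):
    """Return the network address for a '/job:<name>/task:<index>' string."""
    match = _DEVICE_RE.match(device)
    if match is None:
        raise ValueError("unsupported device string: %r" % device)
    job = match.group("job")
    task = int(match.group("task"))
    if job not in cluster or task >= len(cluster[job]):
        raise KeyError("%s is not part of the cluster" % device)
    return cluster[job][task]


print(resolve_device("/job:ps/task:0"))      # ps0.example.com:2222
print(resolve_device("/job:worker/task:7"))  # worker7.example.com:2222
```

If something like this is all the server needs, then it would only have to know about the tasks it actually addresses, which is why I am asking whether the full cluster description serves some other purpose.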
Source: https://stackoverflow.com/questions/62004631/why-does-a-tf-server-needs-to-know-about-all-other-tasks-workers-in-the-cluster