Question
I was watching this video: Inside TensorFlow: tf.distribute.Strategy, and understood that tf.distribute.Strategy was designed so that it supports both in-graph replication and between-graph replication.
(I'm collecting an overview of all the terminology and concepts here. Maybe I am just confusing things.)
Judging from all the code examples and the existing strategy implementations (here or here), it looks like in-graph replication is always used? (Or is it actually always between-graph replication? It's not really clear to me.) If it is always in-graph, is there a strategy which also works for between-graph replication?
Maybe the terminology has also changed (now with TF 2), and we would not talk explicitly about graphs anymore (but about tf.function instead?). But then it would be the same question, just rephrased with different terminology. (I'm not sure how to rephrase it exactly, or in what way it would actually be different. Would between-graph replication mean that a separate, independent tf.function is created on every worker / for every replica?)
The TF distribute API documentation (for example here) also distinguishes between cross-replica context and replica context. I'm not quite sure whether this is orthogonal to between-graph vs. in-graph replication, or just different terminology for the same thing.
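To make the two contexts concrete, here is a minimal sketch of how I understand them, assuming TF 2.2 or later (where `strategy.run` exists) and a single machine with `MirroredStrategy`. The `step` function and the `results` dict are hypothetical names used only for illustration. This only shows where each context is active; it does not by itself say anything about in-graph vs. between-graph replication.

```python
import tensorflow as tf

# MirroredStrategy uses all visible GPUs, or falls back to one CPU replica.
strategy = tf.distribute.MirroredStrategy()

results = {}  # hypothetical container, just to record what we observe


def step():
    # Called once per replica by strategy.run: this is replica context.
    results["has_replica_ctx"] = tf.distribute.get_replica_context() is not None
    results["cross_inside_run"] = tf.distribute.in_cross_replica_context()
    return tf.constant(1.0)


with strategy.scope():
    # Directly inside the scope we are in cross-replica context.
    results["cross_in_scope"] = tf.distribute.in_cross_replica_context()
    per_replica = strategy.run(step)
    # Sum the per-replica outputs back into a single value.
    total = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

print(results, float(total), strategy.num_replicas_in_sync)
```

In this sketch, code directly under `strategy.scope()` sees the cross-replica context, while the function passed to `strategy.run` sees a replica context.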
Source: https://stackoverflow.com/questions/62043868/which-strategy-is-for-between-graph-replication