I've read the Distributed TensorFlow documentation, and it mentions that in asynchronous training,
each replica of the graph has an independent training loop that executes without coordination.
In asynchronous training there is no synchronization of weights among the workers. The weights are stored on a parameter server, and each worker loads and updates the shared weights independently of the others. If one worker finishes an iteration faster than the rest, it proceeds to the next iteration without waiting. Workers interact only with the shared parameter server, never with each other.
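The flow above can be sketched with a toy in-process simulation (this is not TensorFlow's API; `ParameterServer` and `worker` are illustrative names): a shared store holds the weights, and each worker thread runs its own loop of load, compute gradient, push update, with no barrier between workers.

```python
import threading

class ParameterServer:
    """Toy stand-in for a parameter server: holds shared weights."""
    def __init__(self, w):
        self.w = w
        self.lock = threading.Lock()  # guards individual reads/writes only

    def get(self):
        with self.lock:
            return self.w

    def apply_gradient(self, grad, lr=0.1):
        with self.lock:
            self.w -= lr * grad

def worker(ps, steps):
    # Each worker minimizes f(w) = (w - 3)^2 on its own schedule.
    for _ in range(steps):
        w = ps.get()              # load the current shared weights
        grad = 2.0 * (w - 3.0)    # compute a local gradient
        ps.apply_gradient(grad)   # push the update without waiting for others

ps = ParameterServer(w=0.0)
threads = [threading.Thread(target=worker, args=(ps, 200)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ps.get())  # converges near the optimum w = 3
```

Note that the lock only protects single reads and writes; between a worker's `get` and its `apply_gradient`, other workers may have already changed the weights, so gradients are applied to slightly stale state. That staleness is exactly the trade-off of asynchronous training.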
Overall this can (depending on the task) speed up training significantly. However, the results are sometimes worse than those obtained with the slower synchronous updates.