Determinism in TensorFlow gradient updates?
So I have a very simple NN script written in TensorFlow, and I'm having a hard time tracking down where some "randomness" is coming from. I've been recording the weights, gradients, and logits of my network as I train, and on the first iteration everything clearly starts out identical. I use one SEED value for how the data is read in and another SEED value for initializing the weights of the net, and I never change either (a stripped-down sketch of this setup is at the bottom of the post). My problem is that by, say, the second iteration of every re-run I do, the gradients start to diverge (by a small amount, e.g. 1e-6 or so). However, over time,
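
Here is roughly what my setup looks like, just to show where the seeds go and how I pull out the gradients each step. This is not my actual network; the layer sizes, variable names, data, and optimizer here are placeholders I've swapped in to keep the example small:

```python
import numpy as np
import tensorflow as tf

SEED = 42

# Graph-level seed; op-level seeds are set below as well
tf.set_random_seed(SEED)

# Toy stand-in data; my real pipeline reads/shuffles using the same SEED
rng = np.random.RandomState(SEED)
X = rng.rand(256, 10).astype(np.float32)
y = (X.sum(axis=1, keepdims=True) > 5.0).astype(np.float32)

x_ph = tf.placeholder(tf.float32, [None, 10])
y_ph = tf.placeholder(tf.float32, [None, 1])

# Weight initialization seeded explicitly
W = tf.Variable(tf.truncated_normal([10, 1], stddev=0.1, seed=SEED))
b = tf.Variable(tf.zeros([1]))

logits = tf.matmul(x_ph, W) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_ph, logits=logits))

opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss)   # [(grad, var), ...]
train_op = opt.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(5):
        # Fetch gradients and logits for logging, then apply the update
        grad_vals, logit_vals, _ = sess.run(
            [[g for g, _ in grads_and_vars], logits, train_op],
            feed_dict={x_ph: X, y_ph: y})
        print(step, grad_vals[0].ravel()[:3])
```

Even with both seeds fixed like this, the gradient values I log start to drift between re-runs from the second step onward.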