The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow.
I found a paper that explains the disharmony between Dropout and Batch Norm (BN). The key idea is what they call the "variance shift": dropout behaves differently during training and testing, so it shifts the input statistics that BN learned during training. The main idea is summarized in this figure, which is taken from the paper.
A small demo of this effect can be found in this notebook.
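As a rough illustration of the variance shift (not the linked notebook, just a minimal NumPy sketch assuming zero-mean, unit-variance activations and inverted dropout with an assumed keep probability of 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # zero-mean, unit-variance activations feeding into BN
p_keep = 0.5                  # assumed keep probability for illustration

# Training phase: inverted dropout zeroes units and scales the kept ones by 1/p_keep.
mask = rng.random(x.shape) < p_keep
x_train = x * mask / p_keep

# Test phase: dropout is a no-op, activations pass through unchanged.
x_test = x

print("variance seen by BN at training time:", x_train.var())  # ~2.0 (shifted by 1/p_keep)
print("variance seen by BN at test time:    ", x_test.var())   # ~1.0
```

With a zero-mean unit-variance input, the training-time variance is inflated by roughly 1/p_keep, while at test time it drops back to 1. BN's moving statistics are accumulated on the inflated values, which is the train/test mismatch the paper describes.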