I found a PyTorch implementation that lowers the batch norm decay parameter (called momentum in PyTorch) from 0.1 in the first epoch to 0.001 in the final epoch.
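To make the idea concrete, here is a minimal sketch of such a schedule. It is not the implementation I found: the function names are hypothetical, and the geometric (log-linear) interpolation between 0.1 and 0.001 is my assumption, since I only know the two endpoints. In PyTorch the running statistics are updated as running = (1 - momentum) * running + momentum * batch_stat, so a smaller momentum makes them change more slowly late in training.

```python
def bn_momentum_for_epoch(epoch, num_epochs, start=0.1, end=0.001):
    """Return the batch norm momentum for a given zero-based epoch.

    Assumption: geometric interpolation from `start` (first epoch)
    to `end` (final epoch). The original schedule shape is unknown.
    """
    if num_epochs < 2:
        return start
    t = epoch / (num_epochs - 1)  # 0.0 at the first epoch, 1.0 at the final one
    return start * (end / start) ** t


def set_bn_momentum(model, momentum):
    """Write `momentum` into every batch norm layer of a torch.nn.Module."""
    # Imported here so the schedule above stays usable without PyTorch installed.
    import torch.nn as nn

    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    for module in model.modules():
        if isinstance(module, bn_types):
            module.momentum = momentum


# Hypothetical use at the start of each epoch in a training loop:
#
# for epoch in range(num_epochs):
#     set_bn_momentum(model, bn_momentum_for_epoch(epoch, num_epochs))
#     ...  # train for one epoch
```

A linear schedule would work the same way; only the expression for the interpolated value changes.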