I implemented focal loss in PyTorch following this paper, and ran into a problem: the loss function returns NaN.
This is my implementation of focal