I am using a U-Net for an image segmentation problem. The network trains very well when I use the Dice loss, but it does not optimize for any order of magnitude of the lear