I've been going through a few tutorials on using neural networks for keypoint detection. I've noticed that for the inputs (images) it's very common to divide by 255 (normalizing the pixel values to [0, 1]).
I think the most common approach to image normalization for neural networks in general is to subtract the dataset mean and divide by the dataset standard deviation:
X = (X - mean_dataset) / std_dataset
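Concretely, something like this (a minimal NumPy sketch; the dummy array shapes and the choice of global vs. per-channel statistics are just assumptions for illustration):

import numpy as np

# dummy stand-in for a real training set: uint8-range images, shape (N, H, W, C)
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(100, 96, 96, 1)).astype(np.float32)

# option 1: simple [0, 1] scaling
X_scaled = train_images / 255.0

# option 2: standardization with dataset statistics
mean_dataset = train_images.mean()   # or per-channel: train_images.mean(axis=(0, 1, 2))
std_dataset = train_images.std()     # or per-channel: train_images.std(axis=(0, 1, 2))
X_standardized = (train_images - mean_dataset) / std_dataset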
I think keypoint detection problems should not be too different in this respect.
It might be interesting to compare the two in terms of performance. My guess is that subtracting the mean and dividing by the std (which gives roughly zero-centred, unit-variance inputs) will converge more quickly than [0, 1] normalization.
This is because, with zero-centred inputs, the optimal biases in the model tend to be smaller in magnitude, so biases initialised at 0 start closer to where they need to end up and take less time to get there.
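If you want to test that guess, one way would be to train the same small model twice, once per normalization scheme, and compare the loss curves. Here is a rough Keras sketch; the data is random dummy data, and the 96x96 input size, 30 keypoint outputs, tiny architecture, and epoch count are all arbitrary assumptions, just to illustrate the setup:

import numpy as np
import tensorflow as tf

# dummy stand-in for a keypoint dataset: 96x96 grayscale images,
# 15 (x, y) keypoints flattened to 30 regression targets
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 256, size=(500, 96, 96, 1)).astype(np.float32)
y = rng.uniform(-1, 1, size=(500, 30)).astype(np.float32)

def build_model():
    # small conv net for keypoint regression (architecture is arbitrary)
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(96, 96, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(30),
    ])

# scheme 1: divide by 255
X_scaled = X_raw / 255.0
# scheme 2: subtract the dataset mean, divide by the dataset std
X_standardized = (X_raw - X_raw.mean()) / X_raw.std()

for name, X in [('[0, 1] scaling', X_scaled),
                ('mean/std standardization', X_standardized)]:
    tf.keras.utils.set_random_seed(0)   # same initialisation for a fair comparison
    model = build_model()
    model.compile(optimizer='adam', loss='mse')
    history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)
    print(name, '- final val_loss:', history.history['val_loss'][-1])

With real data you would look at the full validation-loss curves rather than just the final value to see which scheme converges faster.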