I\'m looking for a way that, given an input image and a neural network, it will output a labeled class for each pixel in the image (sky, grass, mountain, person, car etc).
It seems like the 32s model is making large strides and thus working at a coarse resolution. Can you try the 8s model that seems to perform less resolution reduction.
Looking at J Long, E Shelhamer, T Darrell Fully Convolutional Networks for Semantic Segmentation, CVPR 2015 (especially at figure 4) it seems like the 32s model is not designed for capturing fine details of the segmentation.