I have been trying to use the pre-trained inception_resnet_v2 model released by Google. I am using their model definition(https://github.com/tensorflow/models/blob/master/sl
The Inception networks expect the input image to have color channels scaled from [-1, 1]. As seen here.
You could either use the existing preprocessing, or in your example just scale the images yourself: im = 2*(im/255.0)-1.0 before feeding them to the network.
Without scaling the input [0-255] is much larger than the network expects and the biases all work to very strongly predict category 918 (comic books).