Difference between Keras and TensorFlow Hub Version of MobileNetV2

Question

I am working on a transfer learning approach and got very different results from the MobileNetV2 in keras.applications and the one available on TensorFlow Hub. This seems strange to me, as both versions claim to extract their weights from the same checkpoint, mobilenet_v2_1.0_224. The differences can be reproduced as follows (the original post also linked a Colab notebook):

!pip install tensorflow-gpu==2.1.0
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

def create_model_keras():
  image_input = tf.keras.Input(shape=(224, 224, 3))
  out = MobileNetV2(input_shape=(224, 224, 3),
                  include_top=True)(image_input)
  model = tf.keras.models.Model(inputs=image_input, outputs=out)
  model.compile(optimizer='adam', loss=["categorical_crossentropy"])
  return model

def create_model_tf():
  image_input = tf.keras.Input(shape=(224, 224, 3))
  out = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4",
                      input_shape=(224, 224, 3))(image_input)
  model = tf.keras.models.Model(inputs=image_input, outputs=out)
  model.compile(optimizer='adam', loss=["categorical_crossentropy"])
  return model

When I try to predict on a random batch, the results are not equal:

keras_model = create_model_keras()
tf_model = create_model_tf()
np.random.seed(42)
data = np.random.rand(32,224,224,3)
out_keras = keras_model.predict_on_batch(data)
out_tf = tf_model.predict_on_batch(data)
np.array_equal(out_keras, out_tf)

The output of the version from keras.applications sums up to 1 but the version from TensorFlow Hub does not. Also the shape of the two versions is different: TensorFlow Hub has 1001 labels, keras.applications has 1000.

np.sum(out_keras[0]), np.sum(out_tf[0])

prints (1.0000001, -14.166359)

What is the reason for these differences? Am I missing something?

Edit 18.02.2020

As Szymon Maszke pointed out, the TF Hub version returns logits. That is why I added a Softmax layer to create_model_tf as follows: out = tf.keras.layers.Softmax()(x), where x is the output of the hub.KerasLayer call.

arnoegw mentioned that the TfHub version requires an image normalized to [0,1], whereas the keras version requires normalization to [-1,1]. When I use the following preprocessing on a test image:

# Preprocessing for the keras.applications model (scales pixels to [-1, 1])
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
img = tf.keras.preprocessing.image.load_img("/content/panda.jpeg", target_size=(224,224))
img = tf.keras.preprocessing.image.img_to_array(img)
img = preprocess_input(img)

# Preprocessing for the TF Hub model (scales pixels to [0, 1])
img = tf.io.read_file("/content/panda.jpeg")
img = tf.image.decode_jpeg(img)
img = tf.image.convert_image_dtype(img, tf.float32)
img = tf.image.resize(img, (224,224))

Both correctly predict the same label and the following condition is true: np.allclose(out_keras, out_tf[:,1:], rtol=0.8)
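Since both pipelines start from the same 8-bit pixels, the two input ranges should be related by a simple affine map. The sketch below uses NumPy only and assumes that the Keras preprocess_input for MobileNetV2 computes x / 127.5 - 1 and that convert_image_dtype divides uint8 values by 255. The helper names and the random test image are made up for illustration. Resizing is left out on purpose: the two snippets above may use different resize interpolation, which could account for part of the remaining numerical gap.

```python
import numpy as np

def hub_to_keras_range(x):
    """Map floats in [0, 1] (TF Hub convention) to [-1, 1] (Keras MobileNetV2)."""
    return np.asarray(x, dtype=np.float32) * 2.0 - 1.0

def keras_to_hub_range(x):
    """Map floats in [-1, 1] back to [0, 1]."""
    return (np.asarray(x, dtype=np.float32) + 1.0) / 2.0

# Random 8-bit image standing in for a decoded JPEG (hypothetical data).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Assumed effect of tf.image.convert_image_dtype(..., tf.float32) on uint8 input.
hub_input = pixels.astype(np.float32) / 255.0
# Assumed effect of mobilenet_v2.preprocess_input on raw pixel values.
keras_input = pixels.astype(np.float32) / 127.5 - 1.0

converted = hub_to_keras_range(hub_input)
print(np.allclose(converted, keras_input, atol=1e-6))
```

If these assumptions hold, either model can be fed from the other's pipeline by applying the matching helper before prediction.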

Edit 2 18.02.2020

I previously wrote that it is not possible to convert the formats into each other. That was caused by a bug.

Answer 1:

There are several documented differences:

Like Szymon said, the TF Hub version returns logits (before the softmax function that turns them into probabilities), which is a common practice, because the cross-entropy loss can be computed with greater numerical stability from the logits.
The TF Hub model assumes float32 inputs in the range of [0,1], which is what you get from tf.image.decode_jpeg(...) followed by tf.image.convert_image_dtype(..., tf.float32). The Keras code uses a model-specific range (likely [-1,+1]).
The TF Hub model reflects the original SLIM checkpoint more completely in returning all its 1001 output classes. As stated in the ImageNetLabels.txt linked from the documentation, the added class 0 is "background" (aka. "stuff"). That is what object detection uses to indicate image background as opposed to an object of any known class.
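The points about logits and the extra class can be illustrated without downloading either model. What follows is a minimal NumPy sketch on made-up logits, not the output of the real networks, and the function names are hypothetical. It shows that computing the loss directly from logits stays finite where going through probabilities does not, and that softmax over all 1001 logits followed by dropping class 0 leaves rows summing to slightly less than 1, whereas dropping class 0 first gives a proper distribution over the remaining 1000 classes. Which of the two orders matches the Keras head more closely is an assumption to verify against real outputs.

```python
import numpy as np

def softmax(logits):
    """Softmax along the last axis, shifted by the row maximum to avoid overflow."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy via log-sum-exp, without forming probabilities first."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Numerical stability: an extreme (made-up) logit pair with the unlikely label.
extreme = np.array([[1000.0, 0.0]])
label = np.array([1])
stable = cross_entropy_from_logits(extreme, label)  # stays finite
with np.errstate(divide="ignore"):
    via_probs = -np.log(softmax(extreme))[0, 1]     # log(0) makes this infinite

# Stand-in for a batch of TF Hub outputs: 1001 logits, index 0 = "background".
rng = np.random.default_rng(42)
fake_hub_logits = rng.normal(size=(4, 1001))

probs_then_drop = softmax(fake_hub_logits)[:, 1:]  # rows sum to 1 - p(background)
drop_then_probs = softmax(fake_hub_logits[:, 1:])  # rows sum to 1

print(stable, via_probs)
print(probs_then_drop.sum(axis=1))
print(drop_then_probs.sum(axis=1))
```

After the slice, index i in the 1000-way array corresponds to index i + 1 in the original 1001-way output, so the predicted label among the 1000 object classes is the same for both orders.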

Source: https://stackoverflow.com/questions/60251715/difference-between-keras-and-tensorflow-hub-version-of-mobilenetv2

Tags

keras

tensorflow-hub

mobilenet