Keras inconsistent prediction time

Backend · Unresolved · 2 answers · 606 views
盖世英雄少女心 · 2021-01-01 15:48

I tried to get an estimate of the prediction time of my Keras model and realised something strange. While it is fairly fast most of the time, every once in a while the model needs a surprisingly long time to produce a prediction.

2 Answers
  •  无人及你
    2021-01-01 16:38

    While I can't explain the inconsistencies in execution time, I can recommend that you try to convert your model to TensorFlow Lite to speed up predictions on single data records or small batches.

    I ran a benchmark on this model:

    import tensorflow as tf

    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(384, activation='elu', input_shape=(256,)),
        tf.keras.layers.Dense(384, activation='elu'),
        tf.keras.layers.Dense(256, activation='elu'),
        tf.keras.layers.Dense(128, activation='elu'),
        tf.keras.layers.Dense(32, activation='tanh')
    ])
    

    The prediction times for single records were:

    1. model.predict(input): 18 ms
    2. model(input): 1.3 ms
    3. Model converted to TensorFlow Lite: 43 µs

    The time to convert the model was 2 seconds.
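
    Because the question is about *occasional* slow predictions, percentiles and worst-case times are more informative than averages. Below is a minimal timing-harness sketch using only the standard library; `benchmark`, `warmup`, and `runs` are names I made up for illustration, not part of the original benchmark code:

    ```python
    import time
    import statistics

    def benchmark(fn, arg, warmup=10, runs=100):
        """Time repeated calls to fn(arg); report median and tail latency.

        A warmup phase runs first so one-time costs (tracing, lazy
        initialization) don't pollute the measured calls.
        """
        for _ in range(warmup):
            fn(arg)
        times = []
        for _ in range(runs):
            t0 = time.perf_counter()
            fn(arg)
            times.append(time.perf_counter() - t0)
        times.sort()
        return {
            "median_ms": statistics.median(times) * 1000,
            "p99_ms": times[int(0.99 * (runs - 1))] * 1000,
            "max_ms": times[-1] * 1000,
        }

    # Usage sketch (assuming `model` and a single-record array `x` as above):
    # print(benchmark(lambda a: model(a), x))
    # print(benchmark(model.predict, x))
    ```

    If the `max_ms` is far above `median_ms` for `model.predict`, that matches the inconsistency described in the question.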

    The class below shows how to convert and use the model and provides a predict method like the Keras model. Note that it would need to be modified for use with models that don’t just have a single 1-D input and a single 1-D output.

    import numpy as np
    import tensorflow as tf

    class LiteModel:
    
        @classmethod
        def from_file(cls, model_path):
            return LiteModel(tf.lite.Interpreter(model_path=model_path))
    
        @classmethod
        def from_keras_model(cls, kmodel):
            converter = tf.lite.TFLiteConverter.from_keras_model(kmodel)
            tflite_model = converter.convert()
            return LiteModel(tf.lite.Interpreter(model_content=tflite_model))
    
        def __init__(self, interpreter):
            self.interpreter = interpreter
            self.interpreter.allocate_tensors()
            input_det = self.interpreter.get_input_details()[0]
            output_det = self.interpreter.get_output_details()[0]
            self.input_index = input_det["index"]
            self.output_index = output_det["index"]
            self.input_shape = input_det["shape"]
            self.output_shape = output_det["shape"]
            self.input_dtype = input_det["dtype"]
            self.output_dtype = output_det["dtype"]
    
        def predict(self, inp):
            # The interpreter's input tensor is allocated with a fixed batch
            # size of 1, so a batch is processed one record at a time.
            inp = inp.astype(self.input_dtype)
            count = inp.shape[0]
            out = np.zeros((count, self.output_shape[1]), dtype=self.output_dtype)
            for i in range(count):
                self.interpreter.set_tensor(self.input_index, inp[i:i+1])
                self.interpreter.invoke()
                out[i] = self.interpreter.get_tensor(self.output_index)[0]
            return out
    
        def predict_single(self, inp):
            """ Like predict(), but only for a single record. The input data can be a Python list. """
            inp = np.array([inp], dtype=self.input_dtype)
            self.interpreter.set_tensor(self.input_index, inp)
            self.interpreter.invoke()
            out = self.interpreter.get_tensor(self.output_index)
            return out[0]
    

    The complete benchmark code and a plot can be found here: https://medium.com/@micwurm/using-tensorflow-lite-to-speed-up-predictions-a3954886eb98
