Quantize a Keras neural network model

问题

Recently, I've started creating neural networks with Tensorflow + Keras and I would like to try the quantization feature available in Tensorflow. So far, experimenting with examples from TF tutorials worked just fine and I have this basic working example (from https://www.tensorflow.org/tutorials/keras/basic_classification):

import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# fashion mnist data labels (indexes related to their respective labelling in the data set)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# preprocess the train and test images
train_images = train_images / 255.0
test_images = test_images / 255.0

# settings variables
input_shape = (train_images.shape[1], train_images.shape[2])

# create the model layers
model = keras.Sequential([
keras.layers.Flatten(input_shape=input_shape),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])

# compile the model with added settings
model.compile(optimizer=tf.train.AdamOptimizer(),
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

# train the model
epochs = 3
model.fit(train_images, train_labels, epochs=epochs)

# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

Now, I would like to employ quantization in the learning and classification process. The quantization documentation (https://www.tensorflow.org/performance/quantization) (the page is no longer available since cca September 15, 2018) suggests to use this piece of code:

loss = tf.losses.get_total_loss()
tf.contrib.quantize.create_training_graph(quant_delay=2000000)
optimizer = tf.train.GradientDescentOptimizer(0.00001)
optimizer.minimize(loss)

However, it does not contain any information about where this code should be utilized or how it should be connected to a TF code (not even mentioning a high level model created with Keras). I have no idea how this quantization part relates to the previously created neural network model. Just inserting it following the neural network code runs into the following error:

Traceback (most recent call last):
  File "so.py", line 41, in <module>
    loss = tf.losses.get_total_loss()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/util.py", line 112, in get_total_loss
    return math_ops.add_n(losses, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 2119, in add_n
    raise ValueError("inputs must be a list of at least one Tensor with the "
ValueError: inputs must be a list of at least one Tensor with the same dtype and shape

Is it possible to quantize a Keras NN model in this way or am I missing something basic? A possible solution that crossed my mind could be using low level TF API instead of Keras (needing to do quite a bit of work to construct the model), or maybe trying to extract some of the lower level methods from the Keras models.

回答1:

As mentioned in other answers, TensorFlow Lite can help you with network quantization.

TensorFlow Lite provides several levels of support for quantization.

Tensorflow Lite post-training quantization quantizes weights and activations post training easily. Quantization-aware training allows for training of networks that can be quantized with minimal accuracy drop; this is only available for a subset of convolutional neural network architectures.

So first, you need to decide whether you need post-training quantization or quantization-aware training. For example, if you already saved the model as *.h5 files, you would probably want to follow @Mitiku's instruction and do the post-training quantization.

If you prefer to achieve higher performance by simulating the effect of quantization in training (using the method you quoted in the question), and your model is in the subset of CNN architecture supported by quantization-aware training, this example may help you in terms of interaction between Keras and TensorFlow. Basically, you only need to add this code between model definition and its fitting:

sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_training_graph(sess.graph)
sess.run(tf.global_variables_initializer())

回答2:

As your network looks quite simple, you can maybe use Tensorflow lite.

回答3:

Tensorflow lite can be used to quantize keras model.

The following code was written for tensorflow 1.14. It might not work for earlier versions.

First, after training the model you should save your model to h5

model.fit(train_images, train_labels, epochs=epochs)

# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
model.save("model.h5")

To load keras model use tf.lite.TFLiteConverter.from_keras_model_file

# load the previously saved model
converter  = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
tflite_model = converter.convert()
# Save the model to file
with open("tflite_model.tflite", "wb") as output_file:
    output_file.write(tflite_model)

The saved model can be loaded to python script or to other platforms and languages. To use saved tflite model, tensorlfow.lite provides Interpreter. The following example from here shows how to load tflite model from local file using python scripts.

import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="tflite_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

来源：https://stackoverflow.com/questions/52259343/quantize-a-keras-neural-network-model

标签

python

tensorflow

neural-network

keras

quantization