I'm training a convolutional neural network (using TensorFlow) with the so-called 'Knowledge Distillation' (KD) method, which, in a few words, consists of training a big mo