how to normalize input data for models in tensorflow

小蘑菇 2020-12-14 19:15

My training data are saved in 3 files; each file is too large to fit into memory. Each training example is two-dimensional (2805 rows and 222 columns).

2 Answers
  •  温柔的废话
    2020-12-14 19:58

    Expanding on benjaminplanche's answer for "#4 Dataset normalization", there is actually a pretty easy way to accomplish this.

    TensorFlow's Keras provides a preprocessing Normalization layer. Since it is a layer, it is meant to be used within the model. However, you don't have to (more on that later).

    The model usage is simple:

    inputs = tf.keras.Input(shape=dataset.element_spec.shape)
    norm = tf.keras.layers.experimental.preprocessing.Normalization()
    norm.adapt(dataset)  # you can use dataset.take(N) if N samples are enough for it to estimate the mean & variance
    layer1 = norm(inputs)
    ...
    
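    For intuition about what adapt() computes: it estimates a per-feature (last-axis) mean and variance, and the layer then applies (x - mean) / sqrt(var + epsilon). A minimal sketch of the same arithmetic in plain NumPy (the toy array and epsilon value here are illustrative, not from the Keras source):

    ```python
    import numpy as np

    # Toy batch: 3 samples, 2 features. adapt() would reduce over the batch axis.
    x = np.array([[0.0, 10.0],
                  [2.0, 20.0],
                  [4.0, 30.0]], dtype=np.float32)

    mean = x.mean(axis=0)          # per-feature mean, like adapt()
    var = x.var(axis=0)            # per-feature variance, like adapt()

    # The layer's forward pass: standardize each feature.
    normalized = (x - mean) / np.sqrt(var + 1e-7)  # epsilon guards zero variance
    ```

    After this, each feature column has roughly zero mean and unit variance, which is exactly what the layer produces at call time using the statistics frozen by adapt().
    
    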

    The advantage of using it in the model is that the normalization mean & variance are saved as part of the model weights. So when you load the saved model, it'll use the same values it was trained with.

     

    As mentioned earlier, if you don't want to use Keras models, you don't have to use the layer as part of one. If you'd rather use it in your dataset pipeline, you can do that too.

    norm = tf.keras.layers.experimental.preprocessing.Normalization()
    norm.adapt(dataset)
    dataset = dataset.map(lambda t: norm(t))
    

    The disadvantage is that you now need to save and restore those weights manually (norm.get_weights() and norm.set_weights()). NumPy's save() and load() functions are convenient for that.

    np.save("norm_weights.npy", norm.get_weights())
    norm.set_weights(np.load("norm_weights.npy", allow_pickle=True))
    
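    Back to the original setup of three on-disk files too large for memory: one common pattern is to memory-map each file and yield one example at a time, which tf.data.Dataset.from_generator can then consume (and norm.adapt() can run over). A minimal sketch, assuming, hypothetically, that each file holds a float32 array of shape (num_examples, 2805, 222) saved with np.save — the file names and layout are illustrative, not from the question:

    ```python
    import numpy as np

    # Hypothetical file names; adjust to the real training files.
    files = ["train_0.npy", "train_1.npy", "train_2.npy"]

    def example_stream(paths):
        """Yield one (2805, 222) example at a time without loading whole files."""
        for path in paths:
            data = np.load(path, mmap_mode="r")   # memory-mapped: no bulk read yet
            for i in range(data.shape[0]):
                yield np.asarray(data[i])         # materializes a single example
    ```

    Wrapping it for TensorFlow would then look like tf.data.Dataset.from_generator(lambda: example_stream(files), output_signature=tf.TensorSpec(shape=(2805, 222), dtype=tf.float32)), which keeps memory usage bounded to one example (plus OS page cache) at a time.
    
    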
