Adding data to the decoder in an autoencoder during learning

Question

I want to implement an autoencoder in Keras as part of a larger network: some operations are applied to the autoencoder's output, and the model has to be trained with two losses. I have attached an image that shows my proposed structure; the link is below.

autoencoder structure

w has the same size as the input image. I do not use max pooling in this autoencoder, so the output of each stage has the same size as the input image. I want to send w and the latent space representation to the decoder, add noise to the decoder output, and then try to extract w using the third part of the network. So I need a loss function that considers the difference between the input image and the latent space representation, and also the difference between w and w'.

I have several problems with the implementation. I do not know how to pass w to the decoder together with the encoded output: the line "merge_encoded_w=cv2.merge(encoded,w)" raises an error and does not work. I am also not sure whether my loss function matches what I need. I am a beginner and finding the solution is difficult for me. I asked this question before, but no one answered. Please help me with this code, which is below:

from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Activation,UpSampling2D,Conv2D, MaxPooling2D, GaussianNoise
from keras.models import Model
from keras.optimizers import SGD
from keras.datasets import mnist
from keras import regularizers
from keras import backend as K
import keras as k
import numpy as np
import matplotlib.pyplot as plt
import cv2
from time import time
from keras.callbacks import TensorBoard
# Embedding phase
##encoder

w=np.random.random((1, 28,28))
input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(8, (5, 5), activation='relu', padding='same')(input_img)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
encoded = Conv2D(1, (3, 3), activation='relu', padding='same')(x)
merge_encoded_w=cv2.merge(encoded,w)
#
#decoder

x = Conv2D(2, (5, 5), activation='relu', padding='same')(merge_encoded_w)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
#x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

#Extraction phase
decodedWithNois=k.layers.GaussianNoise(0.5)(decoded)
x = Conv2D(8, (5, 5), activation='relu', padding='same')(decodedWithNois)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)


autoencoder = Model([input_img,w], [decoded,final_image_watermark(2)])
encoder=Model(input_img,encoded)
autoencoder.compile(optimizer='adadelta', loss=['mean_squared_error','mean_squared_error'],metrics=['accuracy'])
(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))  # adapt this if using `channels_first` image data format
autoencoder.fit(x_train, x_train,
                epochs=5,
                batch_size=128,
                shuffle=True,
                validation_data=(x_validation, x_validation),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

decoded_imgs = autoencoder.predict(x_test)
encoded_imgs=encoder.predict(x_test)

Answer 1:

For an architecture this large, I suggest building small pieces first and then putting them together. Start with the encoder. It receives an image of shape (28, 28, 1) and returns an encoded image of shape (28, 28, 1).

from keras.layers import Input, Concatenate, GaussianNoise
from keras.layers import Conv2D
from keras.models import Model

def make_encoder():
    image = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(image)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
    encoded =  Conv2D(1, (3, 3), activation='relu', padding='same')(x)

    return Model(inputs=image, outputs=encoded)
encoder = make_encoder()
encoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_1 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_1 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_2 (Conv2D)            (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_3 (Conv2D)            (None, 28, 28, 2)         74        
#_________________________________________________________________
#conv2d_4 (Conv2D)            (None, 28, 28, 1)         19        
#=================================================================
#Total params: 593
#Trainable params: 593
#Non-trainable params: 0
#_________________________________________________________________

The shape transition matches the theory.
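The parameter counts in the summary can be checked by hand as well. The sketch below assumes the usual count for a convolution layer with a bias term, (kernel_h * kernel_w * in_channels + 1) * filters; the helper name conv2d_params is made up for this illustration and is not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer that has a bias term."""
    # Each filter holds one weight per kernel position and input channel, plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


# (kernel_h, kernel_w, in_channels, filters) for the four encoder layers above
encoder_layers = [(5, 5, 1, 8), (3, 3, 8, 4), (3, 3, 4, 2), (3, 3, 2, 1)]
encoder_counts = [conv2d_params(*layer) for layer in encoder_layers]

print(encoder_counts)       # per-layer counts, to compare with the Param # column
print(sum(encoder_counts))  # to compare with "Total params"
```

The same helper can be applied to the decoder and the W predictor by changing the layer tuples.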
Next, the decoder takes the encoded image merged with another array, shape (28, 28, 2), and recovers the original image, shape (28, 28, 1).

def make_decoder():
    encoded_merged = Input((28, 28, 2))
    x = Conv2D(2, (5, 5), activation='relu', padding='same')(encoded_merged)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) 

    return Model(inputs=encoded_merged, outputs=decoded)
decoder = make_decoder()
decoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_2 (InputLayer)         (None, 28, 28, 2)         0         
#_________________________________________________________________
#conv2d_5 (Conv2D)            (None, 28, 28, 2)         102       
#_________________________________________________________________
#conv2d_6 (Conv2D)            (None, 28, 28, 4)         76        
#_________________________________________________________________
#conv2d_7 (Conv2D)            (None, 28, 28, 8)         296       
#_________________________________________________________________
#conv2d_8 (Conv2D)            (None, 28, 28, 1)         73        
#=================================================================
#Total params: 547
#Trainable params: 547
#Non-trainable params: 0
#_________________________________________________________________

The model then also tries to recover the W array. Its input is the reconstructed image plus noise, with shape (28, 28, 1).

def make_w_predictor():
    decoded_noise = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(decoded_noise)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)  
    # reconsider activation (is W positive?)
    # should be filter=1 to match W
    return Model(inputs=decoded_noise, outputs=pred_w)

w_predictor = make_w_predictor()
w_predictor.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_3 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_9 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_10 (Conv2D)           (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_11 (Conv2D)           (None, 28, 28, 1)         37        
#=================================================================
#Total params: 537
#Trainable params: 537
#Non-trainable params: 0
#_________________________________________________________________

With all the pieces at hand, putting them together into the full model is not hard. Note that the models built above can be called like layers.

def put_together(encoder, decoder, w_predictor):
    image = Input((28, 28, 1))
    w = Input((28, 28, 1))
    encoded = encoder(image)

    encoded_merged = Concatenate(axis=3)([encoded, w])
    decoded = decoder(encoded_merged)

    decoded_noise = GaussianNoise(0.5)(decoded)
    pred_w = w_predictor(decoded_noise)

    return Model(inputs=[image, w], outputs=[decoded, pred_w])

model = put_together(encoder, decoder, w_predictor)
model.summary()

#__________________________________________________________________________________________________
#Layer (type)                    Output Shape         Param #     Connected to                     
#==================================================================================================
#input_4 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#model_1 (Model)                 (None, 28, 28, 1)    593         input_4[0][0]                    
#__________________________________________________________________________________________________
#input_5 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#concatenate_1 (Concatenate)     (None, 28, 28, 2)    0           model_1[1][0]                    
#                                                                 input_5[0][0]                    
#__________________________________________________________________________________________________
#model_2 (Model)                 (None, 28, 28, 1)    547         concatenate_1[0][0]              
#__________________________________________________________________________________________________
#gaussian_noise_1 (GaussianNoise (None, 28, 28, 1)    0           model_2[1][0]                    
#__________________________________________________________________________________________________
#model_3 (Model)                 (None, 28, 28, 1)    537         gaussian_noise_1[0][0]           
#==================================================================================================
#Total params: 1,677
#Trainable params: 1,677
#Non-trainable params: 0
#__________________________________________________________________________________________________
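The merge step is where the original cv2.merge call went wrong: cv2.merge operates on NumPy image arrays, while encoded is a symbolic Keras tensor, so W has to enter the graph as its own Input and be joined with a Keras layer. The shape rule that Concatenate follows can be sketched in plain Python; concatenate_shape is a hypothetical helper written for this illustration, not a Keras function.

```python
def concatenate_shape(shapes, axis):
    """Output shape of concatenating tensors along `axis` (None stands for the batch size)."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError(f"rank mismatch: {first} vs {shape}")
        # Every dimension except the concatenation axis must agree.
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != axis and a != b:
                raise ValueError(f"shape mismatch on axis {i}: {first} vs {shape}")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


encoded_shape = (None, 28, 28, 1)  # encoder output
w_shape = (None, 28, 28, 1)        # W input, with an explicit channel axis
merged_shape = concatenate_shape([encoded_shape, w_shape], axis=3)
print(merged_shape)
```

This is also why W needs a trailing channel dimension: an array of shape (1, 28, 28), as in the question, does not line up with the encoder output on the non-channel axes.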

The code below trains the model on dummy data. You can of course use your own data, as long as the shapes match.

import numpy as np

# dummy data
images = np.random.random((1000, 28, 28, 1))
w = np.random.lognormal(size=(1000, 28, 28, 1))

# Is accuracy a sensible metric for this model? It is a classification metric, so probably not.
model.compile(optimizer='adadelta', loss='mse', metrics=['accuracy'])
model.fit([images, w], [images, w], batch_size=64, epochs=5)
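On the loss question: with two outputs and a single loss='mse', Keras applies the loss to each output and minimizes the sum of the two values; the loss_weights argument of compile can change the balance between them. The arithmetic is sketched below in plain Python on tiny made-up values. The function names and the weights are illustrative assumptions, not part of the model above.

```python
def mse(y_true, y_pred):
    """Mean squared error over two equally long flat sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def total_loss(image, decoded, w, pred_w, image_weight=1.0, w_weight=1.0):
    """Weighted sum of the image reconstruction loss and the W recovery loss."""
    return image_weight * mse(image, decoded) + w_weight * mse(w, pred_w)


# Tiny flattened toy values, purely illustrative
image, decoded = [0.0, 1.0, 0.5, 0.5], [0.0, 0.5, 0.5, 1.0]
w, pred_w = [1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]

equal = total_loss(image, decoded, w, pred_w)                  # both terms weighted equally
w_heavy = total_loss(image, decoded, w, pred_w, w_weight=2.0)  # emphasise W recovery
print(equal, w_heavy)
```

If recovering W matters more than a clean reconstruction, or the reverse, the weights are the natural knob to try before changing the architecture.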

EDITS BELOW

I have some questions about the code you posted. In make_w_predictor you wrote: "# reconsider activation (is W positive?) # should be filter=1 to match W". What does that mean? W is an array that contains 0 and 1. What does "reconsider activation" mean? Should I change the code for this part?

The relu activation returns numbers in [0, +inf), so it may not be a good choice if W takes a different set of values. Typical choices are the following.

W can be positive or negative: "linear" activation.
W in [0, 1]: "sigmoid" activation.
W in [-1, 1]: "tanh" activation.
W is non-negative: "relu" activation.
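The output ranges behind this list can be seen by evaluating the four functions on a few sample inputs. This is a plain-Python sketch using the standard definitions of the activations; the sample values are arbitrary.

```python
import math


def linear(x):
    return x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


activations = {"linear": linear, "sigmoid": sigmoid, "tanh": math.tanh, "relu": relu}
samples = [-5.0, -1.0, 0.0, 1.0, 5.0]

ranges = {}
for name, fn in activations.items():
    outputs = [fn(x) for x in samples]
    ranges[name] = (min(outputs), max(outputs))
    print(f"{name:8s} min={min(outputs):+.3f} max={max(outputs):+.3f}")
```

For a W that only contains 0 and 1, sigmoid keeps every prediction strictly between the two values, whereas relu can overshoot above 1.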

In the original code, you had:

w=np.random.random((1, 28, 28))

which takes values between 0 and 1. So I suggested switching from "relu" to "sigmoid". I did not make that change in my code sample because I was not sure whether relu was intended.

You said the filter should be 1. Does that mean changing (3,3) to (1,1)? I am sorry for all these questions, but I am a beginner and cannot find some of the things you mention. Could you please explain this fully?

I was referring to this line in the original question:

final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)

If I understand correctly, this defines W' in the attached image, which should predict W and therefore has shape (28, 28, 1). The first argument to Conv2D, the number of filters, should then be 1; otherwise the output shape becomes (28, 28, 2). I made this change in my code sample because the model otherwise raises a shape mismatch error:

pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)

I think the (3, 3) part, the kernel size in Keras, is fine as it is.
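To make the difference between the two arguments concrete: with padding='same' and the default stride of 1, the kernel size does not affect the output shape, while the number of filters sets the channel count. A minimal sketch of that rule follows; conv2d_same_output_shape is a hypothetical helper, not a Keras function.

```python
def conv2d_same_output_shape(input_shape, filters):
    """Output shape of Conv2D with padding='same' and stride 1 (channels last)."""
    height, width, _ = input_shape
    # 'same' padding keeps height and width; the channel count becomes `filters`.
    return (height, width, filters)


features = (28, 28, 4)  # output of the previous layer in the W predictor
target_w = (28, 28, 1)  # shape of W

results = {}
for filters in (2, 1):
    out = conv2d_same_output_shape(features, filters)
    results[filters] = out
    print(filters, out, out == target_w)
```

Only filters=1 produces a prediction that can be compared with W element by element.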

Source: https://stackoverflow.com/questions/52337636/adding-data-to-decoder-in-autoencoder-during-learning

Tags

python

tensorflow

keras

keras-layer

tensor