Keras - Autoencoder accuracy stuck on zero

限于喜欢 提交于 2021-02-18 03:21:41

问题


I'm trying to detect fraud using autoencoder and Keras. I've written the following code as a Notebook:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.preprocessing import StandardScaler
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt

data = pd.read_csv('../input/creditcard.csv')
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
data = data.drop(['Time','Amount'],axis=1)

data = data[data.Class != 1]
X = data.loc[:, data.columns != 'Class']

encodingDim = 7
inputShape = X.shape[1]
inputData = Input(shape=(inputShape,))

X = X.as_matrix()

encoded = Dense(encodingDim, activation='relu')(inputData)
decoded = Dense(inputShape, activation='sigmoid')(encoded)
autoencoder = Model(inputData, decoded)
encoder = Model(inputData, encoded)
encodedInput = Input(shape=(encodingDim,))
decoderLayer = autoencoder.layers[-1]
decoder = Model(encodedInput, decoderLayer(encodedInput))

autoencoder.summary()

autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = autoencoder.fit(X, X,
                epochs=10,
                batch_size=256,
                validation_split=0.33)

print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

I'm probably missing something, my accuracy is stuck on 0 and my test loss is lower than my train loss.

Any Insight would be appericiated


回答1:


Accuracy on an autoencoder has little meaning, especially on a fraud detection algorithm. What I mean by this is that accuracy is not well defined on regression tasks. For example is it accurate to say that 0.1 is the same as 0.11. For the keras algorithm it is not. If you want to see how well your algorithm replicates the data I would suggest looking at the MSE or at the data itself. Many autoencoder use MSE as their loss function.

The metric you should be monitoring is the training loss on good examples vs the validation loss on fraudulent examples. There you can easily see if you can fit your real examples more closely than the fraudulent ones and how well your algorithm performs in practice.

Another design choice I would not make is relu in an autoencoder. ReLU works well with deeper model because of its simplicity and effectiveness in combating vanishing/exploding gradients. However, both of this concerns are non-factors in autoencoder and the loss of data hurts in an autoencoder. I would suggest using tanh as your intermediate activation function.



来源:https://stackoverflow.com/questions/45545109/keras-autoencoder-accuracy-stuck-on-zero

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!