Saliency maps of neural networks (using Keras)
I have a fully connected multilayer perceptron trained in Keras. I feed it an N-dimensional feature vector and it predicts one of M classes for the input vector. Training and prediction are working well. Now I want to analyze which part of the input feature vector is actually responsible for a particular class.

For example, let's say there are two classes `A` and `B`, and an input vector `f`. The vector `f` belongs to class `A` and the network predicts it correctly: the output of the network is `A=1 B=0`. Because I have some domain knowledge, I know that the entire `f` is actually not responsible
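One common way to approach this (not confirmed by the question itself, just a standard technique) is a gradient-based saliency map: take the gradient of the predicted class score with respect to the input vector; features with large absolute gradients are the ones the prediction is most sensitive to. Below is a minimal sketch assuming TensorFlow 2.x Keras; the small MLP (`N=10` features, `M=2` classes, one hidden layer) is a hypothetical stand-in for the trained network.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical stand-in for the trained MLP: N=10 input features, M=2 classes.
N, M = 10, 2
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(N,)),
    keras.layers.Dense(M, activation="softmax"),
])

def saliency_map(model, x, class_index):
    """Gradient of the chosen class score w.r.t. the input features.

    Returns a vector of shape (N,); large values mark input features
    the class score is most sensitive to.
    """
    x = tf.convert_to_tensor(x[np.newaxis, :], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)                        # input is a constant, so watch it
        score = model(x)[0, class_index]     # scalar score for the target class
    grad = tape.gradient(score, x)           # d(score)/d(input), shape (1, N)
    return tf.abs(grad)[0].numpy()

# Usage: saliency of class A (index 0) for one input vector f.
f = np.random.rand(N).astype("float32")
sal = saliency_map(model, f, class_index=0)
print(sal.shape)  # (10,)
```

The per-feature magnitudes in `sal` can then be compared against the domain knowledge about which part of `f` should matter for class `A`.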