Question
In the Breaking Linear Classifiers on ImageNet blog post, the author presents a very simple example of how to modify an image to fool a classifier. The technique is straightforward: xad = x + 0.5w, where x is the 1-D image vector and w is the 1-D weight vector. This is all clear. However, I am trying to implement it with the MNIST dataset and got stuck: I have no idea how to turn this simple idea into actual results. I'd like to know how to use the known weight matrix w to modify a given image matrix x (or simply a single flattened 1-D image vector).
My image matrix x has shape (1032, 784) (each image is a flattened vector of 784 numbers), and my weight matrix w has shape (784, 10). So the question is: how do I implement the idea introduced in the article above? In particular, how do I add a bit of the weights to all images? Something like this:
x + 0.5 * w
My code can be found on GitHub. A NumPy solution is preferred, but TensorFlow would be fine as well. Thanks!
Answer 1:
Figured out how:
So, if we're trying to create adversarial images to be falsely classified as "6", we need to grab the weights for "6" only from the weight matrix:
w_six = w[:, 6]
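Then we can simply add it to the images. This is element-wise addition rather than a matrix product: NumPy broadcasts the (784,) vector w_six across all 1032 rows of x: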
images_fool = x + 1.5 * w_six
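For anyone who wants to check the shapes, here is a minimal, self-contained NumPy sketch of the two steps above. It is only an illustration under stated assumptions: x and w are random stand-ins with the shapes from the question (not real MNIST data or trained weights), the classifier is assumed to be purely linear with no bias term, and the names target, scores_before and scores_after are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the shapes from the question (assumption: not real data).
x = rng.random((1032, 784))          # 1032 flattened 28x28 images
w = rng.normal(size=(784, 10))       # one weight column per digit class

target = 6                           # class we want the images mistaken for
w_six = w[:, target]                 # shape (784,)

# Broadcasting adds the same (784,) vector to every row of x.
images_fool = x + 1.5 * w_six        # shape (1032, 784)

# Linear class scores before and after (assumption: no bias term).
scores_before = x @ w                # shape (1032, 10)
scores_after = images_fool @ w       # shape (1032, 10)

# The target score of every image rises by 1.5 * ||w_six||^2.
gain = scores_after[:, target] - scores_before[:, target]
print(images_fool.shape, gain.min() > 0)
```

Because the same vector is added to every image, the score for the target class goes up by the same amount, 1.5 * (w_six @ w_six), for all of them. Whether that is enough to flip the prediction depends on the other class scores, so the step size may need tuning on real weights.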
Source: https://stackoverflow.com/questions/40966589/how-to-use-image-and-weight-matrix-to-create-adversarial-images-in-tensorflow