Question
I'm currently working with Keras and want to visualize the output of each layer. Take a visualization of a layer's output in a neural network, like the example below, which comes from an MNIST handwritten digit recognition model.
- What information or insight does a researcher gain from these images?
- How are these images interpreted?
- If you were to choose a layer whose output to inspect, what would your selection criteria be?
Any comment or suggestion is greatly appreciated. Thank you.
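For reference, this is roughly how I'm extracting the per-layer outputs (a minimal sketch; `model` is my trained network and `x` is an input batch):

```python
from tensorflow.keras.models import Model

# Assumption: `model` is my trained Keras CNN and `x` is a batch of
# MNIST images shaped to match model.input.
activation_model = Model(inputs=model.input,
                         outputs=[layer.output for layer in model.layers])

activations = activation_model.predict(x)  # one output array per layer

for layer, act in zip(model.layers, activations):
    print(layer.name, act.shape)  # each feature map act[0, :, :, i] can be plotted
```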
Answer 1:
Preface: a convolutional network is a collection of filters applied across sub-sections of an image (sliding in strides, as seen in the gif). Each filter produces an activation indicating whether, and how strongly, a given sub-section of the image matches it.
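To make the filter-matching idea concrete, here is a small NumPy sketch (mine, not part of the original answer) of a single filter slid across an image with stride 1:

```python
import numpy as np

# A vertical-edge filter: responds strongly where intensity changes
# left-to-right, and stays near zero on flat regions.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a vertical edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
print(conv2d_valid(img, kernel))
# High responses appear only in the columns covering the edge.
```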
In my opinion, the images you provided are not the best illustration of how these visualizations work, because they show how the CNN perceives the whole image through each neuron. That is why they all look very similar.
Here is a better representation of what the basic filters of a network might look like. Some of them will trigger on vertical lines, others on horizontal lines. That is also what the image you linked shows, except it does so for the whole image, on a visually simple object, which makes it a bit harder to interpret. Once you get to more complex filters, which build on top of these basic ones, you may be better off visualizing the entire image.
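As a rough sketch of how you could inspect these basic filters yourself in Keras, you can plot the first convolutional layer's kernel weights directly (the layer index and plotting choices here are my assumptions, not from the answer):

```python
import matplotlib.pyplot as plt

# Assumption: model.layers[0] is a Conv2D layer; Keras stores its kernels
# with shape (kernel_h, kernel_w, in_channels, n_filters).
weights, biases = model.layers[0].get_weights()
n_filters = weights.shape[-1]

fig, axes = plt.subplots(1, min(n_filters, 8), figsize=(12, 2))
for i, ax in enumerate(axes):
    ax.imshow(weights[:, :, 0, i], cmap="gray")  # first input channel of kernel i
    ax.set_axis_off()
plt.show()
```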
There is also a concept called transfer learning, where you take an existing, highly regarded generalized model and try to apply it to your specific problem. Such models often need to be tuned, which might mean removing layers that are not needed (every layer we keep makes training more time-consuming) and/or adding new layers.
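A minimal Keras sketch of that tuning step might look like this (the model choice, input size, and the 10-class head are illustrative assumptions):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import Model, layers

# Load the generalized model without its original 1000-class ImageNet head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained filters frozen while tuning

# Attach a new head for the specific task (a hypothetical 10 furniture classes).
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(10, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```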
A researcher is then better able to interpret how each layer in the network builds on the previous layers, and how each contributes to solving the problem at hand. This is often based on gut feeling (which can be made easier by good visualizations such as this deep visualization toolbox video).
As an example, say I'm using VGG16, a general model trained on ImageNet. I want to adapt it to classify distinct categories of furniture instead of the 1000 classes of completely different things it was originally intended to classify. Because it is such a general model, it can recognize many different things, from humans to animals to cars to furniture. But for many of these it makes no sense for me to incur a performance penalty, since they don't really help me classify my furniture.
Since many of the most important distinctions between these classes are made at different layers in the network, I can then move back up the convolutional layers and remove everything that seems too complex for the task at hand. That might mean removing layers that appear to have specialized in categorizing human features such as ears, mouths, eyes, and faces.
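In Keras terms, "moving back up" can be sketched by rebuilding the model from an earlier layer's output; `block3_pool` is a real VGG16 layer name, but picking that particular cut point is an assumption for illustration:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False)

# Cut the network after an earlier block, discarding the deeper, more
# specialized layers (e.g. those that respond to face-like features).
truncated = Model(inputs=base.input,
                  outputs=base.get_layer("block3_pool").output)
truncated.summary()  # the model now ends at block3_pool
```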
As far as I know, people visualize as many layers as they find useful, and then make a judgement call, largely based on instinct, about which layers to keep and which to throw away.
Images borrowed from:
Visualizing what ConvNets learn
An Intuitive Explanation of Convolutional Neural Networks
Source: https://stackoverflow.com/questions/42240489/in-what-ways-are-the-output-of-neural-network-layers-useful