Question
I've implemented a home-brewed ZFNet (prototxt) for my research. After 20k iterations with this definition, the test accuracy stays at ~0.001 (i.e., 1/1000), and both the test loss and the training loss stay at ~6.9, which is ln(1000): the net seems to be doing no better than random guessing among the 1k classes. I've thoroughly checked the whole definition and tried changing some of the hyper-parameters before restarting training, but to no avail; the same results keep showing up on the screen.
Could anyone shed some light on this? Thanks in advance!
The hyper-parameters in the prototxt are derived from the paper [1]. All the inputs and outputs of the layers seem correct, matching Fig. 3 in the paper.
The tweaks are:
- crop size of the input for both training and testing set to 225 instead of 224, as discussed in #33;
- one-pixel zero padding for conv3, conv4, and conv5 to keep the blob sizes consistent [1];
- filler types for all learnable layers changed from constant in [1] to gaussian with std: 0.01;
- weight_decay changed from 0.0005 to 0.00025, as suggested by @sergeyk in PR #33 (a sketch of one tweaked layer follows this list).
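For concreteness, here is a minimal sketch of how one of these tweaks could look in prototxt. The bottom/top blob names are my assumption, not taken from the actual net; the num_output, pad, and filler values mirror the tweaks above and Fig. 3 in [1]:

```
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"        # assumed blob name
  top: "conv3"
  convolution_param {
    num_output: 384      # 384 3x3 filters per Fig. 3 in [1]
    kernel_size: 3
    stride: 1
    pad: 1               # the one-pixel zero padding tweak
    weight_filler {
      type: "gaussian"   # changed from "constant" in [1]
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```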
[1] Zeiler, M. and Fergus, R. Visualizing and Understanding Convolutional Networks, ECCV 2014.
And as for the poor net itself..., I pasted the prototxt here.
Answer 1:
A few suggestions:
- Change the initialization from gaussian to xavier.
- Work with "PReLU" activations instead of "ReLU". Once your net converges, you can finetune to remove them.
- Try reducing base_lr by an order of magnitude (or even two orders). See the sketch below.
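A rough sketch of what the first two suggestions could look like in prototxt, plus the solver change; the layer and blob names here are placeholders, not taken from your actual net:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96     # ZFNet layer 1: 96 7x7 filters, stride 2
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"   # instead of "gaussian" with std: 0.01
    }
  }
}
layer {
  name: "relu1"
  type: "PReLU"        # instead of "ReLU"; negative slopes are learned
  bottom: "conv1"
  top: "conv1"
}
```

And in solver.prototxt:

```
base_lr: 0.001   # e.g. down from 0.01; or even 0.0001
```

PReLU learns its negative slopes, which can help a net move off the ~6.9 chance-level plateau; as noted above, once training converges you can swap back to ReLU and finetune.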
Source: https://stackoverflow.com/questions/39663506/test-accuracy-cannot-improve-when-learning-zfnet-on-ilsvrc12