Looking at an example 'solver.prototxt' posted on the BVLC/caffe GitHub repository, there is a training meta-parameter:
weight_decay: 0.04
What does this mean?
Weight decay is a regularization term that penalizes large weights. When the weight decay coefficient is large, the penalty for large weights is also large; when it is small, weights can grow freely. Caffe's solver adds this penalty (by default an L2 term) to the loss, so the effective objective is roughly E(w) = E_data(w) + (weight_decay / 2) * ||w||^2, and the gradient of the penalty appears as an extra weight_decay * w term in each weight update.
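To make the effect concrete, here is a minimal sketch (in Python, not Caffe's actual C++ solver) of how an L2 weight-decay term enters a plain SGD update; the function name and the toy values are assumptions for illustration:

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.01, weight_decay=0.04):
    # Penalizing (weight_decay / 2) * ||w||^2 in the loss adds
    # weight_decay * w to the gradient, shrinking each weight
    # toward zero in proportion to its magnitude.
    return w - lr * (grad + weight_decay * w)

w = np.array([1.0, -2.0, 0.5])      # current weights
grad = np.array([0.1, 0.0, -0.3])   # gradient of the data loss
print(sgd_step_with_weight_decay(w, grad))
```

With weight_decay > 0, every step pulls the weights toward zero, which is why a larger coefficient keeps the weights small.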
See this answer (not specific to Caffe) for a more detailed explanation: Difference between neural net "weight decay" and "learning rate".