Question
I am creating a network using Caffe, for which I need to define my own layer. I would like to use the Python
layer for this.
My layer will contain some learned parameters. From this answer, I am told that I will need to create a blob vector for this.
- Is there any specification that this blob will need to follow, such as constraints on dimensions? Irrespective of what my layer does, can I create a one-dimensional blob and use its elements, one each, for any computation in the layer?
- What does the diff of a blob mean? From what I understand, the diff of bottom is the gradient at the current layer, and that of top is for the previous layer. However, what exactly is happening here?
- When do these parameters get trained? Does this need to be done manually in the layer definition?
I have seen the examples in test_python_layer.py, but most of them do not have any parameters.
Answer 1:
You can add as many internal parameters as you wish, and these parameters (Blobs) may have whatever shape you want them to be.
To add Blobs (in your layer's class):
def setup(self, bottom, top):
    # each call to add_blob appends one parameter blob of the given shape
    self.blobs.add_blob(3, 4)    # first blob is 2D (3x4)
    self.blobs[0].data[...] = 0  # init to 0
    self.blobs.add_blob(10)      # second blob is 1D with 10 elements
    self.blobs[1].data[...] = 1  # init to 1
What is the "meaning" of each parameter and how to organize them in self.blobs
is entirely up to you.
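For example, one possible convention (purely illustrative, not part of the Caffe API) is to treat the first blob as a weight matrix and the second as a bias, and use them in forward. In plain NumPy, with arrays standing in for the blobs' .data fields, that might look like:

```python
import numpy as np

# Hypothetical convention: blobs[0].data is a 3x4 weight matrix W,
# blobs[1].data is a length-3 bias b, and forward computes top = W @ x + b.
W = np.zeros((3, 4))     # stands in for self.blobs[0].data, initialized to 0
b = np.ones(3)           # stands in for self.blobs[1].data, initialized to 1
x = np.arange(4.0)       # stands in for bottom[0].data (one input vector)

top = W @ x + b          # what forward() would write into top[0].data
print(top)               # W is all zeros and b all ones, so this prints [1. 1. 1.]
```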
How are trainable parameters being "trained"?
This is one of the cool things about caffe (and other DNN toolkits as well), you don't need to worry about it!
What do you need to do? All you need is to compute the gradient of the loss w.r.t. the parameters and store it in self.blobs[i].diff. Once those diffs are set, caffe's internals take care of updating the parameters according to the gradients, learning rate, momentum, update policy, etc.
So, you must have a non-trivial backward method for your layer:
def backward(self, top, propagate_down, bottom):
    self.blobs[0].diff[...] = ...  # gradient of the loss w.r.t. the first parameter blob
    self.blobs[1].diff[...] = ...  # likewise for every other parameter blob
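As a concrete sketch of what goes into those diff arrays, consider a hypothetical affine layer computing top = W @ x + b. By the chain rule, backward would fill the parameter diffs and the bottom diff like this (plain NumPy standing in for the .diff fields; the shapes are invented for illustration):

```python
import numpy as np

# Hypothetical affine layer: top = W @ x + b, with W of shape (3, 4).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))    # stands in for self.blobs[0].data
x = rng.standard_normal(4)         # stands in for bottom[0].data
top_diff = rng.standard_normal(3)  # dL/d(top), handed down by the layer above

# What backward() would store:
W_diff = np.outer(top_diff, x)     # dL/dW -> self.blobs[0].diff
b_diff = top_diff.copy()           # dL/db -> self.blobs[1].diff
bottom_diff = W.T @ top_diff       # dL/dx -> bottom[0].diff

# Each diff has the same shape as the blob it belongs to.
assert W_diff.shape == W.shape and bottom_diff.shape == x.shape
```

Note that the layer only writes the diffs; it never touches .data itself — the solver reads the diffs and applies the configured update rule.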
You might want to test your implementation of the layer, once you complete it.
Have a look at this PR for a numerical test of the gradients.
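The gist of such a numerical test, reduced to a single scalar parameter: perturb the parameter, re-evaluate the loss, and check that the finite-difference slope matches the analytic gradient your backward would store. A toy sketch (the loss and the numbers are made up for illustration):

```python
# Toy gradient check: L(w) = 0.5 * (w * x - t)**2 for made-up scalars x, t.
x, t, w = 2.0, 5.0, 1.5

def loss(w):
    return 0.5 * (w * x - t) ** 2

analytic = (w * x - t) * x  # chain rule: what backward would store in .diff

eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)  # central finite difference

assert abs(analytic - numeric) < 1e-4  # the two estimates agree
```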
Source: https://stackoverflow.com/questions/44418828/how-should-i-use-blobs-in-a-caffe-python-layer-and-when-does-their-training-tak