How to create CaffeDB training data for Siamese networks from an image directory

北荒 2020-12-03 08:56

I need some help creating a CaffeDB for a Siamese CNN from a plain directory of images and a label text file. Ideally the solution would be in Python.
The problem is not to

1 answer
  • 2020-12-03 09:49

    Why don't you simply make two datasets using good old convert_imageset, and read each one with its own data layer?

    layer {
      name: "data_a"
      top: "data_a"
      top: "label_a"
      type: "Data"
      data_param { source: "/path/to/first/data_lmdb" }
      ...
    }
    layer {
      name: "data_b"
      top: "data_b"
      top: "label_b"
      type: "Data"
      data_param { source: "/path/to/second/data_lmdb" }
      ...
    }
    

    As for the loss: since every example has a class label, you need to convert label_a and label_b into a single same_not_same_label. I suggest doing this on the fly with a Python layer. In the prototxt, add the Python layer:

    layer {
      name: "a_b_to_same_not_same_label"
      type: "Python"
      bottom: "label_a"
      bottom: "label_b"
      top: "same_not_same_label"
      python_param { 
        # the module name -- usually the filename -- that needs to be in $PYTHONPATH
        module: "siamese"
        # the layer name -- the class name in the module
        layer: "SiameseLabels"
      }
      # one entry per bottom: labels need no gradient
      propagate_down: false
      propagate_down: false
    }
    

    Create siamese.py (make sure it is in your $PYTHONPATH). In siamese.py you should have the layer class:

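    import os
    import sys

    # Make pycaffe importable; assumes the CAFFE_ROOT environment variable is set.
    sys.path.insert(0, os.path.join(os.environ['CAFFE_ROOT'], 'python'))
    import caffe


    class SiameseLabels(caffe.Layer):
        """Turn two class-label blobs into one same/not-same label blob."""

        def setup(self, bottom, top):
            if len(bottom) != 2:
                raise Exception('must have exactly two inputs')
            if len(top) != 1:
                raise Exception('must have exactly one output')

        def reshape(self, bottom, top):
            # The output has the same shape as the label blobs.
            top[0].reshape(*bottom[0].shape)

        def forward(self, bottom, top):
            # 1.0 where the two labels match, 0.0 otherwise.
            top[0].data[...] = (bottom[0].data == bottom[1].data).astype('f4')

        def backward(self, top, propagate_down, bottom):
            # Labels carry no gradient, so there is nothing to back-propagate.
            pass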
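
To check the label logic without building Caffe, the comparison in forward can be tried on plain NumPy arrays. This is only a sketch: the helper name same_not_same and the sample label batches are made up for illustration and are not part of the layer.

```python
import numpy as np


def same_not_same(label_a, label_b):
    """Return 1.0 where the two label arrays agree and 0.0 elsewhere."""
    label_a = np.asarray(label_a)
    label_b = np.asarray(label_b)
    if label_a.shape != label_b.shape:
        raise ValueError("label batches must have the same shape")
    # Same expression as in SiameseLabels.forward: bool mask cast to float32.
    return (label_a == label_b).astype("f4")


# Hypothetical batch of four class labels coming from each data layer.
labels_a = np.array([3, 1, 4, 1])
labels_b = np.array([3, 2, 4, 0])
pair_labels = same_not_same(labels_a, labels_b)
print(pair_labels)  # [1. 0. 1. 0.]
```

Pairs 0 and 2 share a class, so they get 1.0; the other two get 0.0.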
    

    Make sure you shuffle the examples in the two sets differently, so that you get non-trivial pairs. Moreover, if you build the first and second datasets with different numbers of examples, you will see different pairs at each epoch ;)
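
One possible way to prepare the two differently shuffled sets in Python is to write two listing files and pass each one to convert_imageset. The sketch below assumes the label text file already uses the "relative/image/path label" line format that convert_imageset reads; the function names, the seeds, and the choice to drop one entry from the second list are illustrative assumptions.

```python
import random


def read_listing(path):
    """Read 'relative/image/path label' lines, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def write_shuffled(entries, out_path, seed, limit=None):
    """Write a shuffled copy of the entries, optionally truncated to `limit`."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # seeded, so the order is reproducible
    if limit is not None:
        shuffled = shuffled[:limit]
    with open(out_path, "w") as f:
        f.write("\n".join(shuffled) + "\n")
    return shuffled


def make_pair_listings(listing_path, out_a, out_b, seed_a=0, seed_b=1, drop_from_b=1):
    """Create two listings with different orders and different lengths."""
    entries = read_listing(listing_path)
    list_a = write_shuffled(entries, out_a, seed_a)
    # A shorter second list makes the two data layers wrap around at different
    # points, so the pairing changes from epoch to epoch.
    keep = max(1, len(entries) - drop_from_b)
    list_b = write_shuffled(entries, out_b, seed_b, limit=keep)
    return list_a, list_b
```

Each output file can then be converted separately, once for /path/to/first/data_lmdb and once for /path/to/second/data_lmdb.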


    Make sure you construct the network so that the duplicated layers share their weights; see this tutorial for more information.
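
In Caffe, layers share weights when their param entries use the same name. As a rough sketch, the small generator below prints a pair of convolution layers, one per branch, that refer to the same named parameters. The layer names, num_output, kernel_size and lr_mult values are placeholders chosen for illustration, not values taken from the question.

```python
CONV_TEMPLATE = """layer {{
  name: "{name}"
  type: "Convolution"
  bottom: "{bottom}"
  top: "{name}"
  param {{ name: "{shared}_weights" lr_mult: 1 }}
  param {{ name: "{shared}_bias" lr_mult: 2 }}
  convolution_param {{ num_output: {num_output} kernel_size: {kernel_size} }}
}}
"""


def shared_conv_pair(shared, num_output, kernel_size, bottoms=("data_a", "data_b")):
    """Return prototxt text for two conv layers that share weights and bias."""
    blocks = []
    for suffix, bottom in zip(("_a", "_b"), bottoms):
        blocks.append(CONV_TEMPLATE.format(
            name=shared + suffix,   # distinct layer and top names per branch
            bottom=bottom,
            shared=shared,          # identical param names tie the weights
            num_output=num_output,
            kernel_size=kernel_size,
        ))
    return "".join(blocks)


print(shared_conv_pair("conv1", num_output=20, kernel_size=5))
```

The two generated layers differ only in their name, bottom and top; because both list conv1_weights and conv1_bias, Caffe keeps a single copy of those parameters for the two branches.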
