I am unsuccessfully trying to implement a simple loss layer in Python using Caffe. As a reference, I found several layers implemented in Python, including here, here and here.
Starting with the EuclideanLossLayer as provided by the Caffe documentation/examples, I was not able to get it working and started debugging. Even this simple TestLayer exhibits the problem:
import caffe


class TestLayer(caffe.Layer):

    def setup(self, bottom, top):
        """
        Checks the correct number of bottom inputs.

        :param bottom: bottom inputs
        :type bottom: [numpy.ndarray]
        :param top: top outputs
        :type top: [numpy.ndarray]
        """

        print 'setup'

    def reshape(self, bottom, top):
        """
        Make sure all involved blobs have the right dimension.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print 'reshape'
        top[0].reshape(bottom[0].data.shape[0], bottom[0].data.shape[1], bottom[0].data.shape[2], bottom[0].data.shape[3])

    def forward(self, bottom, top):
        """
        Forward propagation.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print 'forward'
        top[0].data[...] = bottom[0].data

    def backward(self, top, propagate_down, bottom):
        """
        Backward pass.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param propagate_down: whether to propagate gradients down to each bottom
        :type propagate_down: [bool]
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print 'backward'
        bottom[0].diff[...] = top[0].diff[...]
I am not able to get the Python layer working. The learning task is rather simple, as I am merely trying to predict whether a real-valued number is positive or negative. The corresponding data is generated as follows and written to LMDBs:
import numpy

N = 10000
N_train = int(0.8*N)

images = []
labels = []
for n in range(N):
    image = (numpy.random.rand(1, 1, 1)*2 - 1).astype(numpy.float)
    label = int(numpy.sign(image))

    images.append(image)
    labels.append(label)
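The LMDB-writing code itself is not shown here; a minimal sketch of how such arrays can be serialized (assuming the lmdb package and caffe.io.array_to_datum, not the exact code used) would be:

import lmdb
import caffe

def write_lmdb(lmdb_path, images, labels):
    # sketch: store each (C, H, W) array together with its label as a caffe Datum
    env = lmdb.open(lmdb_path, map_size=1024 * 1024 * 1024)
    with env.begin(write=True) as txn:
        for i, (image, label) in enumerate(zip(images, labels)):
            datum = caffe.io.array_to_datum(image, label)
            txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
    env.close()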
Writing the data to LMDB should be correct as tests with the MNIST dataset provided by Caffe show no problems. The network is defined as follows:
net = caffe.NetSpec()
net.data, net.labels = caffe.layers.Data(batch_size = batch_size, backend = caffe.params.Data.LMDB,
                                         source = lmdb_path, ntop = 2)
net.fc1 = caffe.layers.Python(net.data, python_param = dict(module = 'tools.layers', layer = 'TestLayer'))
net.score = caffe.layers.TanH(net.fc1)
net.loss = caffe.layers.EuclideanLoss(net.score, net.labels)
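The NetSpec definition above is then written out to the prototxt files shown below, presumably along these lines (a sketch, not necessarily the exact code used):

with open('tests/train.prototxt', 'w') as f:
    f.write(str(net.to_proto()))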
Solving is done manually using:
for iteration in range(iterations):
    solver.step(step)
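The solver construction is not shown; in such a setup it is typically created from the solver.prototxt below, for example (assuming SGD and CPU mode):

import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver('tests/solver.prototxt')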
The corresponding prototxt files are below:
solver.prototxt:
weight_decay: 0.0005
test_net: "tests/test.prototxt"
snapshot_prefix: "tests/snapshot_"
max_iter: 1000
stepsize: 1000
base_lr: 0.01
snapshot: 0
gamma: 0.01
solver_mode: CPU
train_net: "tests/train.prototxt"
test_iter: 0
test_initialization: false
lr_policy: "step"
momentum: 0.9
display: 100
test_interval: 100000
train.prototxt:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "labels"
  data_param {
    source: "tests/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "Python"
  bottom: "data"
  top: "fc1"
  python_param {
    module: "tools.layers"
    layer: "TestLayer"
  }
}
layer {
  name: "score"
  type: "TanH"
  bottom: "fc1"
  top: "score"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"
  bottom: "labels"
  top: "loss"
}
test.prototxt:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "labels"
  data_param {
    source: "tests/test_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "Python"
  bottom: "data"
  top: "fc1"
  python_param {
    module: "tools.layers"
    layer: "TestLayer"
  }
}
layer {
  name: "score"
  type: "TanH"
  bottom: "fc1"
  top: "score"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"
  bottom: "labels"
  top: "loss"
}
I tried tracking it down by adding debug messages in the backward and forward methods of TestLayer: only the forward method gets called during solving (note that NO testing is performed, so the calls can only be related to solving). Similarly, I added debug messages in python_layer.hpp:
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG(INFO) << "cpp forward";
  self_.attr("forward")(bottom, top);
}

virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  LOG(INFO) << "cpp backward";
  self_.attr("backward")(top, propagate_down, bottom);
}
Again, only the forward pass is executed. When I remove the backward method from TestLayer, solving still works. When I remove the forward method, an error is thrown because forward is not implemented. I would expect the same for backward, so it seems that the backward pass does not get executed at all. Switching back to regular layers and adding debug messages, everything works as expected.
I have the feeling that I am missing something simple or fundamental, but I have not been able to resolve the problem for several days now. So any help or hints are appreciated.
Thanks!
In addition to Erik B.'s answer, you can force Caffe to backpropagate by specifying
force_backward: true
in your net prototxt. See the comments in caffe.proto for more information.
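With the NetSpec-based definition from the question, the flag can be set on the generated NetParameter before writing the prototxt, roughly like this (a sketch; force_backward is the field name from caffe.proto):

proto = net.to_proto()
proto.force_backward = True  # force backward computation for every layer
with open('tests/train.prototxt', 'w') as f:
    f.write(str(proto))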
This is the intended behaviour, since you do not have any layers "below" your Python layer that actually need the gradients to compute weight updates. Caffe notices this and skips the backward computation for such layers because it would be a waste of time.
At network initialization time, Caffe logs for each layer whether the backward computation is needed. In your case, you should see something like:
fc1 does not need backward computation.
If you put an "InnerProduct" or "Convolution" layer below your "Python" layer (e.g. Data->InnerProduct->Python->Loss), the backward computation becomes necessary and your backward method gets called.
Mine wasn't working even though I did set force_backward: true
as suggested by David Stutz. I found out here and here that I was forgetting to set the diff of the last layer to 1 at the index of the target class.
As Mohit Jain describes in his caffe-users answer, if you are doing ImageNet classification with the tabby cat, after doing the forward pass, you'll have to do something like:
net.blobs['prob'].diff[0][281] = 1 # 281 is tabby cat. diff shape: (1, 1000)
Notice that you'll have to change 'prob' to the name of your last layer, which is usually a Softmax layer named 'prob'.
Here's an example based on mine:
deploy.prototxt (it's loosely based on VGG16 just to show the structure of the file, but I didn't test it):
name: "smaller_vgg"
input: "data"
force_backward: true
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool1"
top: "fc1"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "drop1"
type: "Dropout"
bottom: "fc1"
top: "fc1"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
inner_product_param {
num_output: 1000
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc2"
top: "prob"
}
main.py:
import caffe
import cv2
import numpy as np

prototxt = 'deploy.prototxt'
model_file = 'smaller_vgg.caffemodel'
net = caffe.Net(prototxt, model_file, caffe.TRAIN)  # not sure if TEST works as well

# load the image and reshape it to the (1, 3, 224, 224) blob the net expects
image = cv2.imread('tabbycat.jpg', cv2.IMREAD_UNCHANGED)
image = cv2.resize(image, (224, 224)).transpose(2, 0, 1)  # HWC -> CHW
net.blobs['data'].data[...] = image[np.newaxis, :]

net.forward()
net.blobs['prob'].diff[0, 281] = 1  # 281 is tabby cat; diff shape: (1, 1000)
backout = net.backward()
# access the gradient from backout['data'] or net.blobs['data'].diff
Source: https://stackoverflow.com/questions/40540106/backward-pass-in-caffe-python-layer-is-not-called-working