Question
I'm trying to train a LeNet on my own data (37 by 37 grayscale images belonging to 1024 categories).
I created the LMDB files and changed the size of the output layer to 1024. When I ran caffe train
with my solver file, the program got stuck after printing
...
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
}
I0713 17:11:13.334890 9595 layer_factory.hpp:77] Creating layer data
I0713 17:11:13.334939 9595 net.cpp:91] Creating Layer data
I0713 17:11:13.334950 9595 net.cpp:399] data -> data
I0713 17:11:13.334961 9595 net.cpp:399] data -> label
What could be the problem?
I'm new to Caffe; any help would be appreciated.
solver.prototxt
net: "lenet_auto_train.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "lenet"
lenet.prototxt
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
transform_param {
scale: 0.00392156862745
}
data_param {
source: "dir/dat/1024_37*37_gray_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 50
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "score"
type: "InnerProduct"
bottom: "fc1"
top: "score"
inner_product_param {
num_output: 1024
weight_filler {
type: "xavier"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
}
Answer 1:
In my case, this happened when the same LMDB was used for both training and testing.
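A minimal sketch of what separate TRAIN/TEST data layers could look like, assuming two distinct LMDBs (the "train_lmdb" and "test_lmdb" paths below are placeholders, not the asker's actual directories):
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { scale: 0.00392156862745 }
  data_param {
    source: "dir/dat/train_lmdb"  # placeholder path for the training LMDB
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { scale: 0.00392156862745 }
  data_param {
    source: "dir/dat/test_lmdb"   # placeholder path for a separate test LMDB
    batch_size: 100
    backend: LMDB
  }
}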
Answer 2:
It seems like Caffe is trying to read the LMDB and then encounters a problem.
My guess is that your db name "dir/dat/1024_37*37_gray_lmdb"
is causing the problem: having a "*"
character in a file name is not good practice.
Change the db name to something like "dir/dat/1024_37x37_gray_lmdb"
and try again (don't forget to change the prototxt as well).
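With the directory renamed on disk, the data layer's data_param would then point at the new name (this is just the asker's original layer with the "*" removed from the path):
data_param {
  source: "dir/dat/1024_37x37_gray_lmdb"  # renamed: no "*" in the path
  batch_size: 64
  backend: LMDB
}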
Answer 3:
The problem was that I put the options test_iter: 100
and test_interval: 500
in the solver file, but did not specify a test network or test data layer in the network file.
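One way to make the solver and net consistent is simply to drop the test settings from the solver; below is a sketch of the asker's solver with the two test lines removed (the alternative is to add a TEST-phase data layer, as in Answer 1):
net: "lenet_auto_train.prototxt"
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "lenet"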
Source: https://stackoverflow.com/questions/38348801/caffe-hangs-after-printing-data-label