How to train new fast-rcnn imageset

I am using fast-rcnn and trying to train the system on a new class (label). I followed this guide: https://github.com/EdisonResearch/fast-rcnn/tree/master/help/train

Placed the images
Placed the annotations
Prepared the ImageSet file with all the image name prefixes
Prepared selective search output: train.mat
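
For reference, the ImageSet step can be scripted. The sketch below is only an illustration: the helper name `write_image_set`, the accepted extensions, and the one-prefix-per-line layout are assumptions based on the usual PASCAL VOC convention, not something taken from the linked guide.

```python
import os

def write_image_set(image_dir, out_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write one image name prefix (file name without extension) per line.

    Hypothetical helper: the directory layout and extension list are assumptions.
    """
    prefixes = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(image_dir)
        if name.lower().endswith(extensions)
    )
    with open(out_path, "w") as f:
        for prefix in prefixes:
            f.write(prefix + "\n")
    return prefixes
```

Calling `write_image_set("JPEGImages", "train.txt")` (both paths are placeholders) would list every image in the directory, so the file should be trimmed by hand if only a subset is meant for training.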

Training failed when I ran train_net.py, with the following error:

./tools/train_net.py --gpu 0 --solver models/VGG_1024_pascal2007/solver.prototxt --imdb voc_2007_train_top_5000 

Called with args:
Namespace(cfg_file=None, gpu_id=0, imdb_name='voc_2007_train_top_5000', max_iters=40000, pretrained_model=None, randomize=False, solver='models/VGG_1024_pascal2007/solver.prototxt')
Using config:
{'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'default',
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/hagay/fast-rcnn',
 'TEST': {'BBOX_REG': True,
          'MAX_SIZE': 1000,
          'NMS': 0.3,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'BATCH_SIZE': 128,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.1,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'IMS_PER_BATCH': 2,
           'MAX_SIZE': 1000,
           'SCALES': [600],
           'SNAPSHOT_INFIX': '',
           'SNAPSHOT_ITERS': 10000,
           'USE_FLIPPED': True,
           'USE_PREFETCH': False}}
Loaded dataset `voc_2007_train` for training
Appending horizontally-flipped training examples...
voc_2007_train gt roidb loaded from /home/hagay/fast-rcnn/data/cache/voc_2007_train_gt_roidb.pkl
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2507: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
wrote ss roidb to /home/hagay/fast-rcnn/data/cache/voc_2007_train_selective_search_IJCV_top_5000_roidb.pkl
Traceback (most recent call last):
  File "./tools/train_net.py", line 80, in <module>
    roidb = get_training_roidb(imdb)
  File "/home/hagay/fast-rcnn/tools/../lib/fast_rcnn/train.py", line 107, in get_training_roidb
    imdb.append_flipped_images()
  File "/home/hagay/fast-rcnn/tools/../lib/datasets/imdb.py", line 104, in append_flipped_images
    assert (boxes[:, 2] >= boxes[:, 0]).all()
AssertionError

My questions are:

Why am I having this error?
Do I need to rescale the images to a fixed size of 256x256 before training?
Do I need to prepare something in order to set the __background__ class?

The assertion says that at least one box has boxes[:, 2] < boxes[:, 0], where boxes[:, 2] is the x-max of the bounding box and boxes[:, 0] is the x-min. So the problem is related to the region proposals. I came across this problem too and found that it was caused by integer overflow. As I recall, the dtype for boxes is np.uint8 (this needs checking); if the image is too big, the coordinates wrap around and you get this error.
Rescaling is one solution, but it may affect performance. You can change the dtype from uint8 to float instead.
As far as I know, there is no need to prepare anything for the __background__ class.
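
To make the overflow explanation concrete, here is a minimal sketch of how a narrow unsigned dtype can break the x2 >= x1 check. It takes the np.uint8 recollection at face value purely for illustration; the dtype in a given checkout should be verified. The helper names `is_ordered` and `flip_boxes` are hypothetical, and the flip formula is the usual mirror (new x1 = width - x2 - 1, new x2 = width - x1 - 1), which may differ slightly from the code in your copy of imdb.py.

```python
import numpy as np

def is_ordered(boxes):
    """Same condition as the failing assertion: every box has x2 >= x1."""
    return bool((boxes[:, 2] >= boxes[:, 0]).all())

def flip_boxes(boxes, width):
    """Mirror (x1, y1, x2, y2) boxes horizontally using signed 64-bit arithmetic."""
    signed = boxes.astype(np.int64)
    flipped = signed.copy()
    flipped[:, 0] = width - signed[:, 2] - 1
    flipped[:, 2] = width - signed[:, 0] - 1
    return flipped

# One box whose x2 (300) does not fit into 8 bits.
true_boxes = np.array([[100, 10, 300, 50]], dtype=np.int64)

narrow = true_boxes.astype(np.uint8)   # 300 wraps around to 300 % 256 == 44
wide = true_boxes.astype(np.float32)   # a float dtype keeps 300 intact

for name, boxes in (("uint8", narrow), ("float32", wide)):
    flipped = flip_boxes(boxes, width=400)
    print(name, boxes[0].tolist(), "ordered after flip:", is_ordered(flipped))
```

With signed or float arithmetic the flip preserves the order of x1 and x2, so a failing assertion suggests the stored coordinates were already corrupted (or inverted) before the flip.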

I'm late to the party, but when I was editing the code this was my yakkity-hack of a solution:

        for b in range(len(boxes)):
            if boxes[b][2] < boxes[b][0]:
                boxes[b][0] = 0
        assert (boxes[:, 2] >= boxes[:, 0]).all()

There are smarter ways to do this, as every single grad student seems to point out, but this works fine.
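
One possible "smarter" variant of the same hack is a vectorized version using a boolean mask. This is only a sketch: `clamp_inverted_boxes` is a hypothetical helper name, and like the loop above it hides the symptom rather than fixing whatever produced the inverted boxes, so the clamped boxes no longer match the original regions.

```python
import numpy as np

def clamp_inverted_boxes(boxes):
    """Set x1 to 0 for every box whose x2 is smaller than its x1.

    Returns a fixed copy and the number of boxes that were changed.
    """
    fixed = boxes.copy()
    inverted = fixed[:, 2] < fixed[:, 0]   # boolean mask of offending rows
    fixed[inverted, 0] = 0
    return fixed, int(inverted.sum())
```

Returning the count makes it easy to log how many boxes were touched; a large number would hint at a systematic problem in the annotations or proposals.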

jkjung13

Check out the solution described in the following blog post, Part 4, Issue #4. The solution is to flip the x1 and x2 coordinate values.

https://huangying-zhan.github.io/2016/09/22/detection-faster-rcnn.html

The following is copied from the link:

box[:, 0] > box[:, 2]

Solution: add the following code block to imdb.py

def append_flipped_images(self):
    num_images = self.num_images
    widths = self._get_widths()
    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        # Mirror the x coordinates around the image width
        boxes[:, 0] = widths[i] - oldx2
        boxes[:, 2] = widths[i] - oldx1
        # Workaround: where the flipped box is inverted, clamp x1 to 0
        for b in range(len(boxes)):
            if boxes[b][2] < boxes[b][0]:
                boxes[b][0] = 0
        assert (boxes[:, 2] >= boxes[:, 0]).all()
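
Before patching the flip, it may help to find out which boxes are actually bad. The sketch below is a standalone check, not part of fast-rcnn: `find_bad_boxes` is a hypothetical helper, and it assumes boxes are given as (x1, y1, x2, y2) in 0-based pixel coordinates together with the image width and height.

```python
def find_bad_boxes(boxes, width, height):
    """Return (index, reason) pairs for boxes that are inverted or out of bounds.

    boxes: iterable of (x1, y1, x2, y2), assumed to be 0-based pixel coordinates.
    """
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 < x1 or y2 < y1:
            problems.append((i, "inverted"))
        elif x1 < 0 or y1 < 0 or x2 >= width or y2 >= height:
            problems.append((i, "out of bounds"))
    return problems
```

Running this over the ground-truth boxes and the selective search proposals of each image would show whether the problem comes from the annotations, the proposals, or a mismatch between the stored boxes and the image size.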

Source: https://stackoverflow.com/questions/31005463/how-to-train-new-fast-rcnn-imageset

Tags

python

neural-network

caffe