TensorFlow object detection Mask R-CNN uses too much memory


Question


I am trying to run TF object detection with Mask R-CNN, but it keeps dying on a node with 500 GB of memory.

I updated the ConfigProto in models/research/object_detection/trainer.py to

session_config = tf.ConfigProto(allow_soft_placement=True,
                                intra_op_parallelism_threads=1,
                                inter_op_parallelism_threads=1,
                                device_count = {'CPU': 1},
                                log_device_placement=False)
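
For context, trainer.py passes this session_config into the training session it creates, so the thread limits apply to the whole training loop. A minimal standalone TF 1.x sketch of the same mechanism (illustrative only, not the trainer's actual code):

import tensorflow as tf

# Same settings as above: one CPU device, one intra-op and one inter-op thread.
session_config = tf.ConfigProto(allow_soft_placement=True,
                                intra_op_parallelism_threads=1,
                                inter_op_parallelism_threads=1,
                                device_count={'CPU': 1},
                                log_device_placement=False)

# Any session created with this config respects those limits.
with tf.Session(config=session_config) as sess:
    print(sess.run(tf.constant("session created with limited parallelism")))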

I updated the mask_rcnn_inception_resnet_v2_atrous_coco.config to

train_config: {
  batch_queue_capacity: 500
  num_batch_queue_threads: 8
  prefetch_queue_capacity: 10
  ...
}

Updating the ConfigProto has had the best effect so far: training got all the way to 30 steps before it died, instead of dying at step 1. I'm reducing the values in the train_config by half for this run, and I have also reduced the number of images and objects significantly.
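
For reference, halving those queue values gives a config like this (the numbers below are simply the halved values; everything else in train_config stays the same):

train_config: {
  batch_queue_capacity: 250
  num_batch_queue_threads: 4
  prefetch_queue_capacity: 5
  ...
}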

Any other ideas?


Answer 1:


I had a similar issue. I managed to reduce memory consumption by another factor of 2.5x by setting the following values:

prefetch_size: 4
num_readers: 4
min_after_dequeue: 1

I am not sure which of them (maybe all?) are responsible for reducing the memory (I did not test that), or how much their exact values influence memory consumption, but you can easily try that out.
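
These are input-reader settings, so they go inside the *_input_reader message of the pipeline .config file. A sketch of where they would sit (the paths are placeholders, and note that answer 2 below mentions some of these fields have since been deprecated):

train_input_reader: {
  tf_record_input_reader {
    input_path: "<DATASET_DIR>/tfrecords/train*.tfrecord"
  }
  label_map_path: "<DATASET_DIR>/label_map.pbtxt"
  prefetch_size: 4
  num_readers: 4
  min_after_dequeue: 1
}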




Answer 2:


Some of the options that previously worked to reduce memory usage have been deprecated. From object_detection/protos/input_reader.proto:

optional uint32 queue_capacity = 3 [default=2000, deprecated=true];
optional uint32 min_after_dequeue = 4 [default=1000, deprecated=true];
optional uint32 prefetch_size = 13 [default = 512, deprecated=true];
optional uint32 num_parallel_map_calls = 14 [default = 64, deprecated=true];

As of today, num_parallel_batches appears to be the largest memory hog.

The *_input_reader messages in my config file now look like this:

train_input_reader: {
  tf_record_input_reader {
    input_path: "<DATASET_DIR>/tfrecords/train*.tfrecord"
  }
  label_map_path: "<DATASET_DIR>/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  num_parallel_batches: 1
}

Mask R-CNN training now uses ~50% less CPU memory than before (training on 775 × 522 images).




Answer 3:


500GB is a good amount of memory. I have had issues with running out of GPU memory, which is a separate constraint.

For TensorFlow v2, I have found the following useful:

1. Reduce batch_size to a small value

In the config file, set:

train_config: {
  batch_size: 4
  ...
}

batch_size can be as low as 1.

2. Reduce the dimensions of resized images

In the config file, set the resizer height and width to a value lower than the default of 1024x1024.

model {
  faster_rcnn {
    number_of_stages: 3
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 256
        width: 256
      }
    }
    ...
  }
}

3. Don't train the Feature Detector

This only applies to Mask R-CNN, and is the most difficult change to implement. In the file research/object_detection/model_lib_v2.py, change the following code:

Current:

def eager_train_step(detection_model,
...
  trainable_variables = detection_model.trainable_variables
  gradients = tape.gradient(total_loss, trainable_variables)

  if clip_gradients_value:
    gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
  optimizer.apply_gradients(zip(gradients, trainable_variables))

New:

def eager_train_step(detection_model,
...
  # Mask R-CNN variables to train -- not feature detector
  trainable_variables = detection_model.trainable_variables
  to_fine_tune = []
  prefixes_to_train = ['FirstStageBoxPredictor',
                       'mask_rcnn_keras_box_predictor',
                       'RPNConv'
                        ]
  for var in trainable_variables:
    if any([var.name.startswith(prefix) for prefix in prefixes_to_train]):
      to_fine_tune.append(var)
    
  gradients = tape.gradient(total_loss, to_fine_tune)

  if clip_gradients_value:
    gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
  optimizer.apply_gradients(zip(gradients, to_fine_tune))

There are implications to each of these changes. However, they may allow for a "good enough" result using scarce resources.



Source: https://stackoverflow.com/questions/49080884/tensorflow-object-detection-mask-rcnn-uses-too-much-memory
