Question
I am trying to run TF Object Detection with Mask R-CNN, but it keeps dying on a node with 500GB of memory.
I updated the ConfigProto in models/research/object_detection/trainer.py to
session_config = tf.ConfigProto(allow_soft_placement=True,
                                intra_op_parallelism_threads=1,
                                inter_op_parallelism_threads=1,
                                device_count={'CPU': 1},
                                log_device_placement=False)
I updated the mask_rcnn_inception_resnet_v2_atrous_coco.config to
train_config: {
  batch_queue_capacity: 500
  num_batch_queue_threads: 8
  prefetch_queue_capacity: 10
  ...
}
Updating the ConfigProto has had the best effect so far: it now gets to step 30 before dying, instead of step 1. For this run I'm halving the values in the train_config above, and I have also reduced the number of images and objects significantly.
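For reference, halving the values shown above would give the following (a sketch, just dividing each value shown by two):

train_config: {
  batch_queue_capacity: 250
  num_batch_queue_threads: 4
  prefetch_queue_capacity: 5
  ...
}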
Any other ideas?
Answer 1:
I had a similar issue. I managed to reduce memory consumption by a further ~2.5x by setting the following values:
prefetch_size: 4
num_readers: 4
min_after_dequeue: 1
I am not sure which of them (maybe all?) are responsible for the reduction, or how much their exact values influence memory consumption (I did not test that), but you can easily experiment with them.
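For context, these options belong in the *_input_reader messages of the pipeline config (a sketch using the legacy input_reader fields; the paths are placeholders):

train_input_reader: {
  tf_record_input_reader {
    input_path: "<DATASET_DIR>/train.record"
  }
  label_map_path: "<DATASET_DIR>/label_map.pbtxt"
  num_readers: 4
  prefetch_size: 4
  min_after_dequeue: 1
}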
Answer 2:
Some of the options that previously worked to reduce memory usage have been deprecated. From object_detection/protos/input_reader.proto:
optional uint32 queue_capacity = 3 [default=2000, deprecated=true];
optional uint32 min_after_dequeue = 4 [default=1000, deprecated=true];
optional uint32 prefetch_size = 13 [default = 512, deprecated=true];
optional uint32 num_parallel_map_calls = 14 [default = 64, deprecated=true];
As of today, num_parallel_batches appears to be the largest memory hog. The *_input_reader messages in my config file now look like this:
train_input_reader: {
  tf_record_input_reader {
    input_path: "<DATASET_DIR>/tfrecords/train*.tfrecord"
  }
  label_map_path: "<DATASET_DIR>/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  num_parallel_batches: 1
}
Mask RCNN training now uses ~50% less CPU memory than before (training on 775 x 522 images).
Answer 3:
500GB is a good amount of memory. I have had issues with running out of GPU memory, which is a separate constraint.
For TensorFlow v2, I have found the following useful:
1. Reduce batch_size to a small value
In the config file, set:
train_config: {
  batch_size: 4
  ...
}
batch_size can be as low as 1.
2. Reduce the dimensions of resized images
In the config file, set the resizer height and width to a value lower than the default of 1024x1024.
model {
  faster_rcnn {
    number_of_stages: 3
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 256
        width: 256
      }
    }
    ...
  }
}
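If your pipeline uses a keep_aspect_ratio_resizer rather than a fixed_shape_resizer (an assumption about your config), the same idea applies: lower min_dimension and max_dimension, for example:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 400
    max_dimension: 512
  }
}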
3. Don't train the Feature Detector
This only applies to Mask R-CNN, and is the most difficult change to implement. In the file research/object_detection/model_lib_v2.py, change the following code:
Current:
def eager_train_step(detection_model,
                     ...
  trainable_variables = detection_model.trainable_variables
  gradients = tape.gradient(total_loss, trainable_variables)
  if clip_gradients_value:
    gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
  optimizer.apply_gradients(zip(gradients, trainable_variables))
New:
def eager_train_step(detection_model,
                     ...
  # Mask R-CNN variables to train -- not the feature detector
  trainable_variables = detection_model.trainable_variables
  to_fine_tune = []
  prefixes_to_train = ['FirstStageBoxPredictor',
                       'mask_rcnn_keras_box_predictor',
                       'RPNConv']
  for var in trainable_variables:
    if any([var.name.startswith(prefix) for prefix in prefixes_to_train]):
      to_fine_tune.append(var)
  gradients = tape.gradient(total_loss, to_fine_tune)
  if clip_gradients_value:
    gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
  optimizer.apply_gradients(zip(gradients, to_fine_tune))
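To sanity-check which variables the modified step will actually update, you can list the matching trainable variables before training (a sketch; it assumes detection_model has already been built and restored, and reuses the prefix list above):

# Sketch: confirm which variables match the prefixes and would be fine-tuned.
prefixes_to_train = ['FirstStageBoxPredictor',
                     'mask_rcnn_keras_box_predictor',
                     'RPNConv']

to_fine_tune = [var for var in detection_model.trainable_variables
                if any(var.name.startswith(prefix) for prefix in prefixes_to_train)]

for var in to_fine_tune:
  print(var.name, var.shape)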
There are implications to each of these changes. However, they may allow for a "good enough" result using scarce resources.
Source: https://stackoverflow.com/questions/49080884/tensorflow-object-detection-mask-rcnn-uses-too-much-memory