Tensorflow Object Detection inference slow on CPU


Question


System information

  • What is the top-level directory of the model you are using: object_detection/ssd_inception_v2
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution: Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: 1.2.1
  • Bazel version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA 8.0
  • GPU model and memory: Quadro M6000, 24 GB

After training an ssd_inception_v2 model on my custom dataset, I wanted to use it for inference. Since inference will later run on a device without a GPU, I switched to CPU-only inference. I adapted object_detection_tutorial.ipynb to measure the inference time and ran the following code on a series of frames from a video.

# Note: vidcap, detection_graph, category_index, vis_util and count are
# set up earlier in the notebook, as in the original tutorial.
import datetime

import cv2
import numpy as np
import tensorflow as tf

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Look up the input and output tensors once, outside the frame loop.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes_tensor = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # The score is shown on the result image, together with the class label.
    scores_tensor = detection_graph.get_tensor_by_name('detection_scores:0')
    classes_tensor = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections_tensor = detection_graph.get_tensor_by_name('num_detections:0')
    while True:
      # Read the next frame; stop once the video is exhausted.
      success, image = vidcap.read()
      if not success:
        break
      # Resize to 711 x 400, then crop the width to 680 pixels (columns 11:691).
      image = cv2.resize(image, (711, 400))
      image = image[:, 11:691]
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image, axis=0)
      before = datetime.datetime.now()
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes_tensor, scores_tensor, classes_tensor, num_detections_tensor],
          feed_dict={image_tensor: image_np_expanded})
      print("This took : " + str(datetime.datetime.now() - before))
      vis_util.visualize_boxes_and_labels_on_image_array(
          image,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)

      #cv2.imwrite("converted/frame%d.jpg" % count, image)  # save frame as JPEG file
      count += 1

With the following output:
This took : 0:00:04.289925
This took : 0:00:00.909071
This took : 0:00:00.917636
This took : 0:00:00.908391
This took : 0:00:00.896601
This took : 0:00:00.908698
This took : 0:00:00.890018
This took : 0:00:00.896373
.....

Of course, 900 ms per frame is not fast enough for video processing. After reading many threads, I see two possible ways to improve this:

  1. Graph Transform Tool: optimize the frozen inference graph for faster inference. (I am hesitant to try this because, as far as I understand, I would have to build TF from source, and I am usually happy with my current installation; see the first sketch below.)
  2. Replace feeding: it seems that feed_dict={image_tensor: image_np_expanded} is not an efficient way to provide data to the TF graph, and QueueRunner objects could help here (see the second sketch below).
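
For option 1, building from source may not be necessary: some TensorFlow 1.x binary releases ship a Python wrapper for the Graph Transform Tool. A minimal sketch, assuming tensorflow.tools.graph_transforms.TransformGraph is available in your installed version and that the frozen graph lives at the path shown (both are assumptions, not facts from this question):

import tensorflow as tf
# Availability of this wrapper varies by TF version; check your install.
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen detection graph (the file path is an assumption).
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# A commonly suggested set of inference-time transforms.
transforms = [
    'strip_unused_nodes',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
]
optimized_graph_def = TransformGraph(
    graph_def,
    ['image_tensor'],                          # graph input names
    ['detection_boxes', 'detection_scores',
     'detection_classes', 'num_detections'],   # graph output names
    transforms)

# Save the optimized graph next to the original.
with tf.gfile.GFile('optimized_inference_graph.pb', 'wb') as f:
    f.write(optimized_graph_def.SerializeToString())

Loading optimized_inference_graph.pb in place of the original frozen graph requires no other changes to the inference code.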
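
For option 2, here is a hedged sketch of queue-based feeding: a plain feeder thread enqueues preprocessed frames while the main thread runs inference, which is the same idea a QueueRunner automates. frozen_graph_def and vidcap stand in for the frozen graph and video capture used above, and the input_map wiring is an assumption about how the frozen graph is loaded:

import threading

import cv2
import tensorflow as tf

# Queue that decouples frame preprocessing from inference.
frame_ph = tf.placeholder(tf.uint8, shape=[400, 680, 3])
queue = tf.FIFOQueue(capacity=32, dtypes=[tf.uint8], shapes=[[400, 680, 3]])
enqueue_op = queue.enqueue(frame_ph)
image_batch = tf.expand_dims(queue.dequeue(), 0)  # shape [1, 400, 680, 3]

# Route the dequeued batch into the frozen graph instead of feeding
# image_tensor directly (frozen_graph_def is assumed loaded as above).
tf.import_graph_def(frozen_graph_def,
                    input_map={'image_tensor:0': image_batch}, name='')

def feed_frames(sess):
    # Runs in a background thread: read, preprocess, and enqueue frames.
    while True:
        success, frame = vidcap.read()
        if not success:
            break
        frame = cv2.resize(frame, (711, 400))[:, 11:691]
        sess.run(enqueue_op, feed_dict={frame_ph: frame})
    sess.run(queue.close())

with tf.Session() as sess:
    feeder = threading.Thread(target=feed_frames, args=(sess,))
    feeder.daemon = True
    feeder.start()
    # The main loop now calls sess.run on the detection outputs with no
    # feed_dict; each run dequeues the next preprocessed frame.

The feeder's enqueue still uses feed_dict, but it runs on a separate thread, so frame decoding and resizing overlap with inference instead of adding to each frame's latency.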

So my question is: do the two improvements above have the potential to bring inference to real-time speed (10-20 fps), or am I on the wrong path here and should try something else? Any suggestions are welcome.

Source: https://stackoverflow.com/questions/46301822/tensorflow-object-detection-inference-slow-on-cpu
